jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200
jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200 is an 8-billion-parameter language model fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200. It was trained with Contrastive Preference Optimization (CPO) on the HuggingFaceH4/ultrafeedback_binarized dataset to align its responses with human preferences, making it suitable for applications that require nuanced, contextually appropriate output.
Model Overview
This model, jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200, is an 8-billion-parameter language model derived from W-61/llama-3-8b-base-sft-ultrachat-8xh200 and further fine-tuned with Contrastive Preference Optimization (CPO) on the HuggingFaceH4/ultrafeedback_binarized dataset.
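Since this is a standard causal LM checkpoint, it can presumably be loaded with the Hugging Face `transformers` library. The sketch below is untested against the actual checkpoint, and the plain `User:`/`Assistant:` prompt framing is an assumption (the card does not document a chat template):

```python
MODEL_ID = "jackf857/llama-3-8b-base-cpo-ultrafeedback-8xh200"

RUN_DEMO = False  # set True to actually download and run the 8B checkpoint


def build_prompt(user_message: str) -> str:
    # Simple dialogue framing; the SFT base was tuned on UltraChat-style
    # conversations, so this is a reasonable default (assumption, not documented).
    return f"User: {user_message}\nAssistant:"


if RUN_DEMO:
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )

    inputs = tokenizer(
        build_prompt("Summarize what CPO fine-tuning does."),
        return_tensors="pt",
    ).to(model.device)

    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    print(
        tokenizer.decode(
            out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
        )
    )
```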
Key Characteristics
- Preference Alignment: Optimized through CPO to align with human preferences, aiming for more desirable and contextually appropriate responses.
- Performance Metrics: Achieved a reward accuracy of 0.625 on the evaluation set, with a chosen reward score of -36.8871 versus a rejected reward score of -38.7328; that is, the model assigns a higher reward to the preferred response in 62.5% of evaluation pairs.
- Training Details: Trained for 1 epoch with a learning rate of 5e-07 and an effective batch size of 128 across 8 GPUs.
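To make the reward numbers above concrete, here is a small sketch of the reference-free sigmoid preference term that common CPO formulations apply to the chosen/rejected reward margin, evaluated on the reported scores. The λ-weighted SFT term of the full CPO objective is omitted, and the function name is illustrative, not from any particular library:

```python
import math


def sigmoid_preference_loss(chosen_reward: float, rejected_reward: float) -> float:
    """-log(sigmoid(margin)): the pairwise term of a CPO-style objective.

    The full CPO loss also adds an SFT (negative log-likelihood) term on the
    chosen response, which this sketch omits.
    """
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


# Evaluation-set reward scores reported above.
chosen, rejected = -36.8871, -38.7328
margin = chosen - rejected  # positive margin: chosen is scored above rejected
loss = sigmoid_preference_loss(chosen, rejected)

print(f"margin = {margin:.4f}")
print(f"pairwise loss = {loss:.4f}")
```

The reward accuracy of 0.625 then simply says this margin was positive for 62.5% of evaluation pairs; both rewards being negative is expected, since they are scaled log-probabilities of full responses.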
Potential Use Cases
This model is particularly well-suited for applications where generating text that adheres to specific preferences or quality criteria is crucial. Its CPO fine-tuning suggests strengths in:
- Dialogue Systems: Generating more natural and preferred conversational responses.
- Content Generation: Producing outputs that are better aligned with user expectations or ethical guidelines.
- Instruction Following: Improving the quality and relevance of responses to complex instructions.