Name: jackf857/llama-3-8b-base-cpo-ultrafeedback-4xH200-batch-128-rerun API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jackf857

Model Overview

This model, jackf857/llama-3-8b-base-cpo-ultrafeedback-4xH200-batch-128-rerun, is an 8-billion-parameter language model based on the Llama 3 architecture. It was fine-tuned with Contrastive Preference Optimization (CPO) on the HuggingFaceH4/ultrafeedback_binarized dataset, with the aim of aligning its outputs more closely with human preferences.
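
As a rough illustration of what CPO optimizes, the sketch below computes a per-example loss from the policy's summed log-probabilities of the chosen and rejected responses. It assumes the common sigmoid form of the objective (a preference term plus a negative log-likelihood term on the chosen response, with no reference model). The function name, `beta`, and `nll_weight` are illustrative and are not taken from this model's training configuration.

```python
import math

def cpo_loss(logp_chosen: float, logp_rejected: float,
             beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """Per-example CPO-style loss (sigmoid variant; an assumption, not the author's exact setup)."""
    # Scaled log-probability gap between the chosen and rejected responses.
    diff = beta * (logp_chosen - logp_rejected)
    # Preference term: -log(sigmoid(diff)), written in a numerically stable form.
    if diff >= 0:
        prefer = math.log1p(math.exp(-diff))
    else:
        prefer = -diff + math.log1p(math.exp(diff))
    # NLL term: encourages high likelihood for the chosen response.
    nll = -logp_chosen
    return prefer + nll_weight * nll
```

The loss falls when the chosen response is more likely than the rejected one and when the chosen response itself is assigned higher probability.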

Key Characteristics

Base Model: Fine-tuned from W-61/llama-3-8b-base-sft-ultrachat-8xh200.
Training Method: Utilizes CPO for preference alignment.
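Evaluation Metrics: On its evaluation set, the model reached a rewards accuracy of 0.5160 and a rewards margin of -0.0586. The accuracy is close to the 0.5 expected from chance, and the negative margin means rejected responses received slightly higher implicit rewards than chosen ones on average, so the preference signal learned in this run appears weak.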
Context Length: Supports an 8192 token context window.
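
To make the evaluation metrics concrete: in preference-optimization training logs, rewards accuracy is typically the fraction of evaluation pairs in which the chosen response receives a higher implicit reward than the rejected one, and rewards margin is the mean difference between the two rewards. A minimal sketch, using made-up reward values rather than this model's evaluation data:

```python
def reward_metrics(chosen, rejected):
    """Return (accuracy, margin) for paired chosen/rejected reward lists."""
    pairs = list(zip(chosen, rejected))
    # Fraction of pairs where the chosen response out-scores the rejected one.
    accuracy = sum(c > r for c, r in pairs) / len(pairs)
    # Mean reward gap; negative means rejected responses score higher on average.
    margin = sum(c - r for c, r in pairs) / len(pairs)
    return accuracy, margin

# Hypothetical implicit rewards for four evaluation pairs.
chosen = [-1.0, -2.0, -1.5, -3.0]
rejected = [-1.5, -1.0, -2.0, -2.5]
accuracy, margin = reward_metrics(chosen, rejected)
print(accuracy, margin)  # 0.5 -0.125
```

An accuracy of 0.5 corresponds to chance-level ranking, and a negative margin means the rejected responses score higher on average.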

Intended Use Cases

This model is intended for applications where responses should align with human preferences. Its CPO fine-tuning targets use cases such as:

Dialogue systems: Generating more helpful and harmless conversational turns.
Content generation: Producing text that adheres to specific quality or style guidelines based on preference data.
Assistant models: Enhancing the quality and relevance of AI assistant responses.
