Name: jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200 API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: jackf857

Model Overview

This model, jackf857/llama-3-8b-base-orpo-ultrafeedback-8xh200, is an 8 billion parameter language model based on the Llama 3 architecture. It has been fine-tuned using the Odds Ratio Preference Optimization (ORPO) method, building upon the W-61/llama-3-8b-base-sft-ultrachat-8xh200 model. The fine-tuning process utilized the HuggingFaceH4/ultrafeedback_binarized dataset, aiming to align the model's outputs more closely with human preferences.

Key Characteristics

Base Model: Llama 3 8B parameters.
Fine-tuning Method: ORPO (Odds Ratio Preference Optimization).
Dataset: Fine-tuned on HuggingFaceH4/ultrafeedback_binarized for preference alignment.
Performance Metrics: Achieved a rewards accuracy of 0.6048 on the evaluation set, indicating its ability to differentiate between preferred and non-preferred responses.

Training Details

The model was trained with a learning rate of 5e-07, a total batch size of 128, and for 1 epoch. The training involved 8 GPUs with a gradient accumulation of 4 steps. The optimizer used was AdamW with cosine learning rate scheduling.

Potential Use Cases

This model is particularly well-suited for applications where generating high-quality, human-preferred text is crucial. Its ORPO fine-tuning makes it a strong candidate for:

Chatbots and Conversational AI: Producing more natural and helpful responses.
Content Generation: Creating text that aligns with specific quality or style preferences.
Instruction Following: Generating outputs that better adhere to given instructions and user preferences.

Overview

Model Overview

Key Characteristics

Training Details

Potential Use Cases

Full Model Card (README)