haoranxu/Llama-3-Instruct-8B-SimPO
The haoranxu/Llama-3-Instruct-8B-SimPO model is an 8-billion-parameter instruction-tuned language model fine-tuned from Meta-Llama-3-8B-Instruct. It was trained with the SimPO method on the princeton-nlp/llama3-ultrafeedback dataset to strengthen instruction following and response quality. The model is designed for general-purpose conversational AI and instruction-following tasks and supports an 8192-token context window.
Model Overview
haoranxu/Llama-3-Instruct-8B-SimPO is an 8-billion-parameter instruction-tuned language model built on Meta-Llama-3-8B-Instruct. It was fine-tuned with SimPO (Simple Preference Optimization) on the princeton-nlp/llama3-ultrafeedback preference dataset, with the goal of improving the model's instruction-following ability and the quality of its generated responses.
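The model can be loaded through the standard transformers generation API. The snippet below is a minimal sketch, assuming a recent transformers release (with chat-template support) and a GPU with enough memory for the 8B weights in bfloat16; the prompt and sampling parameters are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "haoranxu/Llama-3-Instruct-8B-SimPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# The tokenizer carries the Llama-3 chat template, so chat-style messages
# can be converted to model inputs directly.
messages = [{"role": "user", "content": "Explain beam search in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```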
Key Training Details
- Base Model: Meta-Llama-3-8B-Instruct
- Fine-tuning Dataset: princeton-nlp/llama3-ultrafeedback
- Training Method: SimPO (Simple Preference Optimization)
- Learning Rate: 1e-06
- Batch Size: 2 per device (train), 4 per device (eval), with 8 gradient accumulation steps, for a total train batch size of 256.
- Epochs: 1
- Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- LR Scheduler: Cosine with 0.1 warmup ratio
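For context on the training method: SimPO drops DPO's reference model and instead uses the length-normalized average log-likelihood of a response, scaled by β, as an implicit reward, requiring the chosen response to beat the rejected one by a target margin γ. The following is a minimal PyTorch sketch of that objective; the β and γ defaults are illustrative and are not hyperparameters listed in this card.

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed log-probs of chosen responses, shape (B,)
    rejected_logps: torch.Tensor,    # summed log-probs of rejected responses, shape (B,)
    chosen_lengths: torch.Tensor,    # token counts of chosen responses, shape (B,)
    rejected_lengths: torch.Tensor,  # token counts of rejected responses, shape (B,)
    beta: float = 2.5,               # reward scale (illustrative, not from this card)
    gamma: float = 1.4,              # target reward margin (illustrative)
) -> torch.Tensor:
    """SimPO objective: reference-free, length-normalized implicit reward."""
    chosen_rewards = beta * chosen_logps / chosen_lengths
    rejected_rewards = beta * rejected_logps / rejected_lengths
    # Penalize pairs where the chosen reward does not exceed the
    # rejected reward by at least the margin gamma.
    return -F.logsigmoid(chosen_rewards - rejected_rewards - gamma).mean()
```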
Intended Use Cases
This model is well-suited to general-purpose conversational AI applications and tasks that require precise instruction following. Because it was fine-tuned on a preference dataset, it should be better aligned with human preferences, making it potentially more effective at producing helpful and harmless outputs.