Overview
mlabonne/OrpoLlama-3-8B is an 8-billion-parameter language model, fine-tuned by mlabonne from the base meta-llama/Meta-Llama-3-8B model. The fine-tuning process used ORPO (Odds Ratio Preference Optimization) on the mlabonne/orpo-dpo-mix-40k dataset, as detailed in this article.
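As background, ORPO combines a standard supervised loss on the chosen response with an odds-ratio penalty that pushes the policy to prefer chosen over rejected responses. A minimal sketch of that per-pair loss, assuming average per-token log-probabilities as inputs and a hypothetical weight `lam` (the actual training used TRL-style tooling, not this function):

```python
import math

def orpo_loss(logp_chosen, logp_rejected, lam=0.1):
    """Illustrative ORPO loss for one (chosen, rejected) pair.

    logp_chosen / logp_rejected: average per-token log-probabilities
    the policy assigns to each response (must be < 0).
    lam: weight of the odds-ratio term (a hypothetical value here).
    """
    def log_odds(logp):
        # odds(p) = p / (1 - p), evaluated in log space
        return logp - math.log(1.0 - math.exp(logp))

    # Log odds ratio between chosen and rejected responses
    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(x): small when the chosen response is far more likely
    or_term = -math.log(1.0 / (1.0 + math.exp(-log_or)))
    # NLL on the chosen response plus the weighted odds-ratio penalty
    return -logp_chosen + lam * or_term
```

The loss drops as the model assigns higher relative odds to the chosen response, which is the mechanism that lets ORPO align a base model in a single fine-tuning stage without a separate reward model.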
Key Capabilities and Features
- Architecture: Based on the Llama 3 family, providing a strong foundation for general language understanding and generation.
- Context Window: Supports an 8k token context window, allowing for processing longer inputs and generating more coherent, extended responses.
- ChatML Template: Trained to follow the ChatML template, ensuring compatibility and optimal performance with chat-based applications.
- Quantized Versions: Various quantized versions are available for efficient deployment, including GGUF, AWQ, and EXL2 formats, provided by community contributors like bartowski, solidrust, and LoneStriker.
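Because the model was trained on the ChatML template, prompts should follow that layout. In practice you would call `tokenizer.apply_chat_template` from `transformers`; the sketch below merely illustrates the ChatML structure itself, with role names and an appended assistant generation prompt:

```python
def to_chatml(messages):
    """Format a list of {"role", "content"} dicts into the ChatML
    layout, appending an open assistant turn for generation."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Open an assistant turn so the model continues from here
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is ORPO?"},
])
```

Feeding a prompt in this shape (rather than raw text) is what keeps chat-based applications aligned with how the model was fine-tuned.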
Performance Highlights
Evaluations using LLM AutoEval indicate that OrpoLlama-3-8B outperforms Meta-Llama-3-8B-Instruct (Meta's official instruct-tuned model, not its base) on specific benchmark suites:
- GPT4All: Achieves 70.59%, surpassing Llama-3-8B-Instruct's 69.86%.
- TruthfulQA: Scores 52.39%, compared to Llama-3-8B-Instruct's 51.65%.
While its overall average on the Nous benchmark suite is slightly below Llama-3-8B-Instruct's, its strengths on these suites make it a compelling choice for tasks that emphasize factual accuracy and general knowledge. Training curves and detailed experiment logs are available on W&B.