Overview
This model, MaziyarPanahi/Llama-3-8B-Instruct-DPO-v0.3, is an 8-billion-parameter language model derived from meta-llama/Meta-Llama-3-8B-Instruct. It has undergone Direct Preference Optimization (DPO) fine-tuning, which typically improves a model's alignment with human preferences, yielding more helpful and harmless responses.
Key Capabilities & Features
- Extended Context Length: A notable feature is its extended context window of 32,000 tokens, achieved by modifying rope_theta. This allows the model to handle significantly longer prompts and generate more extensive outputs than its base model.
- Instruction Following: Fine-tuned with DPO, it is optimized for instruction-following tasks, aiming to produce high-quality, aligned responses.
- ChatML Prompt Template: The model uses the ChatML prompt template, making it compatible with common chat interfaces and ensuring structured input for optimal performance.
- Quantized GGUF Versions: Quantized GGUF builds are available, all supporting the 32,000-token context length, which facilitates efficient deployment on a range of hardware.
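To illustrate why raising rope_theta stretches the context window, the sketch below computes the per-pair inverse frequencies used by rotary position embeddings (RoPE). The specific theta values are placeholders for illustration, not this model's actual config; 128 is Llama-3's head dimension.

```python
def rope_inv_freqs(head_dim: int, theta: float) -> list[float]:
    """Per-pair RoPE inverse frequencies: pair i rotates by
    position * theta**(-2*i/head_dim) radians per token."""
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

# Illustrative values only: the base and raised theta here are
# placeholders, not taken from this model's published config.
base = rope_inv_freqs(128, 500_000.0)
raised = rope_inv_freqs(128, 8_000_000.0)

# Raising theta lowers every non-trivial frequency, so each dimension
# rotates more slowly and takes longer positions to wrap around --
# which is what lets the model address a longer context.
assert all(r <= b for b, r in zip(base, raised))
```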
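In practice you would let the tokenizer's apply_chat_template handle formatting, but the ChatML layout the model expects can be sketched directly. The helper below is a hypothetical illustration of the standard ChatML turn structure, ending with an open assistant turn for the model to complete.

```python
def to_chatml(messages: list[dict]) -> str:
    """Render {"role", "content"} messages in ChatML, leaving an open
    assistant turn as the generation prefix."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize DPO in one sentence."},
])
```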
Performance Highlights
Evaluated on the Open LLM Leaderboard, the model demonstrates competitive performance for its size:
- Average Score: 68.23
- MMLU (5-Shot): 68.33
- GSM8k (5-Shot): 70.58
Good For
- Applications requiring long context understanding and generation.
- General-purpose instruction-following and conversational AI where DPO-tuned responses are beneficial.
- Developers looking for an 8B parameter model with enhanced context handling capabilities.