Amu/dpo-phi2: Instruction-Tuned Phi-2 with DPO
Amu/dpo-phi2 is a 2.7 billion parameter language model built on Microsoft's Phi-2 and fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-intel-orca-dpo-pairs dataset, with the aim of aligning its responses with human preferences. It averages 61.26 on the Open LLM Leaderboard and has a context length of 2048 tokens. The model is primarily intended for research into preference-based fine-tuning and for general language understanding and generation tasks.
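The card does not include the training code itself, but the DPO objective the fine-tune optimizes can be sketched minimally. For a single preference pair, the loss compares how much the policy favors the chosen response over the rejected one, relative to a frozen reference model (here in plain Python; `beta=0.1` is an illustrative value, not the card's actual hyperparameter):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probabilities of the chosen/rejected responses under
    the policy being trained (pi_*) and the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, beyond what the reference model does.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: near zero once the policy
    # already ranks the chosen response well above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Widening the gap in favour of the chosen response lowers the loss...
low = dpo_loss(pi_chosen=-5.0, pi_rejected=-20.0, ref_chosen=-10.0, ref_rejected=-10.0)
# ...while preferring the rejected response raises it.
high = dpo_loss(pi_chosen=-20.0, pi_rejected=-5.0, ref_chosen=-10.0, ref_rejected=-10.0)
```

At a zero margin the loss is log 2, and it decays toward zero as the policy's preference for the chosen response grows; in real training this is averaged over batches of pairs from the dataset.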
Key Capabilities
- Instruction Following: DPO fine-tuning improves instruction adherence relative to the base Phi-2.
- General Language Understanding: Handles a range of text generation and comprehension tasks.
- Compact Size: At 2.7 billion parameters, it balances capability with computational efficiency.
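Preference fine-tuning consumes (prompt, chosen, rejected) triples. As a hedged sketch of how records from the training dataset map onto that shape, assuming the argilla/distilabel-intel-orca-dpo-pairs field layout of `system`, `input`, `chosen`, and `rejected`:

```python
def to_dpo_example(record):
    """Map one Orca-style preference record to the (prompt, chosen, rejected)
    triple that DPO training consumes. Field names ('system', 'input',
    'chosen', 'rejected') are an assumption about the dataset layout."""
    # Fold the optional system message into the prompt.
    prompt = (record.get("system", "") + "\n" + record["input"]).strip()
    return {
        "prompt": prompt,
        "chosen": record["chosen"],      # preferred completion
        "rejected": record["rejected"],  # dispreferred completion
    }

example = to_dpo_example({
    "system": "You are a helpful assistant.",
    "input": "Name the largest planet.",
    "chosen": "Jupiter is the largest planet in our solar system.",
    "rejected": "Saturn.",
})
```

Each triple trains the model to assign higher likelihood to the chosen completion than the rejected one for the same prompt.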
Good for
- Research on DPO: Ideal for exploring the effects and applications of Direct Preference Optimization.
- Prototyping: Suitable for developing and testing applications where a smaller, instruction-tuned model is beneficial.
- Educational Purposes: Can be used to understand the principles of instruction tuning and preference alignment.
Limitations
It's important to note that dpo-phi2 has several limitations:
- Accuracy: potential for generating inaccurate code and facts.
- Code generation: limited scope, primarily Python with common packages.
- Instruction following: struggles with complex or nuanced instructions.
- Language coverage: primarily designed for standard English.
- Safety: may exhibit societal biases or produce toxic content if explicitly prompted.
- Verbosity: tends to be verbose due to its textbook-like training data.
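One further practical constraint is the fixed 2048-token context window noted above: prompt length plus generation budget must fit inside it. A minimal sketch of clamping the generation budget, assuming the prompt's token count has already been computed with the model's tokenizer:

```python
def clamp_generation_budget(prompt_tokens: int, requested_new_tokens: int,
                            context_length: int = 2048) -> int:
    """Return how many new tokens can actually be generated, given the
    prompt length and the model's fixed 2048-token context window."""
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone fills the context window; truncate it first.")
    return min(requested_new_tokens, remaining)

# A 1900-token prompt leaves room for at most 148 new tokens,
# so a request for 512 is clamped down.
budget = clamp_generation_budget(prompt_tokens=1900, requested_new_tokens=512)
```

Applications that feed long documents to the model typically truncate or chunk the input first, since anything past the window is simply not seen.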