Amu/dpo-phi2: Instruction-Tuned Phi-2 with DPO
Amu/dpo-phi2 is a 2.7 billion parameter language model built on Microsoft's Phi-2 and fine-tuned with Direct Preference Optimization (DPO) on the argilla/distilabel-intel-orca-dpo-pairs dataset, with the aim of aligning its responses with human preferences. It averages 61.26 on the Open LLM Leaderboard and has a context length of 2048 tokens. The model is primarily intended for research into preference-based fine-tuning and for general language understanding and generation tasks.
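The card does not include the training code itself, but the DPO objective the fine-tune optimizes can be sketched minimally. For a single preference pair, the loss compares how much the policy favors the chosen response over the rejected one, relative to a frozen reference model (here in plain Python; `beta=0.1` is an illustrative value, not the card's actual hyperparameter):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are log-probabilities of the chosen/rejected responses under
    the policy being trained (pi_*) and the frozen reference model (ref_*).
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, beyond what the reference model does.
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    # Negative log-sigmoid of the margin: near zero once the policy
    # already ranks the chosen response well above the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Widening the gap in favour of the chosen response lowers the loss...
low = dpo_loss(pi_chosen=-5.0, pi_rejected=-20.0, ref_chosen=-10.0, ref_rejected=-10.0)
# ...while preferring the rejected response raises it.
high = dpo_loss(pi_chosen=-20.0, pi_rejected=-5.0, ref_chosen=-10.0, ref_rejected=-10.0)
```

At a zero margin the loss is log 2, and it decays toward zero as the policy's preference for the chosen response grows; in real training this is averaged over batches of pairs from the dataset.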
Key Capabilities
- Instruction Following: DPO fine-tuning improves instruction adherence relative to the base Phi-2.
- General Language Understanding: Handles a range of text generation and comprehension tasks.
- Compact Size: At 2.7 billion parameters, it balances capability with computational efficiency.
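Preference fine-tuning consumes (prompt, chosen, rejected) triples. As a hedged sketch of how records from the training dataset map onto that shape, assuming the argilla/distilabel-intel-orca-dpo-pairs field layout of `system`, `input`, `chosen`, and `rejected`:

```python
def to_dpo_example(record):
    """Map one Orca-style preference record to the (prompt, chosen, rejected)
    triple that DPO training consumes. Field names ('system', 'input',
    'chosen', 'rejected') are an assumption about the dataset layout."""
    # Fold the optional system message into the prompt.
    prompt = (record.get("system", "") + "\n" + record["input"]).strip()
    return {
        "prompt": prompt,
        "chosen": record["chosen"],      # preferred completion
        "rejected": record["rejected"],  # dispreferred completion
    }

example = to_dpo_example({
    "system": "You are a helpful assistant.",
    "input": "Name the largest planet.",
    "chosen": "Jupiter is the largest planet in our solar system.",
    "rejected": "Saturn.",
})
```

Each triple trains the model to assign higher likelihood to the chosen completion than the rejected one for the same prompt.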
Good for
- Research on DPO: Ideal for exploring the effects and applications of Direct Preference Optimization.
- Prototyping: Suitable for developing and testing applications where a smaller, instruction-tuned model is beneficial.
- Educational Purposes: Can be used to understand the principles of instruction tuning and preference alignment.
Limitations
It's important to note that dpo-phi2 has several limitations:
- Accuracy: potential for generating inaccurate code and facts.
- Code generation: limited scope, primarily Python with common packages.
- Instruction following: struggles with complex or nuanced instructions.
- Language coverage: primarily designed for standard English.
- Safety: may exhibit societal biases or produce toxic content if explicitly prompted.
- Verbosity: tends to be verbose due to its textbook-like training data.
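One further practical constraint is the fixed 2048-token context window noted above: prompt length plus generation budget must fit inside it. A minimal sketch of clamping the generation budget, assuming the prompt's token count has already been computed with the model's tokenizer:

```python
def clamp_generation_budget(prompt_tokens: int, requested_new_tokens: int,
                            context_length: int = 2048) -> int:
    """Return how many new tokens can actually be generated, given the
    prompt length and the model's fixed 2048-token context window."""
    remaining = context_length - prompt_tokens
    if remaining <= 0:
        raise ValueError("Prompt alone fills the context window; truncate it first.")
    return min(requested_new_tokens, remaining)

# A 1900-token prompt leaves room for at most 148 new tokens,
# so a request for 512 is clamped down.
budget = clamp_generation_budget(prompt_tokens=1900, requested_new_tokens=512)
```

Applications that feed long documents to the model typically truncate or chunk the input first, since anything past the window is simply not seen.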