Amu/spin-phi2: Enhanced Conversational Model
Amu/spin-phi2 is a 2.7-billion-parameter language model derived from Microsoft's Phi-2 and fine-tuned with the Self-Play Fine-Tuning (SPIN) method. Unlike the standard SPIN recipe, which starts from a supervised fine-tuned (SFT) model, this model applies SPIN directly to the pretrained base model, aiming to improve performance beyond the original Phi-2.
Key Capabilities & Differentiators
- SPIN Fine-tuning: Applies the SPIN method to a pretrained model rather than, as is typical, an SFT model, to enhance conversational ability.
- Performance Improvement: Achieves a higher score on the Open LLM Leaderboard compared to the original pretrained Phi-2, indicating improved general language understanding and reasoning.
- Conversational Focus: Fine-tuned on the ultrachat_200k dataset, which is designed for aligning SFT models, giving the model a strong orientation toward conversational AI.
- Evaluation Metrics: Achieves an average score of 61.68 on the Open LLM Leaderboard, with notable scores in reasoning (AI2 Reasoning Challenge: 63.57) and common sense (HellaSwag: 75.57, Winogrande: 73.48).
Training Paradigm
The developer proposes the following training paradigm for conversational LLMs: pretrain -> dpo(spin) -> sft -> dpo(spin), i.e., applying DPO/SPIN-style preference optimization iteratively, both before and after SFT.
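To make the dpo(spin) steps concrete, here is a minimal sketch of the per-example SPIN objective: a DPO-style logistic loss in which a real (human) response plays the "chosen" role and the previous iteration's self-generated response plays the "rejected" role. The function name, the scaling factor `lam`, and the toy log-probability values are illustrative, not taken from the model's actual training code.

```python
import math

def spin_loss(logp_real_new: float, logp_real_old: float,
              logp_gen_new: float, logp_gen_old: float,
              lam: float = 0.1) -> float:
    """Logistic loss that pushes the updated model toward the real response
    and away from the previous iteration's self-generated one.

    logp_*_new: log-probabilities under the model being trained.
    logp_*_old: log-probabilities under the previous-iteration (frozen) model.
    """
    margin = lam * ((logp_real_new - logp_real_old)
                    - (logp_gen_new - logp_gen_old))
    # -log(sigmoid(margin)): small when the new model widens the gap
    # between real and self-generated responses relative to the old model.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At each SPIN iteration the frozen "old" model regenerates the self-play responses, so the target the trainable model must beat keeps improving. The loss decreases as the new model assigns more probability to real responses than the old model did: `spin_loss(-2.0, -3.0, -3.5, -3.0)` is smaller than the neutral case `spin_loss(-3.0, -3.0, -3.0, -3.0)`.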
Use Cases
This model is well-suited for applications requiring a compact yet capable conversational AI, general text generation, and tasks benefiting from improved reasoning and common sense understanding, especially where the Phi-2 architecture is a good fit.
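For applications like these, a minimal usage sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the repo id Amu/spin-phi2 and loads through the standard `transformers` causal-LM API; the helper name and generation settings are illustrative.

```python
def chat(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are kept inside the function so defining the sketch is cheap;
    # calling it downloads the checkpoint from the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Amu/spin-phi2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because the model is conversationally oriented, plain instruction-style prompts (e.g. `chat("Explain self-play fine-tuning in two sentences.")`) are a reasonable starting point.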