Amu/spin-phi2: Enhanced Conversational Model
Amu/spin-phi2 is a 2.7-billion-parameter language model derived from Microsoft's Phi-2 and fine-tuned with the Self-Play Fine-Tuning (SPIN) method. Unlike the standard SPIN recipe, which starts from a supervised fine-tuned (SFT) model, this model applies SPIN directly to the pretrained base model, aiming to improve performance beyond the original Phi-2.
Key Capabilities & Differentiators
- SPIN Fine-tuning: Applies the SPIN method to a pretrained model rather than, as is typical, an SFT model, to enhance conversational ability.
- Performance Improvement: Achieves a higher score on the Open LLM Leaderboard compared to the original pretrained Phi-2, indicating improved general language understanding and reasoning.
- Conversational Focus: Fine-tuned on the ultrachat_200k dataset, which is designed for aligning SFT models, giving the model a strong orientation toward conversational AI.
- Evaluation Metrics: Achieves an average score of 61.68 on the Open LLM Leaderboard, with notable scores in reasoning (AI2 Reasoning Challenge: 63.57) and common sense (HellaSwag: 75.57, Winogrande: 73.48).
Training Paradigm
The developer proposes the following training paradigm for conversational LLMs: pretrain -> dpo(spin) -> sft -> dpo(spin), i.e., applying DPO/SPIN-style preference optimization iteratively, both before and after SFT.
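To make the dpo(spin) steps concrete, here is a minimal sketch of the per-example SPIN objective: a DPO-style logistic loss in which a real (human) response plays the "chosen" role and the previous iteration's self-generated response plays the "rejected" role. The function name, the scaling factor `lam`, and the toy log-probability values are illustrative, not taken from the model's actual training code.

```python
import math

def spin_loss(logp_real_new: float, logp_real_old: float,
              logp_gen_new: float, logp_gen_old: float,
              lam: float = 0.1) -> float:
    """Logistic loss that pushes the updated model toward the real response
    and away from the previous iteration's self-generated one.

    logp_*_new: log-probabilities under the model being trained.
    logp_*_old: log-probabilities under the previous-iteration (frozen) model.
    """
    margin = lam * ((logp_real_new - logp_real_old)
                    - (logp_gen_new - logp_gen_old))
    # -log(sigmoid(margin)): small when the new model widens the gap
    # between real and self-generated responses relative to the old model.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

At each SPIN iteration the frozen "old" model regenerates the self-play responses, so the target the trainable model must beat keeps improving. The loss decreases as the new model assigns more probability to real responses than the old model did: `spin_loss(-2.0, -3.0, -3.5, -3.0)` is smaller than the neutral case `spin_loss(-3.0, -3.0, -3.0, -3.0)`.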
Use Cases
This model is well-suited for applications requiring a compact yet capable conversational AI, general text generation, and tasks benefiting from improved reasoning and common sense understanding, especially where the Phi-2 architecture is a good fit.
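For applications like these, a minimal usage sketch follows, assuming the checkpoint is published on the Hugging Face Hub under the repo id Amu/spin-phi2 and loads through the standard `transformers` causal-LM API; the helper name and generation settings are illustrative.

```python
def chat(prompt: str, max_new_tokens: int = 128) -> str:
    # Imports are kept inside the function so defining the sketch is cheap;
    # calling it downloads the checkpoint from the Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Amu/spin-phi2"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Strip the prompt tokens and decode only the generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because the model is conversationally oriented, plain instruction-style prompts (e.g. `chat("Explain self-play fine-tuning in two sentences.")`) are a reasonable starting point.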