Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1: Self-Synthesized Alignment
This model is an 8-billion-parameter instruction-tuned variant of Meta's Llama-3.1-8B, developed by Magpie-Align. Its core innovation is the Magpie self-synthesis method, which generates high-quality instruction data by prompting an aligned LLM such as Llama-3-Instruct with nothing but the pre-query portion of its chat template, so the model itself completes both the instruction and the response. This addresses a key bottleneck in alignment: creating high-quality instruction data at scale is typically limited by human labor and by the scope of predefined seed prompts.
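To make the mechanism concrete: because an aligned chat model was trained to continue a user-turn prefix with a user query, sampling from only that prefix yields a synthetic instruction, and a second pass produces the paired response. Below is a minimal two-step sketch with Hugging Face transformers; the generator model and sampling parameters are illustrative, not the exact Magpie-Align configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative: any aligned Llama-3-family chat model can serve as the generator.
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Step 1: prompt with ONLY the pre-query template. The model, trained to
# continue such prefixes with a user turn, samples a plausible instruction.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
inputs = tokenizer(pre_query, return_tensors="pt", add_special_tokens=False).to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=1.0)
instruction = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)

# Step 2: feed the synthesized instruction back as a normal user turn to get
# the paired response; (instruction, response) becomes one SFT training instance.
chat = [{"role": "user", "content": instruction}]
prompt_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(prompt_ids, max_new_tokens=512)
response = tokenizer.decode(out[0][prompt_ids.shape[1]:], skip_special_tokens=True)
```

Repeating this loop at scale and then filtering for quality is what produces the large candidate pool described in the list below.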
Key Capabilities & Differentiators
- Self-Synthesized Data: Trained on 300K high-quality instruction-response pairs selected from 4 million candidates generated with the Magpie method.
- Comparable to Official Instruct Models: Achieves performance on par with the official Llama-3.1-8B-Instruct model using only Supervised Fine-Tuning (SFT), without requiring additional preference optimization techniques.
- Strong Alignment Performance: Demonstrates competitive results on alignment benchmarks:
  - AlpacaEval 2: 24.79 length-controlled (LC) win rate, 25.05 raw win rate (WR)
  - Arena Hard: 21.0
- Efficient Alignment: Models fine-tuned on Magpie data outperform those trained on prior public datasets, including pipelines that combine SFT with preference optimization (e.g., DPO on UltraFeedback), despite using SFT alone.
Training Details
The model was fine-tuned on the Magpie-Align/Magpie-Pro-MT-300K-v0.1 and Magpie-Align/Magpie-Reasoning-150K datasets for 2 epochs with a learning rate of 2e-05, a total batch size of 128, and a cosine learning-rate scheduler.
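For readers wanting to reproduce the run, these hyperparameters map onto a standard SFT configuration. The sketch below uses trl's SFTTrainer purely as an illustration; the authors' actual training stack is not stated here, and the per-device batch size / gradient-accumulation split is an assumption chosen to reach the total batch size of 128.

```python
from datasets import load_dataset, concatenate_datasets
from trl import SFTConfig, SFTTrainer

# Hyperparameters from the model card; everything else is illustrative.
config = SFTConfig(
    output_dir="llama-3.1-8b-magpie-sft",
    num_train_epochs=2,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=1,    # assumption: 1 x 128 accumulation steps
    gradient_accumulation_steps=128,  # = total batch size of 128 on one GPU
)

# Assumes the two datasets have compatible schemas; the Magpie datasets store
# multi-turn dialogs in a ShareGPT-style column, so mapping to the trainer's
# expected chat format is elided here.
train_data = concatenate_datasets([
    load_dataset("Magpie-Align/Magpie-Pro-MT-300K-v0.1", split="train"),
    load_dataset("Magpie-Align/Magpie-Reasoning-150K", split="train"),
])

trainer = SFTTrainer(
    model="meta-llama/Llama-3.1-8B",  # base model, per the card
    args=config,
    train_dataset=train_data,
)
trainer.train()
```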
Use Cases
This model is ideal for applications requiring a highly aligned and performant 8B language model, particularly where access to large-scale, high-quality instruction data is a bottleneck. Its strong performance on alignment benchmarks suggests suitability for conversational AI, instruction following, and general-purpose assistant tasks.
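Like other Llama-3.1 chat checkpoints, the model can be served through standard transformers chat templating; a minimal inference sketch follows (generation parameters are illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Magpie-Align/Llama-3.1-8B-Magpie-Align-SFT-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Explain the difference between SFT and DPO in two sentences."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```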