Overview
Deita 7B V1.0 is a 7 billion parameter language model developed by hkust-nlp, built upon the Mistral-7B-v0.1 architecture. Its core innovation lies in its training methodology, which emphasizes Automatic Data Selection for instruction tuning. The model was first supervised fine-tuned (SFT) on 6,000 automatically selected, high-quality alignment examples from the Deita 6K V0 dataset, then further aligned with Direct Preference Optimization (DPO) on 10,000 randomly sampled preference pairs from UltraFeedback.
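The selection idea above can be illustrated with a toy sketch: score each candidate example and keep only the top-ranked subset. This is illustrative only; the actual Deita pipeline uses model-based complexity and quality scorers plus a diversity filter, and the names and scores below are made up.

```python
# Toy sketch of score-based data selection (illustrative, not the real Deita scorers).
from dataclasses import dataclass


@dataclass
class Example:
    text: str
    complexity: float  # assumed: produced by a learned complexity scorer
    quality: float     # assumed: produced by a learned quality scorer


def select_top_k(pool, k):
    """Rank candidates by a combined complexity x quality score and keep the top k."""
    ranked = sorted(pool, key=lambda ex: ex.complexity * ex.quality, reverse=True)
    return ranked[:k]


pool = [
    Example("write a haiku", 0.3, 0.9),
    Example("prove Fermat's little theorem", 0.9, 0.8),
    Example("say hi", 0.1, 0.5),
]
print([ex.text for ex in select_top_k(pool, 2)])
# -> ["prove Fermat's little theorem", "write a haiku"]
```

The point of this style of selection is that a small, carefully chosen SFT set (here 2 of 3; in Deita, 6K examples) can stand in for a much larger unfiltered pool.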
Key Capabilities & Differentiators
- Efficient Alignment: Achieves strong performance with a relatively small, automatically selected SFT dataset (6K) combined with DPO.
- Performance: Demonstrates competitive results on benchmarks, scoring 7.55 on MT-Bench and 90.06% on AlpacaEval, outperforming many models in its class that were trained on much larger datasets.
- Training Methodology: Utilizes a novel approach to data selection for instruction tuning, as detailed in the accompanying Deita research paper.
- Base Model: Fine-tuned from Mistral-7B-v0.1, inheriting its strong foundational capabilities.
Use Cases
This model is particularly well-suited for applications requiring:
- Instruction Following: Generating helpful, detailed, and polite responses based on user prompts.
- Efficient Fine-tuning: Serving as a starting point for developers who want a model already aligned on high-quality, automatically selected data.
- Research in Data Selection: As a strong baseline or comparative model for studies on instruction tuning data efficiency.
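For the instruction-following use case, a minimal inference sketch with Hugging Face Transformers is shown below. The repo id `hkust-nlp/deita-7b-v1.0` and the Vicuna-style prompt are assumptions for illustration; consult the model card and the tokenizer's chat template for the authoritative format.

```python
# Hedged sketch: repo id and prompt format below are assumptions, not confirmed
# by this document. Prefer tokenizer.apply_chat_template if one is shipped.
def build_prompt(user_message: str) -> str:
    """Build an assumed Vicuna-style single-turn prompt."""
    system = (
        "A chat between a curious user and an artificial intelligence "
        "assistant. The assistant gives helpful, detailed, and polite "
        "answers to the user's questions."
    )
    return f"{system} USER: {user_message} ASSISTANT:"


if __name__ == "__main__":
    # Heavy imports kept here so the prompt helper stays dependency-free.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "hkust-nlp/deita-7b-v1.0"  # assumed Hugging Face repo id
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    inputs = tok(
        build_prompt("Summarize instruction tuning in one sentence."),
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=200, do_sample=False)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tok.decode(out[0][inputs["input_ids"].shape[1]:],
                     skip_special_tokens=True))
```

Keeping the model load behind the `__main__` guard lets the prompt-building helper be imported and tested without downloading 7B parameters.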