Overview
argilla/distilabeled-Marcoro14-7B-slerp is a 7-billion-parameter language model developed by Argilla. It is a Direct Preference Optimization (DPO) fine-tune of the mlabonne/Marcoro14-7B-slerp base model. Fine-tuning used a custom-filtered version of the argilla/distilabel-intel-orca-dpo-pairs dataset, itself derived from the original Intel Orca DPO pairs but with additional quality filtering: ties removed, only pairs with chosen_score >= 8 kept, and examples overlapping the GSM8k training split excluded.
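The three filtering rules above can be sketched in plain Python. This is an illustrative reconstruction, not the actual preprocessing script: the field names (`status`, `chosen_score`, `in_gsm8k_train`) are assumptions about how the dataset labels ties, quality scores, and GSM8k overlap.

```python
# Illustrative sketch of the dataset filtering described above.
# Field names are assumptions, not confirmed from the actual dataset schema.
pairs = [
    {"status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},  # dropped: tie
    {"status": "ok", "chosen_score": 8.0, "in_gsm8k_train": False},   # kept
    {"status": "ok", "chosen_score": 5.0, "in_gsm8k_train": False},   # dropped: low score
    {"status": "ok", "chosen_score": 10.0, "in_gsm8k_train": True},   # dropped: GSM8k overlap
]

def keep(row):
    # Remove ties, keep only high-quality chosen responses (score >= 8),
    # and drop rows that overlap with the GSM8k training split.
    return (
        row["status"] != "tie"
        and row["chosen_score"] >= 8
        and not row["in_gsm8k_train"]
    )

filtered = [row for row in pairs if keep(row)]
```

With a Hugging Face `datasets.Dataset`, the same predicate could be passed directly to `.filter(keep)`.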
Key Capabilities & Performance
This model shows consistent improvements over its base model, Marcoro14-7B-slerp, across several benchmarks. For instance, on the Nous benchmark suite (popularized by Teknium and NousResearch) it achieved:
- AGIEval: 45.4 (vs. 44.66 for base)
- GPT4ALL: 76.47 (vs. 76.24 for base)
- TruthfulQA: 65.46 (vs. 64.15 for base)
- Bigbench: 47.19 (vs. 45.64 for base)
- Average: 58.63 (vs. 57.67 for base)
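As a quick sanity check, the reported average is the unweighted mean of the four task scores:

```python
# Verify that the reported Nous-suite average (58.63) is the unweighted
# mean of the four individual task scores listed above.
scores = {
    "AGIEval": 45.4,
    "GPT4ALL": 76.47,
    "TruthfulQA": 65.46,
    "Bigbench": 47.19,
}
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # → 58.63
```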
On the Open LLM Leaderboard, the model scores an average of 73.63, including 65.22 on MMLU and 71.19 on GSM8k. Training was efficient: a single A100 80GB GPU for under an hour.
When to Use This Model
This model is suitable for applications that need a 7B-parameter model with improved reasoning and truthfulness, particularly where instruction-following quality is critical. Its DPO fine-tuning on a quality-filtered preference dataset makes it a strong candidate for general conversational agents and for tasks that benefit from better factual accuracy and alignment.