argilla/distilabeled-Marcoro14-7B-slerp-full
argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla, obtained by DPO fine-tuning mlabonne/Marcoro14-7B-slerp. The model was fine-tuned for a full epoch on the argilla/distilabel-intel-orca-dpo-pairs dataset, a filtered version of Intel/orca_dpo_pairs. It performs strongly across benchmarks including AGIEval, GPT4ALL, TruthfulQA, and Bigbench, making it suitable for general conversational AI and reasoning tasks.
Overview
argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla. It is a DPO (Direct Preference Optimization) fine-tune of the mlabonne/Marcoro14-7B-slerp model. A key differentiator for this model is its training on the entire argilla/distilabel-intel-orca-dpo-pairs dataset for a full epoch, unlike its predecessor which was trained for only 200 steps.
Training Details
The model was fine-tuned with a reproducible recipe: the base model was switched to mlabonne/Marcoro14-7B-slerp, and the Intel/orca_dpo_pairs dataset was filtered to produce argilla/distilabel-intel-orca-dpo-pairs. The filtering removed entries with a 'tie' status, kept only entries with a chosen_score of 8 or higher, and excluded entries that appear in the gsm8k train set. Training ran on a single A100 80GB GPU for under two hours.
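The three filtering criteria above can be expressed as a single predicate. A minimal sketch in plain Python (the field names `status`, `chosen_score`, and `in_gsm8k_train` are assumptions about the dataset schema, and the toy records below are illustrative, not real dataset rows):

```python
def keep_example(record):
    # Apply the filtering recipe described above:
    #   1. drop entries whose status is 'tie'
    #   2. require a chosen_score of 8 or higher
    #   3. exclude entries present in the gsm8k train set
    return (
        record["status"] != "tie"
        and record["chosen_score"] >= 8
        and not record["in_gsm8k_train"]
    )

# Toy records mimicking the assumed schema; only the first survives the filter.
sample = [
    {"status": "ok",  "chosen_score": 9,  "in_gsm8k_train": False},  # kept
    {"status": "tie", "chosen_score": 9,  "in_gsm8k_train": False},  # dropped: tie
    {"status": "ok",  "chosen_score": 7,  "in_gsm8k_train": False},  # dropped: low score
    {"status": "ok",  "chosen_score": 10, "in_gsm8k_train": True},   # dropped: gsm8k
]
filtered = [r for r in sample if keep_example(r)]
```

With the Hugging Face `datasets` library, the same predicate would typically be passed to `dataset.filter(keep_example)`.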
Benchmark Performance
Evaluated with the "Nous" / "Teknium" benchmark suite via LLM AutoEval, the model shows competitive results:
- AGIEval: 45.17
- GPT4ALL: 76.59 (highest among compared models)
- TruthfulQA: 64.68
- Bigbench: 48.15 (highest among compared models)
- Average: 58.65 (highest among compared models)
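The reported average appears to be the plain mean of the four category scores, which can be checked in a couple of lines:

```python
# Category scores from the Nous benchmark results listed above.
scores = {
    "AGIEval": 45.17,
    "GPT4ALL": 76.59,
    "TruthfulQA": 64.68,
    "Bigbench": 48.15,
}

# Unweighted mean across the four categories.
average = sum(scores.values()) / len(scores)  # ~58.65, matching the reported figure
```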
On the Open LLM Leaderboard, it achieved an average score of 73.40, with notable scores in HellaSwag (87.55) and Winogrande (82.00).
Key Differentiators
- Full Epoch DPO Fine-tuning: Unlike a previous version, this model saw the entire filtered dataset during DPO training, potentially leading to more robust performance.
- Curated Dataset: Utilizes a carefully filtered version of the Intel Orca DPO pairs dataset, focusing on high-quality preference data.
- Strong General Performance: Achieves leading scores in several categories on the Nous benchmark compared to its base model and other distilabeled variants.
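The DPO training referenced above optimizes the standard DPO objective: maximize the log-sigmoid of the policy's preference margin over a frozen reference model, scaled by a temperature β. A minimal sketch of the per-example loss (the β value and the sample log-probabilities below are illustrative assumptions, not values from this training run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Margin: how much more the policy prefers the chosen response over the
    # rejected one, relative to the frozen reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    # DPO loss is -log(sigmoid(beta * margin)); written as log1p(exp(-x))
    # for numerical stability.
    return math.log1p(math.exp(-beta * margin))

# Illustrative log-probabilities: the policy prefers the chosen response
# more strongly than the reference does, so the loss falls below log(2).
loss = dpo_loss(-1.0, -2.0, -1.5, -1.5, beta=0.1)
```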