argilla/distilabeled-Marcoro14-7B-slerp-full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Jan 14, 2024 · License: apache-2.0 · Architecture: Transformer

argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla as a DPO fine-tune of mlabonne/Marcoro14-7B-slerp. It was fine-tuned for a full epoch on the argilla/distilabel-intel-orca-dpo-pairs dataset, a filtered version of Intel/orca_dpo_pairs. The model posts strong results across several benchmarks, including AGIEval, GPT4ALL, TruthfulQA, and Bigbench, making it suitable for general conversational AI and reasoning tasks.


Overview

argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla. It is a DPO (Direct Preference Optimization) fine-tune of the mlabonne/Marcoro14-7B-slerp model. A key differentiator for this model is its training on the entire argilla/distilabel-intel-orca-dpo-pairs dataset for a full epoch, unlike its predecessor which was trained for only 200 steps.
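To make the DPO objective concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy being trained and the frozen reference model; variable names and the default `beta` are illustrative, not taken from Argilla's training config.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin).

    Each argument is a summed log-probability of a response:
    pi_*  -> under the policy being fine-tuned
    ref_* -> under the frozen reference model
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model's preference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin the loss is ln 2 ≈ 0.693; as the policy grows to prefer the chosen response more strongly than the reference does, the loss decreases toward zero.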

Training Details

The model was fine-tuned with a reproducible recipe: the base model was switched to mlabonne/Marcoro14-7B-slerp, and the Intel/orca_dpo_pairs dataset was filtered to produce argilla/distilabel-intel-orca-dpo-pairs. The filter removed entries with 'tie' status, kept only pairs with a chosen_score of 8 or higher, and excluded examples appearing in the gsm8k train set. Training ran on a single A100 80GB GPU in under 2 hours.
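The three filtering rules above can be sketched as a single predicate over dataset rows. The field names (`status`, `chosen_score`, `in_gsm8k_train`) and the toy records below are assumptions for illustration, not the dataset's guaranteed schema:

```python
# Toy rows mimicking preference-pair entries; field names are illustrative.
records = [
    {"status": "tie",    "chosen_score": 9,  "in_gsm8k_train": False},  # dropped: tie
    {"status": "chosen", "chosen_score": 5,  "in_gsm8k_train": False},  # dropped: low score
    {"status": "chosen", "chosen_score": 10, "in_gsm8k_train": True},   # dropped: gsm8k overlap
    {"status": "chosen", "chosen_score": 8,  "in_gsm8k_train": False},  # kept
]

def keep(row: dict) -> bool:
    return (
        row["status"] != "tie"          # remove tied preference pairs
        and row["chosen_score"] >= 8    # require a high-quality chosen answer
        and not row["in_gsm8k_train"]   # avoid gsm8k train-set contamination
    )

filtered = [r for r in records if keep(r)]
```

The same predicate could be passed to `datasets.Dataset.filter` when working with the Hugging Face dataset directly.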

Benchmark Performance

Evaluated with the "Nous" / "Teknium" benchmark suite via LLM AutoEval, the model posts competitive results:

  • AGIEval: 45.17
  • GPT4ALL: 76.59 (highest among compared models)
  • TruthfulQA: 64.68
  • Bigbench: 48.15 (highest among compared models)
  • Average: 58.65 (highest among compared models)

On the Open LLM Leaderboard, it achieved an average score of 73.40, with notable scores in HellaSwag (87.55) and Winogrande (82.00).
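As a quick sanity check, the reported Nous average is the unweighted mean of the four task scores listed above:

```python
# Nous benchmark scores reported for this model.
nous_scores = {
    "AGIEval": 45.17,
    "GPT4ALL": 76.59,
    "TruthfulQA": 64.68,
    "Bigbench": 48.15,
}

# Unweighted mean across the four tasks.
average = sum(nous_scores.values()) / len(nous_scores)  # 58.6475, reported as 58.65
```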

Key Differentiators

  • Full Epoch DPO Fine-tuning: Unlike a previous version, this model saw the entire filtered dataset during DPO training, potentially leading to more robust performance.
  • Curated Dataset: Utilizes a carefully filtered version of the Intel Orca DPO pairs dataset, focusing on high-quality preference data.
  • Strong General Performance: Achieves leading scores in several categories on the Nous benchmark compared to its base model and other distilabeled variants.
