argilla/distilabeled-Marcoro14-7B-slerp-full

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Jan 14, 2024 · License: apache-2.0 · Architecture: Transformer

argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla as a DPO fine-tune of mlabonne/Marcoro14-7B-slerp. It was fine-tuned for a full epoch on the argilla/distilabel-intel-orca-dpo-pairs dataset, a filtered version of Intel/orca_dpo_pairs. The model posts strong results across several benchmarks, including AGIEval, GPT4ALL, TruthfulQA, and Bigbench, making it suitable for general conversational AI and reasoning tasks.


Overview

argilla/distilabeled-Marcoro14-7B-slerp-full is a 7 billion parameter language model developed by Argilla. It is a DPO (Direct Preference Optimization) fine-tune of the mlabonne/Marcoro14-7B-slerp model. A key differentiator for this model is its training on the entire argilla/distilabel-intel-orca-dpo-pairs dataset for a full epoch, unlike its predecessor which was trained for only 200 steps.
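To make the DPO objective concrete, here is a minimal sketch of the per-example DPO loss in plain Python. It assumes you already have summed log-probabilities of the chosen and rejected responses under both the policy being trained and the frozen reference model; variable names and the default `beta` are illustrative, not taken from Argilla's training config.

```python
import math

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * margin).

    Each argument is a summed log-probability of a response:
    pi_*  -> under the policy being fine-tuned
    ref_* -> under the frozen reference model
    """
    # How much more the policy prefers chosen over rejected,
    # relative to the reference model's preference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

With a zero margin the loss is ln 2 ≈ 0.693; as the policy grows to prefer the chosen response more strongly than the reference does, the loss decreases toward zero.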

Training Details

The model was fine-tuned with a reproducible recipe: the base model was switched to mlabonne/Marcoro14-7B-slerp, and the Intel/orca_dpo_pairs dataset was filtered to produce argilla/distilabel-intel-orca-dpo-pairs. The filter removed entries with 'tie' status, kept only pairs with a chosen_score of 8 or higher, and excluded examples appearing in the gsm8k train set. Training ran on a single A100 80GB GPU in under 2 hours.
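The three filtering rules above can be sketched as a single predicate over dataset rows. The field names (`status`, `chosen_score`, `in_gsm8k_train`) and the toy records below are assumptions for illustration, not the dataset's guaranteed schema:

```python
# Toy rows mimicking preference-pair entries; field names are illustrative.
records = [
    {"status": "tie",    "chosen_score": 9,  "in_gsm8k_train": False},  # dropped: tie
    {"status": "chosen", "chosen_score": 5,  "in_gsm8k_train": False},  # dropped: low score
    {"status": "chosen", "chosen_score": 10, "in_gsm8k_train": True},   # dropped: gsm8k overlap
    {"status": "chosen", "chosen_score": 8,  "in_gsm8k_train": False},  # kept
]

def keep(row: dict) -> bool:
    return (
        row["status"] != "tie"          # remove tied preference pairs
        and row["chosen_score"] >= 8    # require a high-quality chosen answer
        and not row["in_gsm8k_train"]   # avoid gsm8k train-set contamination
    )

filtered = [r for r in records if keep(r)]
```

The same predicate could be passed to `datasets.Dataset.filter` when working with the Hugging Face dataset directly.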

Benchmark Performance

Evaluated with the "Nous" / "Teknium" benchmark suite via LLM AutoEval, the model posts competitive results:

  • AGIEval: 45.17
  • GPT4ALL: 76.59 (highest among compared models)
  • TruthfulQA: 64.68
  • Bigbench: 48.15 (highest among compared models)
  • Average: 58.65 (highest among compared models)

On the Open LLM Leaderboard, it achieved an average score of 73.40, with notable scores in HellaSwag (87.55) and Winogrande (82.00).
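As a quick sanity check, the reported Nous average is the unweighted mean of the four task scores listed above:

```python
# Nous benchmark scores reported for this model.
nous_scores = {
    "AGIEval": 45.17,
    "GPT4ALL": 76.59,
    "TruthfulQA": 64.68,
    "Bigbench": 48.15,
}

# Unweighted mean across the four tasks.
average = sum(nous_scores.values()) / len(nous_scores)  # 58.6475, reported as 58.65
```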

Key Differentiators

  • Full Epoch DPO Fine-tuning: Unlike a previous version, this model saw the entire filtered dataset during DPO training, potentially leading to more robust performance.
  • Curated Dataset: Utilizes a carefully filtered version of the Intel Orca DPO pairs dataset, focusing on high-quality preference data.
  • Strong General Performance: Achieves leading scores in several categories on the Nous benchmark compared to its base model and other distilabeled variants.
