NeuralMarcoro14-7B: DPO Fine-tuned for Enhanced Performance
NeuralMarcoro14-7B is a 7-billion-parameter language model by mlabonne, produced by DPO (Direct Preference Optimization) fine-tuning of the mlabonne/Marcoro14-7B-slerp model on the chatml_dpo_pairs preference dataset. The fine-tuning yields measurable improvements over the base model on standard benchmarks.
Key Capabilities & Performance
- Improved Benchmarks: The model shows enhanced performance across critical benchmarks, specifically the Nous benchmark suite and the Open LLM Leaderboard.
- Leaderboard Recognition: As of January 8, 2024, NeuralMarcoro14-7B was recognized as the best-performing 7B LLM on the Open LLM Leaderboard, indicating strong general-purpose reasoning and instruction-following abilities.
- Nous Benchmark Gains: While maintaining similar scores in AGIEval and GPT4ALL, it achieved significant improvements in TruthfulQA (+1.79) and Bigbench (+1.26) compared to its base model, Marcoro14-7B-slerp, resulting in an overall average increase of +0.73.
- Context Length: The model supports a context length of 8192 tokens, suitable for handling moderately long inputs and generating coherent responses.
Training Details
The DPO fine-tuning used the following configuration:
- LoRA: r=16, lora_alpha=16, lora_dropout=0.05
- Training arguments: learning_rate=5e-5, max_steps=200, optim="paged_adamw_32bit"
- DPOTrainer: beta=0.1, max_prompt_length=1024, max_length=1536
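The stated hyperparameters map onto the peft and trl libraries roughly as follows. This is a configuration sketch, not the author's verified training script: the target modules, batch size, and output directory are assumptions, and it follows the DPOTrainer keyword API that trl exposed in early 2024.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "mlabonne/Marcoro14-7B-slerp"  # base model named in the card

# LoRA adapter settings from the card (target modules left to peft defaults).
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Training arguments from the card; batch size is an assumption.
training_args = TrainingArguments(
    output_dir="NeuralMarcoro14-7B",  # assumed output path
    per_device_train_batch_size=1,    # not stated in the card
    learning_rate=5e-5,
    max_steps=200,
    optim="paged_adamw_32bit",
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,  # with a peft_config, trl uses the frozen base model as the reference
    args=training_args,
    train_dataset=load_dataset("mlabonne/chatml_dpo_pairs", split="train"),
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,
    max_prompt_length=1024,
    max_length=1536,
)
# dpo_trainer.train()  # launches fine-tuning (GPU required)
```

The beta of 0.1 is a fairly conservative setting: it scales the implicit reward (the log-probability ratio against the reference model), so a small value keeps the fine-tuned policy close to Marcoro14-7B-slerp while still learning the preference signal.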
Use Cases
This model is well-suited for applications requiring a highly capable 7B instruction-tuned model, particularly where strong performance on general knowledge, reasoning, and truthful question answering is critical. It is a strong candidate for chat applications, content generation, and other general-purpose NLP tasks.
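Because the model was preference-tuned on ChatML-formatted pairs (chatml_dpo_pairs), prompts should follow the ChatML template at inference time. A minimal sketch of that formatting in plain Python (the role and content strings are illustrative; in practice the tokenizer's built-in chat template can do this for you):

```python
def format_chatml(messages):
    """Format a list of {"role", "content"} dicts into a ChatML prompt string."""
    prompt = ""
    for msg in messages:
        prompt += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    # Open the assistant turn so generation continues from the right position.
    prompt += "<|im_start|>assistant\n"
    return prompt

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a large language model?"},
]
prompt = format_chatml(messages)
```

The resulting string is passed to the tokenizer as-is; the model generates until it emits `<|im_end|>`, which serves as the stop token for ChatML-trained models.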