mlabonne/NeuralMarcoro14-7B

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 8k · Published: Jan 6, 2024 · License: cc-by-nc-4.0 · Architecture: Transformer · Open Weights

NeuralMarcoro14-7B by mlabonne is a 7 billion parameter language model, DPO fine-tuned from Marcoro14-7B-slerp, with an 8192-token context length. It significantly improves performance on the Nous benchmark suite and the Open LLM Leaderboard, where it was the top-performing 7B LLM as of January 2024. This model is optimized for general-purpose chat and instruction-following tasks, demonstrating enhanced reasoning and factual recall.


NeuralMarcoro14-7B: DPO Fine-tuned for Enhanced Performance

NeuralMarcoro14-7B is a 7 billion parameter language model developed by mlabonne, derived from a DPO (Direct Preference Optimization) fine-tuning of the existing mlabonne/Marcoro14-7B-slerp model. This fine-tuning process utilized the chatml_dpo_pairs preference dataset, leading to notable improvements in its overall capabilities.
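DPO trains directly on preference pairs, pushing the policy to favor the chosen response over the rejected one relative to a frozen reference model, without a separate reward model. A minimal sketch of the per-pair DPO loss (pure Python; the log-probability values are hypothetical, but `beta=0.1` matches the training configuration reported below):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen margin - rejected margin)).

    Each margin is how much more (log-)likely the policy finds a response
    than the frozen reference model does.
    """
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))  # -log(sigmoid(logits))

# When the policy prefers the chosen answer more strongly than the reference
# does, the loss drops below -log(0.5) ~= 0.693 (hypothetical log-probs).
loss = dpo_loss(-12.0, -20.0, -14.0, -18.0, beta=0.1)
```

Minimizing this loss over the chatml_dpo_pairs dataset is what nudges the base model's outputs toward the preferred responses.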

Key Capabilities & Performance

  • Improved Benchmarks: The model shows enhanced performance on two key evaluations: the Nous benchmark suite and the Open LLM Leaderboard.
  • Leaderboard Recognition: As of January 8, 2024, NeuralMarcoro14-7B was recognized as the best-performing 7B LLM on the Open LLM Leaderboard, indicating strong general-purpose reasoning and instruction-following abilities.
  • Nous Benchmark Gains: While maintaining similar scores in AGIEval and GPT4ALL, it achieved significant improvements in TruthfulQA (+1.79) and Bigbench (+1.26) compared to its base model, Marcoro14-7B-slerp, resulting in an overall average increase of +0.73.
  • Context Length: The model supports a context length of 8192 tokens, suitable for handling moderately long inputs and generating coherent responses.

Training Details

The DPO fine-tuning involved specific hyperparameters for LoRA (r=16, lora_alpha=16, lora_dropout=0.05) and training arguments (learning_rate=5e-5, max_steps=200, optim="paged_adamw_32bit"). The DPOTrainer configuration included a beta of 0.1 and max_prompt_length of 1024, with max_length set to 1536.
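Collected in one place, the reported hyperparameters look as follows. The grouping into LoRA config, training arguments, and DPOTrainer kwargs mirrors the prose above; the plain-dict form is for illustration, and the exact keyword names used in the author's training script are assumptions:

```python
# Hyperparameters reported for the DPO fine-tune, gathered as plain dicts.
# Only the values come from the model card; the dict layout is illustrative.
lora_config = {
    "r": 16,                 # LoRA rank
    "lora_alpha": 16,        # LoRA scaling factor
    "lora_dropout": 0.05,
}
training_args = {
    "learning_rate": 5e-5,
    "max_steps": 200,
    "optim": "paged_adamw_32bit",
}
dpo_trainer_kwargs = {
    "beta": 0.1,             # strength of the KL constraint in the DPO loss
    "max_prompt_length": 1024,
    "max_length": 1536,      # prompt + response token budget
}
```

Note that `max_length` minus `max_prompt_length` leaves at most 512 tokens for the response during training, even though the model itself supports an 8192-token context.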

Use Cases

This model is well-suited for applications requiring a highly capable 7B instruction-tuned model, particularly where strong performance on general knowledge, reasoning, and truthful question-answering is critical. Its optimized performance makes it a strong candidate for chat applications, content generation, and various NLP tasks.
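Because the preference data (chatml_dpo_pairs) uses the ChatML format, it is reasonable to assume prompts should be rendered in ChatML at inference time. A minimal formatter sketch (the exact template the model expects is an assumption; in practice the tokenizer's built-in chat template should be preferred if one is provided):

```python
def format_chatml(messages):
    """Render a list of {role, content} messages in ChatML.

    Assumption: the model expects ChatML, as implied by its
    chatml_dpo_pairs training data.
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
             for m in messages]
    # Leave an open assistant turn for the model to complete.
    return "\n".join(parts) + "\n<|im_start|>assistant\n"

prompt = format_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain DPO in one sentence."},
])
```

The resulting string is what a chat application would pass to the model's text-completion endpoint.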

Popular Sampler Settings

Top 3 parameter combinations used by Featherless users for this model:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
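The pruning knobs among these (top_k, top_p, min_p) all shrink the candidate token set before sampling. A toy sketch of how they interact on an already-normalized distribution (illustrative only; real inference engines apply these filters to full logit vectors, usually together with temperature and the penalty terms):

```python
def filter_candidates(probs, top_k=0, top_p=1.0, min_p=0.0):
    """Return the token indices that survive top_k / top_p / min_p filtering.

    probs: list of probabilities (assumed to sum to 1), index = token id.
    top_k: keep only the k most likely tokens (0 = disabled).
    top_p: keep the smallest set whose cumulative probability reaches top_p.
    min_p: drop tokens whose probability is below min_p * max(probs).
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    threshold = min_p * probs[order[0]]
    kept, cumulative = [], 0.0
    for rank, idx in enumerate(order):
        if top_k and rank >= top_k:
            break                      # top_k budget exhausted
        if probs[idx] < threshold:
            break                      # below the min_p floor
        kept.append(idx)
        cumulative += probs[idx]
        if cumulative >= top_p:
            break                      # nucleus (top_p) mass reached
    return kept

# 0.5 + 0.25 reaches top_p=0.75, so only tokens 0 and 1 survive.
survivors = filter_candidates([0.5, 0.25, 0.15, 0.1], top_p=0.75)
```

Lower top_p or higher min_p makes output more conservative by cutting the tail of the distribution, while the penalty parameters instead rescale probabilities of already-seen tokens.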