mlabonne/ChimeraLlama-3-8B

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 8k · License: other · Architecture: Transformer

The mlabonne/ChimeraLlama-3-8B is an 8 billion parameter Llama 3-based instruction-tuned language model, created by mlabonne through a merge of several Llama 3 variants including NousResearch/Meta-Llama-3-8B-Instruct and mlabonne/OrpoLlama-3-8B. This model is specifically designed to outperform the base Llama 3 8B Instruct on Nous' benchmark suite, making it suitable for general-purpose conversational AI and reasoning tasks. It leverages a dare_ties merge method to combine the strengths of its constituent models, offering enhanced performance in areas like AGIEval, GPT4All, and TruthfulQA.


ChimeraLlama-3-8B: A Merged Llama 3 Instruction Model

ChimeraLlama-3-8B is an 8 billion parameter instruction-tuned language model developed by mlabonne. It is a composite model, created by merging four distinct Llama 3-based models using the dare_ties method via LazyMergekit. This approach combines the strengths of:

  • NousResearch/Meta-Llama-3-8B-Instruct
  • mlabonne/OrpoLlama-3-8B
  • Locutusque/Llama-3-Orca-1.0-8B
  • abacusai/Llama-3-Smaug-8B
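To make the dare_ties method above concrete, here is a toy sketch of the idea on plain NumPy arrays: DARE randomly drops a fraction of each model's "task vector" (its delta from the base) and rescales the survivors, and TIES then keeps only the entries that agree with the per-parameter majority sign before summing back onto the base. This is an illustration of the technique, not the mergekit implementation, and the density/weight values are placeholders rather than the ones used for this model.

```python
import numpy as np

def dare_ties_merge(base, finetuned, densities, weights, seed=0):
    """Toy dare_ties-style merge of fine-tuned tensors onto a base tensor.

    DARE: drop ~(1 - density) of each task vector's entries at random and
    rescale survivors by 1/density. TIES: elect a majority sign per
    parameter and zero out entries that disagree before the weighted sum.
    """
    rng = np.random.default_rng(seed)
    deltas = []
    for ft, density, weight in zip(finetuned, densities, weights):
        tv = ft - base                          # task vector: what fine-tuning changed
        keep = rng.random(tv.shape) < density   # DARE dropout mask
        tv = np.where(keep, tv / density, 0.0)  # rescale kept entries by 1/density
        deltas.append(weight * tv)
    stacked = np.stack(deltas)
    elected = np.sign(stacked.sum(axis=0))      # TIES: per-parameter majority sign
    agreeing = np.where(np.sign(stacked) == elected, stacked, 0.0)
    return base + agreeing.sum(axis=0)
```

With a single model, density 1.0, and weight 1.0, the merge reduces to the fine-tuned tensor itself; with two models whose deltas cancel, the sign election zeroes the conflicting entries.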

Key Capabilities & Performance

The primary objective of ChimeraLlama-3-8B is to enhance performance beyond the base Llama 3 8B Instruct model. Evaluations conducted using Nous' benchmark suite demonstrate its effectiveness:

  • Outperforms Llama 3 8B Instruct: Achieves an average score of 51.58 on the Nous benchmark, surpassing Meta-Llama-3-8B-Instruct's 51.34.
  • Strong in Reasoning and Knowledge: Shows competitive scores across various sub-benchmarks, including AGIEval (39.12), GPT4All (71.81), TruthfulQA (52.4), and Bigbench (42.98).

Good for

  • General-purpose conversational AI: Its instruction-tuned nature makes it suitable for a wide range of dialogue-based applications.
  • Reasoning and knowledge-intensive tasks: The model's performance on benchmarks like AGIEval and TruthfulQA suggests proficiency in these areas.
  • Developers seeking an optimized Llama 3 variant: Offers improved performance over the base Llama 3 8B Instruct without increasing parameter count, making it a compelling alternative for similar use cases.
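For dialogue use, the model expects the standard Llama 3 Instruct chat layout inherited from its base models. The sketch below builds that prompt string by hand to show the layout; in practice you would normally let `tokenizer.apply_chat_template` from Hugging Face transformers produce it, and the system/user strings here are placeholders.

```python
def llama3_chat_prompt(system: str, user: str) -> str:
    """Assemble a Llama 3 Instruct-style prompt string by hand.

    Shows the raw special-token layout; with transformers, prefer
    tokenizer.apply_chat_template, which emits this for you.
    """
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n" + system + "<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n" + user + "<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = llama3_chat_prompt(
    "You are a helpful assistant.",
    "Explain model merging in one sentence.",
)
```

The trailing assistant header leaves the prompt open for the model to generate its reply, which it terminates with `<|eot_id|>`.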

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model adjust the following sampler settings:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p
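These sampler settings are typically passed in the body of a completion request to an OpenAI-compatible API. The sketch below only assembles such a request body; the default values are illustrative assumptions, not the actual top Featherless configurations, and sending the request (endpoint URL, API key) is left out.

```python
def build_sampler_payload(prompt: str,
                          temperature: float = 0.8,
                          top_p: float = 0.95,
                          top_k: int = 40,
                          frequency_penalty: float = 0.0,
                          presence_penalty: float = 0.0,
                          repetition_penalty: float = 1.05,
                          min_p: float = 0.05,
                          max_tokens: int = 256) -> dict:
    """Build a completion-request body carrying the sampler settings above.

    Defaults are placeholders for illustration, not recommended values.
    """
    return {
        "model": "mlabonne/ChimeraLlama-3-8B",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "frequency_penalty": frequency_penalty,
        "presence_penalty": presence_penalty,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    }

payload = build_sampler_payload("Explain model merging in one sentence.")
```

Lower `temperature` and higher `min_p` generally make output more deterministic, while the penalty parameters discourage repetition.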