FuseAI/FuseChat-Llama-3.1-8B-Instruct

Hugging Face · Text Generation

  • Model Size: 8B
  • Quantization: FP8
  • Context Length: 32k
  • Published: Nov 20, 2024
  • License: apache-2.0
  • Architecture: Transformer
  • Concurrency Cost: 1

FuseAI's FuseChat-Llama-3.1-8B-Instruct is an 8-billion-parameter instruction-tuned causal language model with a 32,768-token context length. Developed through implicit model fusion, it integrates capabilities from larger source LLMs such as Llama-3.1-70B-Instruct into a more compact target model. It targets general conversation, instruction following, mathematics, and coding, with notable improvements on benchmarks like AlpacaEval-2 and Arena-Hard.


FuseChat-Llama-3.1-8B-Instruct: Implicit Model Fusion

FuseChat-Llama-3.1-8B-Instruct is an 8-billion-parameter model developed by FuseAI using an implicit model fusion (IMF) technique. This approach enhances the performance of a smaller target LLM by transferring capabilities from multiple strong source LLMs: Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct.
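Because the target model is based on Llama-3.1-8B-Instruct, it inherits the standard Llama 3.1 chat format. As a minimal sketch, the prompt wire format can be assembled by hand like this (in practice you would load the model's tokenizer from the Hub and call `tokenizer.apply_chat_template`, which handles this for you):

```python
def build_llama31_prompt(messages):
    """Assemble a Llama 3.1-style chat prompt from a list of
    {'role': ..., 'content': ...} message dicts.

    This just makes the underlying special-token format explicit;
    transformers' apply_chat_template is the supported way to do this.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        # Each turn is delimited by header markers and an end-of-turn token.
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Open an assistant header so generation continues as the assistant turn.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


example = build_llama31_prompt(
    [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is implicit model fusion?"},
    ]
)
```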

Key Capabilities & Training

The model's development involved a two-stage training pipeline:

  • Supervised Fine-Tuning (SFT): Utilized best responses from source models to mitigate distribution discrepancies and enhance base capabilities.
  • Direct Preference Optimization (DPO): Optimized on preference pairs (best and worst responses) drawn from the source models, applying length-normalized DPO for the Llama-3.1 series models.
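The length-normalized DPO used in the second stage can be sketched in one common formulation: each sequence's policy-vs-reference log-probability margin is divided by its response length before the usual DPO logistic loss is applied, so longer responses are not favored simply for accumulating more log-probability mass. The exact variant FuseAI used may differ in details; this is an illustrative sketch:

```python
import math

def ln_dpo_loss(logp_w, ref_logp_w, len_w,
                logp_l, ref_logp_l, len_l, beta=0.1):
    """Length-normalized DPO loss for one preference pair.

    logp_* / ref_logp_* are total sequence log-probabilities under the
    policy and the frozen reference model; len_* are response lengths
    in tokens. Illustrative formulation, not FuseAI's exact recipe.
    """
    # Per-token (length-normalized) implicit reward margins.
    margin_w = (logp_w - ref_logp_w) / len_w
    margin_l = (logp_l - ref_logp_l) / len_l
    logits = beta * (margin_w - margin_l)
    # -log(sigmoid(logits)), computed in a numerically stable form.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the chosen response's normalized margin exceeds the rejected one's, the loss falls below log 2 (the value at zero margin), pushing the policy toward the preferred response on a per-token basis.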

The training dataset was constructed from diverse open-source community datasets covering instruction following, general conversation, mathematics, coding, and Chinese-language capabilities. Responses were sampled from the four source LLMs and scored with an external reward model (ArmoRM) to create preference pairs.
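The pair-construction step described above can be sketched as follows. Here `reward_fn` is a stand-in for scoring with ArmoRM (the actual reward model named in the card); the function and field names are illustrative, not FuseAI's code:

```python
def build_preference_pair(prompt, responses, reward_fn):
    """Turn sampled candidate responses into a DPO preference pair.

    responses: candidate completions sampled from the source LLMs.
    reward_fn: scores a response (stand-in for an ArmoRM-style reward model).
    Returns the best response as 'chosen' and the worst as 'rejected'.
    """
    scored = sorted(responses, key=reward_fn, reverse=True)
    return {"prompt": prompt, "chosen": scored[0], "rejected": scored[-1]}
```

In the SFT stage only the top-scored response per prompt would be kept; the DPO stage uses both ends of the ranking.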

Performance Highlights

FuseChat-Llama-3.1-8B-Instruct demonstrated substantial gains across 14 benchmarks, with an average improvement of 6.8 points. The largest jumps came on instruction-following tasks: 37.1 points on AlpacaEval-2 and 30.1 points on Arena-Hard. It also outperformed AllenAI's Llama-3.1-Tulu-3-8B on most benchmarks, underscoring the effectiveness of the IMF approach in producing a highly capable compact model.