FuseAI/FuseChat-Llama-3.2-1B-Instruct

Public · 1B parameters · BF16 · 32,768-token context · Released Nov 26, 2024 · Hosted on Hugging Face

Overview

FuseChat-Llama-3.2-1B-Instruct: Implicit Model Fusion

FuseChat-Llama-3.2-1B-Instruct is a 1-billion-parameter model from the FuseChat-3.0 series, developed by FuseAI. The series introduces Implicit Model Fusion (IMF), an approach that enhances a smaller target LLM by transferring capabilities from powerful open-source LLMs such as Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct.

Key Capabilities & Training

The IMF process involves a two-stage training pipeline:

  • Supervised Fine-Tuning (SFT): Mitigates the distribution discrepancy between the target and source LLMs by fine-tuning on high-quality responses sampled from the source models.
  • Direct Preference Optimization (DPO): Learns preferences across multiple source LLMs by optimizing on pairs of best and worst responses to the same prompt.
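The second stage uses the standard DPO objective: the policy is trained so that, relative to a frozen reference model, it assigns higher probability to the preferred response than to the rejected one. A minimal sketch of that per-pair loss (illustrative only; the actual FuseChat training may use a variant with additional terms such as length normalization):

```python
import math

def dpo_loss(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one (chosen, rejected) response pair.

    Inputs are summed log-probabilities of the chosen (w) and rejected (l)
    responses under the trained policy and a frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * ((policy_logp_w - ref_logp_w) - (policy_logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: near zero when the policy clearly
    # prefers the chosen response, large when it prefers the rejected one.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When the policy matches the reference model exactly, the margin is zero and the loss is log 2; it shrinks as the policy learns to favor the chosen responses.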

This model was trained on a diverse dataset of 158,667 entries, covering instruction following, general conversation, mathematics, coding, and Chinese language tasks. The dataset construction involved sampling responses from multiple powerful source models and annotating them with an external reward model (ArmoRM) to create SFT and DPO pairs.

Performance Highlights

While the largest gains in the FuseChat-3.0 series came from the Llama-3.1-8B-Instruct target model (an average improvement of 6.8 points across 14 benchmarks), FuseChat-Llama-3.2-1B-Instruct itself posted notable gains over its base model:

  • AlpacaEval-2: Improved from 9.7% to 25.3%.
  • Arena-Hard: Improved from 5.1% to 8.6%.
  • HumanEval: Improved from 39.6% to 40.2%.

When to Use This Model

This model is particularly well-suited for applications requiring a compact yet capable LLM that excels in:

  • General Instruction Following
  • Conversational AI
  • Mathematical Problem Solving
  • Code Generation and Understanding

Its implicit fusion methodology allows it to leverage the strengths of much larger models, making it a strong candidate for resource-constrained environments where performance in these domains is critical.
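For such use cases, the model can be loaded like any other instruction-tuned checkpoint on the Hub. A minimal sketch using the standard `transformers` chat pipeline (generation settings are illustrative, not from the model card; the first run downloads the weights):

```python
# Requires: pip install transformers torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="FuseAI/FuseChat-Llama-3.2-1B-Instruct",
    torch_dtype="bfloat16",  # matches the published BF16 weights
)

messages = [
    {"role": "user", "content": "Solve 12 * 17 step by step."},
]
out = chat(messages, max_new_tokens=256)
# The pipeline returns the full chat history; the last turn is the reply.
print(out[0]["generated_text"][-1]["content"])
```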