FuseAI/FuseChat-Qwen-2.5-7B-Instruct

Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Nov 12, 2024 · Architecture: Transformer

FuseAI/FuseChat-Qwen-2.5-7B-Instruct is a 7.6 billion parameter instruction-tuned language model developed by FuseAI, based on the Qwen 2.5 architecture. It is part of the FuseChat-3.0 series, which employs an implicit model fusion (IMF) technique to integrate the strengths of larger source LLMs into more compact target models. This model excels in general conversation, instruction following, mathematics, and coding tasks, leveraging a two-stage training pipeline of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO).


FuseChat-Qwen-2.5-7B-Instruct: Implicit Model Fusion

FuseChat-Qwen-2.5-7B-Instruct is a 7.6 billion parameter model from the FuseChat-3.0 series, developed by FuseAI. This model utilizes an innovative implicit model fusion (IMF) approach to transfer capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) into a smaller Qwen-2.5-7B-Instruct target model.

Key Capabilities & Training

The IMF process involves a two-stage training pipeline:

  • Supervised Fine-Tuning (SFT): Mitigates distribution discrepancies by fine-tuning on high-quality responses from source models.
  • Direct Preference Optimization (DPO): Learns preferences from multiple source LLMs using best and worst response pairs, further enhancing performance.
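The DPO stage above optimizes the target model to prefer the higher-scored response in each pair. As a minimal sketch, the per-example DPO loss can be computed from summed token log-probabilities under the policy and a frozen reference model (the function name and the `beta` default of 0.1 are illustrative, not taken from the FuseChat training configuration):

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-example Direct Preference Optimization loss.

    Inputs are summed log-probabilities of the chosen (best) and
    rejected (worst) responses under the policy being trained and
    under the frozen reference model.
    """
    # Implicit reward of each response: log-ratio of policy to reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp

    # DPO pushes the chosen log-ratio above the rejected one.
    logits = beta * (chosen_logratio - rejected_logratio)

    # -log(sigmoid(logits)), written stably as log1p(exp(-logits)).
    return math.log1p(math.exp(-logits))
```

When the policy already favors the chosen response (positive `logits`), the loss falls below ln 2; when the two responses are scored identically, it sits exactly at ln 2.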

The model was trained on a diverse dataset of 158,667 entries, covering instruction following, general conversation, mathematics, coding, and Chinese language tasks. This includes data from UltraFeedback, OpenMathInstruct-2, and LeetCode, with responses sampled from the larger source models and annotated using an external reward model like ArmoRM.
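The data construction described above, sampling candidate responses from the source models and ranking them with a reward model, can be sketched as follows. This is an illustrative reduction, not FuseChat's actual pipeline code; the scores stand in for ArmoRM reward outputs, and the field names are hypothetical:

```python
def build_training_example(candidates: list[tuple[str, float]]) -> dict:
    """Turn reward-scored candidate responses into training targets.

    `candidates` is a list of (response_text, reward_score) pairs, one
    per source-model sample. The top-scored response serves as the SFT
    target; the top and bottom responses form the DPO preference pair.
    """
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    best, worst = ranked[0], ranked[-1]
    return {
        "sft_target": best[0],   # used in the SFT stage
        "chosen": best[0],       # preferred response for DPO
        "rejected": worst[0],    # dispreferred response for DPO
    }
```

For example, three responses scored 0.9, 0.5, and 0.2 would yield the 0.9-scored text as both the SFT target and the DPO "chosen" response, with the 0.2-scored text as "rejected".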

Performance Highlights

FuseChat-Qwen-2.5-7B-Instruct demonstrates significant improvements across various benchmarks, particularly in instruction following. For instance, it achieved 63.6% on AlpacaEval-2 and 61.4% on Arena-Hard, substantial gains over the base Qwen-2.5-7B-Instruct model. It also performs strongly on MT-Bench and AMC 23, remains competitive in mathematics and coding, and improves the average score across the full suite of 14 benchmarks.