FuseChat-Qwen-2.5-7B-Instruct: Implicit Model Fusion
FuseChat-Qwen-2.5-7B-Instruct is a 7.6-billion-parameter model from the FuseChat-3.0 series, developed by FuseAI. It applies an implicit model fusion (IMF) approach to transfer capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) into the smaller Qwen-2.5-7B-Instruct target model.
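For quick experimentation, the model can be loaded with Hugging Face `transformers`. The snippet below is a minimal sketch assuming the repository id `FuseAI/FuseChat-Qwen-2.5-7B-Instruct` and the standard Qwen-2.5 chat template; check the official model card for the exact recommended generation settings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed Hugging Face repository id; verify against the model card.
model_id = "FuseAI/FuseChat-Qwen-2.5-7B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Explain model fusion in one sentence."}]
# Qwen-2.5 models ship a chat template; apply it and append the generation prompt.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```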
Key Capabilities & Training
The IMF process involves a two-stage training pipeline:
- Supervised Fine-Tuning (SFT): Mitigates distribution discrepancies between the source and target models by fine-tuning on high-quality responses from the source models.
- Direct Preference Optimization (DPO): Learns preferences from multiple source LLMs by training on best/worst response pairs, further enhancing performance (a minimal sketch of the DPO objective follows this list).
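As a rough illustration of stage two, here is a minimal sketch of the standard DPO loss (Rafailov et al.) applied to such best/worst pairs. The tensor names and the `beta` value are illustrative assumptions, not FuseChat's actual hyperparameters.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss over summed log-probs of best/worst responses.

    Each argument is a (batch,) tensor holding the log-probability of a
    whole response under the policy or the frozen reference model.
    beta=0.1 is an illustrative value, not FuseChat's setting.
    """
    # Log-ratio of policy to reference for the preferred ("best") responses
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    # Log-ratio for the dispreferred ("worst") responses
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between best and worst via a logistic loss
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```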
The model was trained on a diverse dataset of 158,667 entries covering instruction following, general conversation, mathematics, coding, and Chinese-language tasks. This includes data from UltraFeedback, OpenMathInstruct-2, and LeetCode, with responses sampled from the larger source models and scored by an external reward model such as ArmoRM.
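The best/worst pairs for DPO can be built by scoring every sampled response with the reward model and keeping the extremes. The snippet below is a hypothetical sketch of that selection step; `score_fn` stands in for an ArmoRM-style scorer and is not the actual FuseChat pipeline code.

```python
from typing import Callable

def build_preference_pair(prompt: str, responses: list[str],
                          score_fn: Callable[[str, str], float]) -> dict:
    """Pick the best- and worst-scoring responses for one prompt.

    score_fn(prompt, response) -> float stands in for an external reward
    model such as ArmoRM; this selection logic is illustrative only.
    """
    # Sort the sampled responses by reward score, ascending
    ranked = sorted(responses, key=lambda r: score_fn(prompt, r))
    worst, best = ranked[0], ranked[-1]
    # Return the pair in the (prompt, chosen, rejected) layout DPO expects
    return {"prompt": prompt, "chosen": best, "rejected": worst}
```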
Performance Highlights
FuseChat-Qwen-2.5-7B-Instruct demonstrates significant improvements across various benchmarks, particularly in instruction following: it achieved 63.6% on AlpacaEval-2 and 61.4% on Arena-Hard, substantial gains over the base Qwen-2.5-7B-Instruct model. It also performs strongly on MT-Bench and AMC 23 while maintaining competitive scores in mathematics and coding, yielding an overall average improvement across 14 benchmarks.