FuseAI/FuseChat-Qwen-2.5-7B-SFT

Text generation · Concurrency cost: 1 · Model size: 7.6B · Quantization: FP8 · Context length: 32k · Published: Nov 11, 2024 · Architecture: Transformer

FuseAI/FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter language model based on Qwen-2.5, developed by FuseAI and enhanced through implicit model fusion. It integrates capabilities from larger source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) via a two-stage pipeline of Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO); as the name suggests, this checkpoint corresponds to the SFT stage. It is designed to improve performance in general conversation, instruction following, mathematics, and coding, and supports a 131,072-token context length.


FuseChat-Qwen-2.5-7B-SFT: Implicit Model Fusion

FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter model from the FuseChat-3.0 series, developed by FuseAI. It uses an "implicit model fusion" (IMF) approach to transfer capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) into a smaller target model, in this case Qwen-2.5-7B-Instruct.

Key Capabilities & Training:

  • Implicit Model Fusion (IMF): Unlike previous explicit fusion methods, IMF enhances a single LLM by implicitly learning from robust open-source LLMs through preference optimization.
  • Two-Stage Training: The model first undergoes a Supervised Fine-Tuning (SFT) stage to reduce distribution discrepancies between the target and source models, followed by a Direct Preference Optimization (DPO) stage to learn preferences from multiple source LLMs.
  • Comprehensive Dataset: Training data includes a diverse mix of instruction following, general conversation, mathematics, coding, and Chinese language tasks, sourced from datasets like UltraFeedback, OpenMathInstruct-2, and LeetCode.
  • Preference Optimization: DPO leverages best and worst response pairs generated by source models, annotated using an external reward model (ArmoRM), to optimize the target model's performance.
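The preference-optimization step described above can be sketched in a few lines. The snippet below is a minimal illustration, not the FuseChat training code: `select_preference_pair` mimics the best/worst pair selection by reward-model score (ArmoRM in FuseChat-3.0), and `dpo_loss` is the standard DPO objective applied to one such pair; the function names and scalar inputs are illustrative assumptions.

```python
import math

def select_preference_pair(responses, reward_scores):
    """Pick the best and worst responses by external reward-model score,
    mimicking the ArmoRM annotation step (illustrative sketch)."""
    best = max(range(len(responses)), key=lambda i: reward_scores[i])
    worst = min(range(len(responses)), key=lambda i: reward_scores[i])
    return responses[best], responses[worst]

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are summed log-probabilities of the chosen (best) and
    rejected (worst) responses under the policy being trained and
    under the frozen SFT reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log sigmoid(margin): shrinks as the policy prefers the chosen response
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

chosen, rejected = select_preference_pair(
    ["resp_a", "resp_b", "resp_c"], [0.42, 0.91, 0.17])
loss = dpo_loss(policy_chosen_logp=-10.0, policy_rejected_logp=-12.0,
                ref_chosen_logp=-11.0, ref_rejected_logp=-11.0)
```

When the policy and reference agree (zero margin), the loss equals log 2; it decreases as the policy assigns relatively more probability to the reward-preferred response.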

Performance Highlights:

While the FuseChat-3.0 series reports its largest gains on the Llama-3.1-8B-Instruct variant, FuseChat-Qwen-2.5-7B-SFT also demonstrates improvements. It achieved 63.6% on AlpacaEval-2 and 61.4% on Arena-Hard, indicating strong instruction-following capabilities, and shows competitive performance in mathematics and coding, with an average score of 52.9% across 14 benchmarks.

Use Cases:

This model is well-suited for applications requiring strong performance in:

  • General conversational AI
  • Complex instruction following
  • Mathematical problem-solving
  • Code generation and understanding
  • Multilingual tasks, particularly Chinese language processing
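For conversational use, Qwen-2.5-based models consume prompts in the ChatML format. In practice the tokenizer's `apply_chat_template` handles this; the sketch below just makes the assumed wire format explicit, with a hypothetical helper name.

```python
def build_chatml_prompt(messages):
    """Render a conversation in the ChatML format used by the Qwen-2.5
    family (normally produced by tokenizer.apply_chat_template)."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Trailing assistant header asks the model to generate the reply.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Solve 12 * 13 step by step."},
])
```

The resulting string is what the model actually sees: each turn wrapped in `<|im_start|>`/`<|im_end|>` markers, ending with an open assistant turn.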