FuseAI/FuseChat-Qwen-2.5-7B-SFT
FuseAI/FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter Qwen-2.5-based language model developed by FuseAI and enhanced through implicit model fusion. It integrates capabilities from larger LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, Llama-3.1-70B-Instruct) via a two-stage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) pipeline, and is designed to improve performance in general conversation, instruction following, mathematics, and coding, with a 131,072-token context length.
FuseChat-Qwen-2.5-7B-SFT: Implicit Model Fusion
FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter model from the FuseChat-3.0 series, developed by FuseAI. It uses an "implicit model fusion" (IMF) approach to transfer capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) into a smaller target model, in this case Qwen-2.5-7B-Instruct.
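As a sketch of how such a checkpoint would typically be loaded, assuming the standard Hugging Face transformers chat workflow (the model id is taken from this card; the prompt, system message, and generation settings are illustrative assumptions, not official recommendations):

```python
# Hypothetical inference sketch for FuseChat-Qwen-2.5-7B-SFT using the
# standard Hugging Face transformers chat API.

MODEL_ID = "FuseAI/FuseChat-Qwen-2.5-7B-SFT"


def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list:
    """Build an OpenAI-style message list for tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 512) -> str:
    """Download the weights (several GB) and run a single chat turn."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:],
                            skip_special_tokens=True)


# generate("What is 17 * 24?")  # uncomment to run; a GPU is needed in practice
```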
Key Capabilities & Training:
- Implicit Model Fusion (IMF): Unlike previous explicit fusion methods, IMF enhances a single LLM by implicitly learning from robust open-source LLMs through preference optimization.
- Two-Stage Training: The model undergoes a Supervised Fine-Tuning (SFT) stage to reduce distribution discrepancies, followed by a Direct Preference Optimization (DPO) stage to learn preferences from multiple source LLMs.
- Comprehensive Dataset: Training data includes a diverse mix of instruction following, general conversation, mathematics, coding, and Chinese language tasks, sourced from datasets like UltraFeedback, OpenMathInstruct-2, and LeetCode.
- Preference Optimization: The DPO stage uses pairs of best and worst responses generated by the source models, annotated with an external reward model (ArmoRM), to optimize the target model's performance.
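The DPO stage above can be sketched in miniature: score several candidate responses with a reward model (ArmoRM in FuseChat-3.0), take the highest- and lowest-scoring ones as the preference pair, and penalize the target model when its log-probability margin over a reference model does not favor the chosen response. The function names and the β value below are illustrative assumptions, not the authors' implementation.

```python
import math


def select_preference_pair(scored_responses):
    """Pick the highest- and lowest-reward responses as (chosen, rejected).

    `scored_responses` is a list of (response_text, reward) tuples, where
    rewards would come from an external reward model such as ArmoRM.
    """
    best = max(scored_responses, key=lambda r: r[1])
    worst = min(scored_responses, key=lambda r: r[1])
    return best[0], worst[0]


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit-reward margin)."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference (zero margin) the loss is log 2 ≈ 0.693; as the policy shifts probability toward the chosen response relative to the reference, the loss falls toward zero.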
Performance Highlights:
While the Llama-3.1-8B-Instruct variant of FuseChat-3.0 showed the largest gains, FuseChat-Qwen-2.5-7B-SFT also improves on its base model: it achieved 63.6% on AlpacaEval-2 and 61.4% on Arena-Hard, indicating strong instruction following, and remains competitive in mathematics and coding, with an average score of 52.9% across 14 benchmarks.
Use Cases:
This model is well-suited for applications requiring strong performance in:
- General conversational AI
- Complex instruction following
- Mathematical problem-solving
- Code generation and understanding
- Multilingual tasks, particularly Chinese language processing