FuseAI/FuseChat-Qwen-2.5-7B-SFT
FuseAI/FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter Qwen-2.5-based language model developed by FuseAI and enhanced through implicit model fusion. It integrates capabilities from larger LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, Llama-3.1-70B-Instruct) via a two-stage Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) pipeline, and is designed to improve performance in general conversation, instruction following, mathematics, and coding, with a 131,072-token context length.
FuseChat-Qwen-2.5-7B-SFT: Implicit Model Fusion
FuseChat-Qwen-2.5-7B-SFT is a 7.6-billion-parameter model from the FuseChat-3.0 series, developed by FuseAI. It uses an "implicit model fusion" (IMF) approach to transfer capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct) into a smaller target model, in this case Qwen-2.5-7B-Instruct.
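As a sketch of how such a checkpoint would typically be loaded, assuming the standard Hugging Face transformers chat workflow (the model id is taken from this card; the prompt, system message, and generation settings are illustrative assumptions, not official recommendations):

```python
# Hypothetical inference sketch for FuseChat-Qwen-2.5-7B-SFT using the
# standard Hugging Face transformers chat API.

MODEL_ID = "FuseAI/FuseChat-Qwen-2.5-7B-SFT"


def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list:
    """Build an OpenAI-style message list for tokenizer.apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate(user_prompt: str, max_new_tokens: int = 512) -> str:
    """Download the weights (several GB) and run a single chat turn."""
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(outputs[0][inputs.shape[-1]:],
                            skip_special_tokens=True)


# generate("What is 17 * 24?")  # uncomment to run; a GPU is needed in practice
```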
Key Capabilities & Training:
- Implicit Model Fusion (IMF): Unlike previous explicit fusion methods, IMF enhances a single LLM by implicitly learning from robust open-source LLMs through preference optimization.
- Two-Stage Training: The model undergoes a Supervised Fine-Tuning (SFT) stage to reduce distribution discrepancies, followed by a Direct Preference Optimization (DPO) stage to learn preferences from multiple source LLMs.
- Comprehensive Dataset: Training data includes a diverse mix of instruction following, general conversation, mathematics, coding, and Chinese language tasks, sourced from datasets like UltraFeedback, OpenMathInstruct-2, and LeetCode.
- Preference Optimization: The DPO stage uses pairs of best and worst responses generated by the source models, annotated with an external reward model (ArmoRM), to optimize the target model's performance.
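The DPO stage above can be sketched in miniature: score several candidate responses with a reward model (ArmoRM in FuseChat-3.0), take the highest- and lowest-scoring ones as the preference pair, and penalize the target model when its log-probability margin over a reference model does not favor the chosen response. The function names and the β value below are illustrative assumptions, not the authors' implementation.

```python
import math


def select_preference_pair(scored_responses):
    """Pick the highest- and lowest-reward responses as (chosen, rejected).

    `scored_responses` is a list of (response_text, reward) tuples, where
    rewards would come from an external reward model such as ArmoRM.
    """
    best = max(scored_responses, key=lambda r: r[1])
    worst = min(scored_responses, key=lambda r: r[1])
    return best[0], worst[0]


def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * implicit-reward margin)."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

When the policy matches the reference (zero margin) the loss is log 2 ≈ 0.693; as the policy shifts probability toward the chosen response relative to the reference, the loss falls toward zero.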
Performance Highlights:
While the Llama-3.1-8B-Instruct variant of FuseChat-3.0 showed the largest gains, FuseChat-Qwen-2.5-7B-SFT also improves on its base model: it achieved 63.6% on AlpacaEval-2 and 61.4% on Arena-Hard, indicating strong instruction following, and remains competitive in mathematics and coding, with an average score of 52.9% across 14 benchmarks.
Use Cases:
This model is well-suited for applications requiring strong performance in:
- General conversational AI
- Complex instruction following
- Mathematical problem-solving
- Code generation and understanding
- Multilingual tasks, particularly Chinese language processing