FuseAI/FuseChat-Llama-3.1-8B-SFT
FuseAI/FuseChat-Llama-3.1-8B-SFT is an 8-billion-parameter instruction-tuned language model developed by FuseAI and built from Llama-3.1-8B-Instruct. It uses an implicit model fusion (IMF) approach, a pipeline of Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO), to integrate capabilities from larger source LLMs such as Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. The model targets general conversation, instruction following, mathematics, and coding, with a reported average improvement of 6.8 points across 14 benchmarks and notable gains on AlpacaEval-2 and Arena-Hard.
FuseChat-Llama-3.1-8B-SFT: Implicit Model Fusion for Enhanced Performance
FuseChat-Llama-3.1-8B-SFT is an 8-billion-parameter instruction-tuned model developed by FuseAI, designed to enhance the capabilities of smaller LLMs by implicitly learning from multiple strong open-source LLMs. It is trained with a novel implicit model fusion (IMF) method, a two-stage pipeline consisting of Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO); this checkpoint corresponds to the SFT stage of that pipeline.
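As an illustration of what such a two-stage pipeline can look like in practice, the sketch below uses the trl library's SFTTrainer and DPOTrainer. This is not FuseAI's released training code: the dataset files, hyperparameters, and output paths are placeholders, and it assumes a recent trl version that accepts model names as strings and preference data in prompt/chosen/rejected format.

```python
# Illustrative two-stage pipeline (SFT followed by DPO) using the trl library.
# NOT FuseAI's training code: dataset files, hyperparameters, and paths are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base_model = "meta-llama/Llama-3.1-8B-Instruct"

# Stage 1: supervised fine-tuning on responses gathered from the larger source LLMs.
sft_data = load_dataset("json", data_files="fused_sft_data.jsonl", split="train")  # hypothetical file
SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="fusechat-llama-3.1-8b-sft", num_train_epochs=1),
).train()

# Stage 2: direct preference optimization on preference pairs built from the source LLMs' outputs.
dpo_data = load_dataset("json", data_files="fused_dpo_pairs.jsonl", split="train")  # hypothetical file
DPOTrainer(
    model="fusechat-llama-3.1-8b-sft",
    args=DPOConfig(output_dir="fusechat-llama-3.1-8b-dpo", beta=0.1),
    train_dataset=dpo_data,
).train()
```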
Key Capabilities
- Enhanced Instruction Following: Improves by 37.1 points on AlpacaEval-2 and 30.1 points on Arena-Hard over the base Llama-3.1-8B-Instruct.
- Broad Task Proficiency: Demonstrates substantial gains across general conversation, mathematics, and coding tasks.
- Knowledge Integration: Effectively transfers capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, Llama-3.1-70B-Instruct) into a more compact 8B parameter model.
- Competitive Performance: Outperforms AllenAI's Llama-3.1-Tulu-3-8B on most benchmarks, and improves on the base Llama-3.1-8B-Instruct by an average of 6.8 points across 14 diverse benchmarks.
Good for
- General-purpose conversational AI: Excels in instruction following and general conversation scenarios.
- Mathematical problem-solving: Shows strong performance in mathematics benchmarks.
- Code generation and understanding: Improved capabilities in coding tasks.
- Resource-efficient deployment: Offers enhanced performance in a more compact 8B parameter size, making it suitable for applications where larger models are impractical.
- Developers seeking robust, instruction-tuned models: Provides a strong foundation for various AI applications requiring high-quality responses and adherence to instructions.
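For reference, a minimal sketch of running the model with Hugging Face transformers is shown below. The dtype, device placement, and sampling parameters are illustrative assumptions, not official recommendations.

```python
# Minimal sketch: chat-style inference with FuseChat-Llama-3.1-8B-SFT via transformers.
# Dtype, device placement, and sampling settings below are illustrative choices.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FuseAI/FuseChat-Llama-3.1-8B-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Write a Python function that checks whether a number is prime."},
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=True, temperature=0.7)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```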