FuseAI/FuseChat-Llama-3.1-8B-SFT

Text generation · Concurrency cost: 1 · Model size: 8B · Quant: FP8 · Context length: 32k · Published: Nov 20, 2024 · Architecture: Transformer

FuseAI/FuseChat-Llama-3.1-8B-SFT is an 8 billion parameter instruction-tuned language model developed by FuseAI, built upon Llama-3.1-8B-Instruct. It leverages an implicit model fusion (IMF) approach, which combines Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to integrate capabilities from larger source LLMs such as Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, and Llama-3.1-70B-Instruct. The model performs well in general conversation, instruction following, mathematics, and coding, with an average improvement of 6.8 points across 14 benchmarks and significant gains on AlpacaEval-2 and Arena-Hard.

FuseChat-Llama-3.1-8B-SFT: Implicit Model Fusion for Enhanced Performance

FuseChat-Llama-3.1-8B-SFT is an 8 billion parameter instruction-tuned model developed by FuseAI, designed to enhance the capabilities of smaller LLMs by implicitly learning from multiple strong open-source LLMs. It is trained with a novel implicit model fusion (IMF) method, a two-stage pipeline consisting of Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO); as the -SFT suffix indicates, this checkpoint is the output of the first (SFT) stage.
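As a Llama-3.1-8B-Instruct derivative, the model is assumed to inherit the standard Llama 3.1 chat template. A minimal sketch of that prompt layout, for illustration only (in practice, prefer `tokenizer.apply_chat_template`, which applies the template shipped with the model):

```python
# Sketch of the Llama 3.1 chat prompt layout this model presumably inherits
# from its Llama-3.1-8B-Instruct base. Illustrative only; the authoritative
# template is the one bundled with the model's tokenizer.

def build_llama31_prompt(
    user_msg: str,
    system_msg: str = "You are a helpful assistant.",
) -> str:
    """Format one system + user turn and open the assistant turn."""
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_msg}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
```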

Key Capabilities

  • Enhanced Instruction Following: Improvements of 37.1 points on AlpacaEval-2 and 30.1 points on Arena-Hard over its base model, Llama-3.1-8B-Instruct.
  • Broad Task Proficiency: Demonstrates substantial gains across general conversation, mathematics, and coding tasks.
  • Knowledge Integration: Effectively transfers capabilities from powerful source LLMs (Gemma-2-27B-It, Mistral-Large-Instruct-2407, Qwen-2.5-72B-Instruct, Llama-3.1-70B-Instruct) into a more compact 8B parameter model.
  • Competitive Performance: Outperforms AllenAI's Llama-3.1-Tulu-3-8B on most tasks, with an average improvement of 6.8 points across 14 diverse benchmarks.

Good for

  • General-purpose conversational AI: Excels in instruction following and general conversation scenarios.
  • Mathematical problem-solving: Shows strong performance in mathematics benchmarks.
  • Code generation and understanding: Improved capabilities in coding tasks.
  • Resource-efficient deployment: Offers enhanced performance in a more compact 8B parameter size, making it suitable for applications where larger models are impractical.
  • Building on a robust instruction-tuned base: Provides a strong foundation for AI applications requiring high-quality responses and close adherence to instructions.
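Since the model is published on the Hugging Face Hub, a typical way to run it locally is with the `transformers` library. The sketch below shows the usual load-and-generate flow under that assumption; the generation settings are illustrative, not recommendations from the model authors.

```python
# Illustrative inference sketch for FuseAI/FuseChat-Llama-3.1-8B-SFT using
# Hugging Face transformers. Downloading the ~8B-parameter weights and running
# generation requires substantial disk space and, realistically, a GPU.

MODEL_ID = "FuseAI/FuseChat-Llama-3.1-8B-SFT"


def build_messages(user_prompt: str,
                   system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Assemble a chat in the messages format expected by apply_chat_template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def main() -> None:
    # Imports kept local so the helper above can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages("Solve 12 * 7 step by step."),
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256, do_sample=False)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

Keeping the heavy model download inside `main()` lets the message-building helper be reused (e.g. with a vLLM or OpenAI-compatible server) without pulling in the full stack.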