FuseAI/FuseChat-7B-VaRM

Text generation · Model size: 7B · Quantization: FP8 · Context length: 4K · Published: Feb 26, 2024 · License: apache-2.0 · Architecture: Transformer · Open weights

FuseAI/FuseChat-7B-VaRM is a 7-billion-parameter chat language model developed by Fanqi Wan, Ziyi Yang, Longguang Zhong, Xiaojun Quan, Xinting Huang, and Wei Bi of Sun Yat-sen University. It is built with a fuse-then-merge strategy whose merging step, VaRM, integrates the strengths of three diverse source chat LLMs: Nous-Hermes-2-Mixtral-8x7B, Nous-Hermes-2-SOLAR-10.7B, and OpenChat-3.5-7B. The model achieves an MT-Bench score of 8.22, outperforming many 7B and 34B models and approaching larger models such as Mixtral-8x7B-Instruct, which makes it well suited to general conversational AI tasks.

Overview

FuseChat-7B-VaRM is a 7-billion-parameter chat model developed by Fanqi Wan et al. at Sun Yat-sen University, designed to integrate the collective knowledge and individual strengths of multiple chat LLMs. It employs a "fuse-then-merge" strategy: pairwise knowledge fusion via lightweight fine-tuning of a pivot model against each source model, followed by a parameter merging method called VaRM (Variation Ratio of Parameter Matrices), which weights each parameter matrix by how much it changed during fine-tuning.
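
To make the merging step concrete, the sketch below illustrates the variation-ratio idea under the paper's high-level description: each pairwise-fused variant contributes to every parameter matrix in proportion to how much that matrix changed relative to the pivot model. The function name `varm_merge`, the state-dict interface, and the exact normalization are illustrative assumptions, not FuseChat's released code.

```python
# Schematic sketch of variation-ratio merging; names and interfaces
# are illustrative, not FuseChat's actual implementation.
import torch

def varm_merge(base_state: dict, fused_states: list) -> dict:
    """Merge pairwise-fused variants of the same architecture.

    base_state:   state dict of the pivot model before fusion
    fused_states: state dicts of the fused variants (one per source LLM)
    """
    merged = {}
    for name, base_param in base_state.items():
        variants = [fs[name] for fs in fused_states]
        # Variation ratio: total squared change of this tensor per variant.
        changes = [float(((v - base_param) ** 2).sum()) for v in variants]
        total = sum(changes)
        if total == 0.0:
            # No variant touched this tensor; keep the pivot's weights.
            merged[name] = base_param.clone()
            continue
        # Normalized variation ratios serve as per-matrix merging weights.
        merged[name] = sum((c / total) * v for c, v in zip(changes, variants))
    return merged
```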

Key Capabilities & Differentiators

  • Knowledge Fusion: Integrates knowledge from diverse source models (Nous-Hermes-2-Mixtral-8x7B, Nous-Hermes-2-SOLAR-10.7B, OpenChat-3.5-7B) into a single, more powerful 7B model.
  • Memory Efficiency: Unlike Mixture-of-Experts (MoE) models, which must load multiple experts, FuseChat-7B-VaRM folds several LLMs into a single 7B model, adding no memory overhead at inference time (see the usage sketch after this list).
  • Strong Performance: Achieves an MT-Bench score of 8.22, surpassing models like Starling-7B and Yi-34B-Chat, and even outperforming GPT-3.5 (March) and Claude-2.1.
  • Flexible Merging: The framework supports plug-and-play fusion of new source LLMs, allowing for continuous integration and improvement.
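
Because the merged result is a single, standard 7B causal LM, it can be served with the usual Hugging Face stack. Below is a minimal sketch assuming the `transformers` library and the chat template shipped in the model's tokenizer config; the dtype and sampling settings are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FuseAI/FuseChat-7B-VaRM"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize what knowledge fusion does."}
]
# apply_chat_template uses the conversation format bundled with the tokenizer.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```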

Benchmarks

FuseChat-7B-VaRM demonstrates competitive performance across various benchmarks:

  • MT-Bench: 8.22
  • Open LLM Leaderboard Average: 66.52
    • AI2 Reasoning Challenge (25-shot): 62.88
    • HellaSwag (10-shot): 84.25
    • MMLU (5-shot): 63.71
    • TruthfulQA (0-shot): 45.67
    • Winogrande (5-shot): 79.16
    • GSM8k (5-shot): 63.46
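
The leaderboard scores above follow the Open LLM Leaderboard's standard few-shot settings, so they can in principle be reproduced with EleutherAI's lm-evaluation-harness. A minimal sketch for one task, assuming `lm_eval` (v0.4+); the batch size is an illustrative choice:

```python
# Reproduction sketch for the 25-shot ARC score, assuming
# EleutherAI's lm-evaluation-harness (pip install lm-eval).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=FuseAI/FuseChat-7B-VaRM,dtype=bfloat16",
    tasks=["arc_challenge"],  # 25-shot ARC, per the leaderboard settings
    num_fewshot=25,
    batch_size=8,
)
print(results["results"]["arc_challenge"])
```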

Use Cases

This model is well-suited for general conversational AI applications, instruction-following, and tasks requiring robust reasoning, given its strong performance on MT-Bench and various academic benchmarks.