Wanfq/FuseLLM-7B is a 7 billion parameter language model developed by Fanqi Wan and Tencent AI Lab, designed for knowledge fusion across diverse LLM architectures. It combines the capabilities of Llama-2-7B, OpenLLaMA-7B, and MPT-7B into a single, more potent model. FuseLLM-7B excels in general reasoning, commonsense reasoning, and code generation, outperforming its constituent models and other 7B alternatives on various benchmarks.
Overview
Wanfq/FuseLLM-7B is a 7 billion parameter model developed by Fanqi Wan and Tencent AI Lab, focusing on knowledge fusion for large language models. Unlike traditional model ensemble or weight merging techniques, FuseLLM can combine multiple LLMs with diverse architectures into a single, more powerful target LLM. This is achieved by externalizing the collective knowledge and individual strengths of source LLMs (Llama-2-7B, OpenLLaMA-7B, and MPT-7B) and transferring them through lightweight continual training.
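The continual-training objective can be sketched in a few lines. This is a toy NumPy illustration of the MinCE fusion strategy described in the FuseLLM paper: at each token position, keep the source model's distribution with the lowest cross-entropy against the gold token, then train the target model on a weighted sum of the usual language-modeling loss and a divergence to that fused distribution. Function names, the `lam` weight, and the shared-vocabulary assumption (the real method also aligns tokens across different tokenizers) are illustrative, not the paper's exact implementation.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def fuse_mince(teacher_logits, gold_id):
    """Fuse per-token distributions from several source LLMs.

    MinCE strategy: keep the distribution of the source model whose
    cross-entropy on the gold token is lowest at this position.
    """
    probs = [softmax(l) for l in teacher_logits]
    ce = [-np.log(p[gold_id] + 1e-12) for p in probs]
    return probs[int(np.argmin(ce))]

def fusion_loss(student_logits, teacher_logits, gold_id, lam=0.9):
    """Weighted sum of the causal-LM loss and the KL divergence
    from the fused source distribution to the target model's."""
    q = softmax(student_logits)
    p = fuse_mince(teacher_logits, gold_id)
    ce = -np.log(q[gold_id] + 1e-12)                        # language-modeling loss
    kl = float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
    return lam * ce + (1.0 - lam) * kl
```

For example, with two source models over a 5-token vocabulary, one confident on the gold token and one not, `fuse_mince` selects the former, and `fusion_loss` supplies the extra teacher signal that a plain cross-entropy loss would miss.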
Key Capabilities
- Knowledge Fusion: Integrates capabilities from structurally diverse LLMs (e.g., Llama-2, OpenLLaMA, MPT) into one model.
- Enhanced Reasoning: Shows improved performance on Big-Bench Hard and CommonSense benchmarks (ARC-easy, ARC-challenge, BoolQ, HellaSwag, OpenBookQA) compared to its source models.
- Code Generation: Demonstrates better results on the MultiPL-E benchmark for multilingual programming.
- Text Generation: Achieves higher scores on tasks such as TriviaQA, DROP, and LAMBADA.
- Instruction Following: Performs well on the Vicuna benchmark, indicating strong instruction-following ability.
Good For
- Applications requiring a unified model that leverages the strengths of multiple distinct LLM architectures.
- Tasks demanding strong general reasoning and commonsense understanding.
- Code generation and various text generation applications where FuseLLM-7B shows competitive performance against other 7B models.
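To try the model, a minimal inference sketch using the standard Hugging Face `transformers` API is shown below. The helper name and generation settings are illustrative; `device_map="auto"` additionally assumes the `accelerate` package is installed, and downloading the 7B checkpoint requires substantial disk space and memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Wanfq/FuseLLM-7B"  # Hugging Face model id from this card

def generate(prompt: str, max_new_tokens: int = 64) -> str:
    # Load tokenizer and model weights from the Hub (cached after first call).
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    # Tokenize the prompt and move tensors to the model's device.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # Greedy decoding by default; pass do_sample=True etc. for sampling.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Usage: `print(generate("Q: What is the capital of France?\nA:"))`.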