FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview Overview
This model, developed by FuseAI, is a 32.8-billion-parameter language model that aims to strengthen System-II reasoning through model fusion. It applies a 'Long-Short Reasoning Merging' approach, combining the strengths of long-CoT (chain-of-thought) and short-CoT LLMs: the source models are DeepSeek-R1-Distill-Qwen-32B (long-CoT) and Qwen2.5-32B-Instruct (short-CoT).
Key Capabilities & Performance
- Enhanced Reasoning: Designed to combine the thorough long-CoT reasoning of its DeepSeek-R1 parent with the concise short-CoT style of its Qwen2.5-Instruct parent.
- Mathematics: Achieves 68.6 Pass@1 and 83.3 Cons@32 on AIME24, and 94.6 on MATH500, a clear improvement over both source models.
- Scientific Reasoning: Scores 55.1 on GPQA-Diamond and 68.6 on MMLU-Pro.
- Code Reasoning: While not the coding-focused variant of the family, it contributes to the FuseO1 family's overall code-reasoning capabilities.
Unique Approach
This model is part of FuseAI's first attempt to fuse multiple open-source LLMs using its SCE merging method. The goal is to consolidate the distinct knowledge and strengths of several reasoning LLMs into a single model with robust System-II reasoning across the mathematics, coding, and science domains.
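The idea behind SCE-style merging can be sketched as a per-position merge of parameter deltas. The toy below is a minimal pure-Python illustration, not FuseAI's actual pipeline (which operates on full model checkpoints, e.g. via mergekit): the function name, the squared-magnitude weighting, and the list-based "tensors" are all illustrative assumptions.

```python
def sce_merge(base, models, density=0.5):
    """Toy SCE-style merge of flat parameter lists into a base model.

    Select:    keep the `density` fraction of positions whose deltas vary
               most across the source models.
    Calculate: weight each model's delta by its squared magnitude.
    Erase:     drop contributions whose sign opposes the weighted majority.
    """
    n = len(base)
    deltas = [[m[i] - base[i] for i in range(n)] for m in models]

    def variance(vals):
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / len(vals)

    # Select: rank positions by the cross-model variance of their deltas
    var = [variance([d[i] for d in deltas]) for i in range(n)]
    k = max(1, int(density * n))
    keep = sorted(range(n), key=lambda i: -var[i])[:k]

    merged = list(base)
    for i in keep:
        col = [d[i] for d in deltas]
        weights = [v * v for v in col]                       # Calculate
        direction = sum(w * v for w, v in zip(weights, col))
        kept = [(w, v) for w, v in zip(weights, col)
                if v * direction >= 0]                       # Erase
        total = sum(w for w, _ in kept) or 1.0
        merged[i] = base[i] + sum(w * v for w, v in kept) / total
    return merged
```

In this sketch, positions where the source models agree are averaged with magnitude-proportional weights, while minority-sign contributions are discarded so that conflicting updates do not cancel each other out.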
When to Use This Model
- Complex Reasoning Tasks: Ideal for applications requiring robust step-by-step reasoning, especially in mathematical problem-solving.
- Instruction Following: Benefits from the instruction-tuned component, making it suitable for general instruction-following tasks.
- Research & Development: Useful for exploring the benefits of model fusion for enhanced reasoning.
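Since the merge inherits chat formatting from its Qwen2.5-32B-Instruct parent, prompting it for step-by-step mathematical reasoning can be sketched as below. The ChatML markers follow the Qwen2.5 convention and the \boxed{} instruction follows common DeepSeek-R1 usage guidance; `build_prompt` and the default system message are illustrative assumptions, and in practice `tokenizer.apply_chat_template` from Hugging Face transformers is the safer route.

```python
# Hypothetical helper: format a question in the ChatML style used by
# Qwen2.5-family models, which this merge inherits from its instruct parent.
DEFAULT_SYSTEM = (
    "Please reason step by step, and put your final answer within \\boxed{}."
)

def build_prompt(question: str, system: str = DEFAULT_SYSTEM) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("What is the sum of the first 10 positive integers?")
```

The resulting string ends with the assistant turn opener, so the model's generation continues directly as its reasoning trace.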