FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview

Text generation · Concurrency cost: 2 · Model size: 32.8B · Quant: FP8 · Context length: 32k · Published: Jan 20, 2025 · License: apache-2.0 · Architecture: Transformer · Open weights

FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview is a 32.8 billion parameter language model developed by FuseAI using System-II reasoning fusion techniques. The model results from 'Long-Short Reasoning Merging' of DeepSeek-R1-Distill-Qwen-32B and Qwen2.5-32B-Instruct, a fusion designed to strengthen both long and short reasoning processes. It performs strongly on mathematics, coding, and scientific reasoning tasks, and particularly excels on long-reasoning benchmarks.


FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview Overview

This model, developed by FuseAI, is a 32.8 billion parameter language model focused on enhancing System-II reasoning capabilities through advanced model fusion. It specifically utilizes a 'Long-Short Reasoning Merging' approach, combining the strengths of long-CoT (Chain-of-Thought) and short-CoT LLMs. The source models integrated are DeepSeek-R1-Distill-Qwen-32B and Qwen2.5-32B-Instruct.

Key Capabilities & Performance

  • Enhanced Reasoning: Improves both long and short chain-of-thought reasoning processes.
  • Mathematics: Achieves 68.6 Pass@1 and 83.3 Cons@32 on AIME24, and 94.6 on MATH500, demonstrating significant improvements over its constituent models.
  • Scientific Reasoning: Scores 55.1 on GPQA-Diamond and 68.6 on MMLU-Pro.
  • Code Reasoning: While not the primary coder variant, it contributes to the FuseO1 family's overall code reasoning capabilities.

Unique Approach

This model is part of FuseAI's initial endeavor to fuse multiple open-source LLMs using their advanced SCE merging methodologies. The goal is to consolidate distinct knowledge and strengths from various reasoning LLMs into a single, unified model with robust System-II reasoning abilities across mathematics, coding, and science domains.
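The SCE procedure itself is not spelled out here, so as a rough illustration only, the toy sketch below merges per-parameter "task vectors" (deltas of each source model from a shared base) by selecting the elements where the models disagree most, erasing minority-sign values, and averaging what remains. All names, thresholds, and the selection rule are assumptions for illustration, not FuseAI's implementation.

```python
# Toy "select, erase, average" merge over task vectors.
# NOT FuseAI's SCE implementation; the details below are illustrative only.
from statistics import pvariance

def merge_task_vectors(deltas, keep_ratio=0.5):
    """Merge per-parameter deltas (one flat list per source model) into one list.

    1. Select: keep only elements whose variance across models is in the top
       `keep_ratio` fraction (the most "salient" disagreements).
    2. Erase: within a kept element, drop values whose sign disagrees with
       the majority sign across models.
    3. Average the surviving values; unselected elements contribute 0.
    """
    n = len(deltas[0])
    variances = [pvariance([d[i] for d in deltas]) for i in range(n)]
    k = max(1, int(keep_ratio * n))
    cutoff = sorted(variances, reverse=True)[k - 1]
    merged = []
    for i in range(n):
        if variances[i] < cutoff:
            merged.append(0.0)  # element not selected: contributes nothing
            continue
        vals = [d[i] for d in deltas]
        majority = 1.0 if sum(v >= 0 for v in vals) >= len(vals) / 2 else -1.0
        kept = [v for v in vals if v * majority >= 0]  # erase sign conflicts
        merged.append(sum(kept) / len(kept))
    return merged

# Two toy "models": element 0 agrees, element 1 conflicts, element 2 is flat.
print(merge_task_vectors([[0.9, 0.4, 0.01], [1.1, -0.6, 0.01]], keep_ratio=0.7))
```

In the toy run, the conflicting element keeps only its majority-sign value and the flat element is dropped entirely, which mirrors the stated goal of consolidating distinct strengths without averaging away disagreements.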

When to Use This Model

  • Complex Reasoning Tasks: Ideal for applications requiring robust step-by-step reasoning, especially in mathematical problem-solving.
  • Instruction Following: Benefits from the instruction-tuned component, making it suitable for general instruction-following tasks.
  • Research & Development: Useful for exploring the benefits of model fusion for enhanced reasoning.
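A minimal way to try the model locally is through Hugging Face `transformers`. The sketch below assumes the common DeepSeek-R1-style convention of asking for step-by-step reasoning with a boxed final answer; both the prompt wording and the `generate_answer` helper are assumptions, and loading the 32.8B checkpoint requires a suitably large GPU.

```python
# Sketch: querying the model via Hugging Face transformers.
# Prompt wording and helper names are assumptions, not the official recipe.

MODEL_ID = "FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview"

def build_messages(question: str) -> list[dict]:
    """Wrap a user question in a chat message list for the chat template."""
    return [
        {
            "role": "user",
            "content": (
                f"{question}\nPlease reason step by step, and put your "
                "final answer within \\boxed{}."
            ),
        }
    ]

def generate_answer(question: str, max_new_tokens: int = 2048) -> str:
    """Hypothetical helper: load the model and generate (needs a large GPU)."""
    # Imported lazily so build_messages works without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype="auto", device_map="auto"
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(question), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For long-reasoning benchmarks such as AIME24, a generous `max_new_tokens` budget matters, since the long-CoT component tends to produce extended reasoning traces before the boxed answer.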