FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 22, 2025 · License: apache-2.0 · Architecture: Transformer

FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview is a 32.8 billion parameter language model developed by FuseAI, designed to enhance System-II reasoning capabilities through model fusion. This model integrates DeepSeek-R1-Distill-Qwen-32B and Qwen2.5-32B-Coder using a Long-Short Reasoning Merging approach. It excels in code reasoning tasks, demonstrating improved performance on benchmarks like LiveCodeBench.


FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview Overview

This model is part of FuseAI's FuseO1-Preview series, an initiative focused on enhancing System-II reasoning in large language models (LLMs) through model fusion. It is built with a Long-Short Reasoning Merging approach that integrates the strengths of deepseek-ai/DeepSeek-R1-Distill-Qwen-32B and Qwen/Qwen2.5-32B-Coder into a single 32.8 billion parameter model.
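A minimal inference sketch using the standard Hugging Face transformers API follows; the prompt and sampling settings are illustrative assumptions, not FuseAI's recommended configuration:

```python
# Minimal sketch: load the fused model and ask a code-reasoning question.
# Assumes the standard transformers chat-template API; the sampling
# settings below are illustrative, not FuseAI's published recommendations.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": "Write a Python function that returns the longest "
        "palindromic substring of a string. Reason step by step.",
    },
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```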

Key Capabilities

  • Enhanced System-II Reasoning: Designed to improve complex, step-by-step reasoning, particularly in technical domains.
  • Code Reasoning: Demonstrates strong performance in code-related tasks, achieving 56.4 on LiveCodeBench and 24.2 on LiveCodeBench-Hard, outperforming its constituent models and some OpenAI o1-preview variants.
  • Model Fusion: Uses the SCE (Select, Calculate, Erase) merging method to combine distinct knowledge and strengths from multiple reasoning LLMs into a unified model; a configuration sketch follows this list.
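A plausible SCE merge configuration for mergekit (which implements an `sce` merge method), written here as a Python script that emits the YAML config: the source models match this card, but `base_model`, `select_topk`, and `dtype` are assumptions rather than FuseAI's published values.

```python
# Hypothetical sketch of an SCE merge config for mergekit.
# The two source models match this card; the base model and
# parameter values are assumed, not confirmed by this page.
import yaml

config = {
    "models": [
        {"model": "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"},
        {"model": "Qwen/Qwen2.5-32B-Coder"},
    ],
    "base_model": "Qwen/Qwen2.5-32B",  # assumed common ancestor for the merge
    "merge_method": "sce",             # Select, Calculate, Erase
    "parameters": {"select_topk": 1.0},
    "dtype": "bfloat16",
}

with open("fuseo1_sce.yaml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# The merge itself would then be run with mergekit's CLI:
#   mergekit-yaml fuseo1_sce.yaml ./merged-model
```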

Good For

  • Code Generation and Problem Solving: Ideal for applications requiring robust code reasoning and problem-solving capabilities.
  • Complex Technical Tasks: Suitable for scenarios demanding strong analytical and logical deduction in mathematics, coding, and scientific domains.
  • Developers Seeking Optimized Reasoning: A strong candidate for developers who want balanced performance across both long and short reasoning processes, especially in coding contexts.
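Because one of the fused parents is DeepSeek-R1-Distill-Qwen-32B, the merged model may emit DeepSeek-R1-style `<think>...</think>` reasoning traces before its final answer. This is an assumption to verify against actual output; if it holds, a minimal post-processing sketch for separating the trace from the answer:

```python
# Hypothetical sketch: split a DeepSeek-R1-style response into its
# reasoning trace and final answer. Assumes the fused model emits
# <think>...</think> blocks like its DeepSeek-R1-Distill parent;
# verify against the model's actual output before relying on this.
import re

def split_reasoning(text: str) -> tuple[str, str]:
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        # No trace found: treat the whole response as the answer.
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>Check small cases first...</think>The answer is 4.")
print(answer)  # -> "The answer is 4."
```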