FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview

Text Generation · Concurrency Cost: 2 · Model Size: 32.8B · Quant: FP8 · Ctx Length: 32k · Published: Jan 24, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview is a 32.8 billion parameter language model developed by FuseAI, designed to enhance System-II reasoning capabilities through model fusion. This variant uses a Long-Short Reasoning Merging approach, integrating DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, and Sky-T1-32B-Flash so that it handles both long- and short-chain reasoning well. It demonstrates strong performance on mathematics, coding, and scientific reasoning tasks, particularly on benchmarks such as AIME24 and LiveCodeBench.


FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview: System-II Reasoning Fusion

This model, developed by FuseAI, is part of the FuseO1-Preview series, an initiative focused on enhancing the System-II reasoning capabilities of large language models through innovative model fusion. Specifically, the FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview is a 32.8 billion parameter model created using a Long-Short Reasoning Merging approach. This technique combines LLMs that excel in long-chain-of-thought (CoT) reasoning with those optimized for short-CoT reasoning, aiming to achieve robust performance across diverse reasoning complexities.
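Because the merged checkpoint is published as open weights, it can be loaded like any other Hugging Face causal LM. Below is a minimal inference sketch using the `transformers` library; the sampling settings and the step-by-step system prompt are illustrative assumptions, not settings taken from this page.

```python
# Minimal inference sketch using Hugging Face transformers.
# The system prompt and sampling parameters below are illustrative
# assumptions, not official recommendations from the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [
    {"role": "system", "content": "Please reason step by step, and put your final answer within \\boxed{}."},
    {"role": "user", "content": "If 3x + 7 = 22, what is x?"},
]

# Apply the model's chat template and generate a long-form reasoning trace.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=2048, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Reasoning-fused models like this one tend to emit long chains of thought, so a generous `max_new_tokens` budget is advisable.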

Key Capabilities & Performance

  • Enhanced Reasoning: Integrates knowledge and strengths from multiple open-source LLMs (DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, Sky-T1-32B-Flash) to improve mathematical, coding, and scientific reasoning.
  • Model Fusion (SCE Merging): Applies the SCE (Select, Calculate, Erase) merging method to combine the source checkpoints into a unified model with stronger System-II reasoning; a conceptual sketch follows the benchmark list below.
  • Competitive Benchmarks: Achieves notable results on various benchmarks:
    • Math Reasoning: On AIME24, it scores 72.9 Pass@1 and 86.7 Cons@32, outperforming several base models and approaching OpenAI o1's performance.
    • Code Reasoning: Attains 58.2 on LiveCodeBench and 25.0 on LiveCodeBench-Hard, showing significant improvements over its constituent models.
    • Scientific Reasoning: Scores 54.6 on GPQA-Diamond and 70.6 on MMLU-Pro.
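For intuition, the sketch below shows how an SCE-style merge might operate on a single weight tensor: compute task vectors against a shared base, select high-variance elements, calculate per-model coefficients, and erase sign-conflicting entries before summing. The function name, selection fraction, and coefficient formula are illustrative assumptions distilled from public descriptions of SCE, not FuseAI's actual code.

```python
# Conceptual sketch of SCE (Select, Calculate, Erase) merging for one
# parameter tensor. Illustration only; FuseAI's real pipeline differs
# in details such as selection rate and per-tensor handling.
import torch

def sce_merge(base: torch.Tensor,
              finetuned: list[torch.Tensor],
              select_fraction: float = 0.1) -> torch.Tensor:
    # Task vectors: how each source model shifted the shared base weights.
    deltas = torch.stack([ft - base for ft in finetuned])  # (n_models, ...)

    # Select: keep only the elements with the highest variance across models.
    var = deltas.var(dim=0)
    k = max(1, int(select_fraction * var.numel()))
    threshold = var.flatten().topk(k).values.min()
    deltas = deltas * (var >= threshold).to(deltas.dtype)

    # Calculate: per-model merging coefficients from squared delta magnitude.
    weights = (deltas ** 2).sum(dim=tuple(range(1, deltas.dim())))
    coeffs = weights / weights.sum()

    # Erase: zero out elements whose sign disagrees with the majority direction.
    majority_sign = torch.sign(deltas.sum(dim=0))
    deltas = deltas * (torch.sign(deltas) == majority_sign).to(deltas.dtype)

    # Merge: weighted sum of the surviving task vectors, added back to the base.
    merged_delta = (coeffs.view(-1, *([1] * (deltas.dim() - 1))) * deltas).sum(dim=0)
    return base + merged_delta
```

In practice, merges like this are usually produced with a toolkit such as mergekit, which ships an SCE implementation, rather than with hand-rolled tensor code.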

Ideal Use Cases

  • Complex Problem Solving: Suited for applications requiring detailed, multi-step reasoning in mathematics, science, and coding.
  • Code Generation & Analysis: Its strong performance on LiveCodeBench suggests utility in code-related tasks.
  • Research & Development: Provides a robust base for further exploration into model fusion and System-II reasoning enhancements.