FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview: System-II Reasoning Fusion
This model, developed by FuseAI, is part of the FuseO1-Preview series, an initiative focused on enhancing the System-II reasoning capabilities of large language models through model fusion. FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview is a 32.8-billion-parameter model created with a Long-Short Reasoning Merging approach: it combines LLMs that excel at long chain-of-thought (CoT) reasoning with those optimized for short-CoT reasoning, aiming for robust performance across reasoning tasks of varying complexity.
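To make the merging idea concrete, here is a minimal toy sketch of combining two checkpoints by linearly interpolating matching parameters. This is only an illustration of the general concept; the actual FuseO1 pipeline uses SCE merging, which fuses parameters more selectively than this uniform interpolation, and the function and layer names below are hypothetical.

```python
from typing import Dict, List


def merge_layers(long_cot: Dict[str, List[float]],
                 short_cot: Dict[str, List[float]],
                 weight: float = 0.5) -> Dict[str, List[float]]:
    """Linearly interpolate matching parameter vectors from two models.

    `weight` is the contribution of the long-CoT model; (1 - weight)
    goes to the short-CoT model. Both models must share an architecture.
    """
    assert long_cot.keys() == short_cot.keys(), "architectures must match"
    return {
        name: [weight * a + (1.0 - weight) * b
               for a, b in zip(long_cot[name], short_cot[name])]
        for name in long_cot
    }


# Tiny stand-in "models", each with one 4-element weight vector.
long_model = {"layer.w": [1.0, 2.0, 3.0, 4.0]}
short_model = {"layer.w": [3.0, 2.0, 1.0, 0.0]}
merged = merge_layers(long_model, short_model, weight=0.5)
# merged["layer.w"] -> [2.0, 2.0, 2.0, 2.0]
```

Real merges operate on full transformer state dicts (via tools such as mergekit) rather than plain Python lists, but the shape of the operation is the same: per-parameter combination of aligned tensors.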
Key Capabilities & Performance
- Enhanced Reasoning: Integrates knowledge and strengths from multiple open-source LLMs (DeepSeek-R1-Distill-Qwen-32B, QwQ-32B-Preview, Sky-T1-32B-Flash) to improve mathematical, coding, and scientific reasoning.
- Model Fusion (SCE Merging): Applies the SCE merging method to fuse the source models into a single unified model with stronger System-II reasoning abilities.
- Competitive Benchmarks: Achieves notable results on various benchmarks:
  - Math Reasoning: On AIME24, it scores 72.9 Pass@1 and 86.7 Cons@32, outperforming several of its source models and approaching OpenAI o1's performance.
  - Code Reasoning: Attains 58.2 on LiveCodeBench and 25.0 on LiveCodeBench-Hard, showing significant improvements over its constituent models.
  - Scientific Reasoning: Scores 54.6 on GPQA-Diamond and 70.6 on MMLU-Pro.
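For readers unfamiliar with the metrics above, the following sketch shows how Pass@1 and Cons@k style scores are commonly computed: Pass@1 checks a single sampled answer per problem, while Cons@k takes a majority vote over k samples. This is an illustrative assumption about the general metric definitions, not the exact AIME24 evaluation harness.

```python
from collections import Counter
from typing import List


def pass_at_1(samples_per_problem: List[List[str]],
              gold_answers: List[str]) -> float:
    """Fraction of problems whose first sampled answer is correct."""
    correct = sum(samples[0] == gold
                  for samples, gold in zip(samples_per_problem, gold_answers))
    return correct / len(gold_answers)


def cons_at_k(samples_per_problem: List[List[str]],
              gold_answers: List[str]) -> float:
    """Fraction of problems whose majority-vote answer is correct."""
    correct = 0
    for samples, gold in zip(samples_per_problem, gold_answers):
        majority, _ = Counter(samples).most_common(1)[0]
        correct += majority == gold
    return correct / len(gold_answers)


# Two toy problems, three samples each (k=3 here; Cons@32 uses k=32).
samples = [["42", "42", "7"], ["13", "9", "9"]]
gold = ["42", "9"]
# pass_at_1 -> 0.5 (first sample is wrong on the second problem);
# cons_at_k -> 1.0 (majority voting recovers both answers).
```

This is why Cons@32 (86.7) exceeds Pass@1 (72.9) on AIME24: voting across 32 samples filters out individual reasoning errors.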
Ideal Use Cases
- Complex Problem Solving: Suited for applications requiring detailed, multi-step reasoning in mathematics, science, and coding.
- Code Generation & Analysis: Its strong performance on LiveCodeBench suggests utility in code-related tasks.
- Research & Development: Provides a robust base for further exploration into model fusion and System-II reasoning enhancements.