AReaL-boba-SFT-32B: High-Performance Mathematical Reasoning
AReaL-boba-SFT-32B is a 32.8-billion-parameter model developed by inclusionAI, distinguished by its strong performance on mathematical reasoning tasks. It is part of the AReaL v0.2 (boba) release, which focuses on improving both model performance and training efficiency.
Key Capabilities & Features
- Competitive Mathematical Reasoning: Achieves scores of 78.8 on AIME 2024 and 62.1 on AIME 2025, demonstrating robust problem-solving abilities in complex mathematical challenges.
- Efficient Training: Trained at extremely low cost, matching the performance of models like QwQ-32B on AIME 2024 while using only 200 high-quality samples for Supervised Fine-Tuning (SFT).
- Optimized Training System: Leverages system optimizations from AReaL v0.2, including SGLang support for improved throughput, efficient handling of variable-length sequences, and high-performance data transfer for large-scale training.
- Long Context Handling: Features a 131,072-token context window, suitable for intricate reasoning problems that require extensive input (a minimal loading and inference sketch follows this list).
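The sketch below shows one way to load the model and generate a solution with Hugging Face transformers. The repository ID, chat-template availability, and generation settings are assumptions for illustration; consult the model card on Hugging Face for the exact identifiers and recommended prompts.

```python
# Minimal inference sketch (assumed repo ID; verify against the model card).
# A 32B model in bf16 needs ~65 GB of GPU memory, so device_map="auto"
# is used to shard the weights across available GPUs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "inclusionAI/AReaL-boba-SFT-32B"  # assumed Hugging Face repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example competition-style prompt; assumes the tokenizer ships a chat template.
messages = [{"role": "user", "content": "Find all positive integers n <= 100 such that n^2 + 1 is divisible by 5."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long reasoning chains benefit from a generous token budget.
outputs = model.generate(input_ids, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For higher-throughput serving, the AReaL v0.2 release notes highlight SGLang support, which would be the natural choice over raw transformers for production inference.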
What Makes This Model Different?
AReaL-boba-SFT-32B stands out for achieving highly competitive mathematical reasoning performance from a significantly smaller, carefully curated training dataset. The underlying AReaL framework emphasizes stable, fast training and incorporates techniques such as token-level loss normalization (illustrated below) and a refined reward function; the reward function applies to RL training, while this particular model was trained with SFT. Its development shows that high-quality data and system optimizations can yield results comparable to larger or more extensively trained models, particularly on specialized tasks like mathematical problem-solving.
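As a rough illustration of token-level loss normalization (a sketch of the general idea, not code from the AReaL repository): the cross-entropy is summed over every non-padding token in the batch and divided by the total token count, so long reasoning traces are not down-weighted relative to short ones, as per-sequence averaging would do.

```python
# Illustrative sketch of token-level loss normalization; the function name
# and padding convention are hypothetical, not taken from AReaL.
import torch
import torch.nn.functional as F

def token_level_loss(logits: torch.Tensor,
                     labels: torch.Tensor,
                     pad_id: int = -100) -> torch.Tensor:
    """logits: [batch, seq, vocab]; labels: [batch, seq] with padding = pad_id."""
    # Sum (not mean) over all tokens so we control the normalizer ourselves.
    total_loss = F.cross_entropy(
        logits.view(-1, logits.size(-1)),
        labels.view(-1),
        ignore_index=pad_id,
        reduction="sum",
    )
    # Normalize by the total number of real tokens in the batch, so every
    # token contributes equally regardless of which sequence it came from.
    num_tokens = (labels != pad_id).sum().clamp(min=1)
    return total_loss / num_tokens
```

This matters for long-chain-of-thought training: with per-sequence averaging, a 10,000-token solution and a 100-token solution would carry equal weight, biasing gradients toward short sequences.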
Recommended Use Cases
- Mathematical Problem Solving: Ideal for applications requiring advanced mathematical reasoning, such as competitive programming, scientific research, or educational tools.
- Benchmarking and Research: Useful for researchers exploring efficient training methodologies and the impact of high-quality, curated datasets on model performance.
- Long-Context Applications: Its large context window makes it suitable for tasks where extensive supporting information or multi-step reasoning is crucial.