TinyR1-32B-Preview is a 32-billion-parameter reasoning model developed by qihoo360, based on DeepSeek-R1-Distill-Qwen-32B. It is optimized for complex reasoning tasks in mathematics, coding, and science, with math performance that approaches that of much larger models. The model was created by supervised fine-tuning of domain-specific models and then merging them to achieve strong overall performance across these analytical areas.
Overview
TinyR1-32B-Preview: A Specialized Reasoning Model
TinyR1-32B-Preview is qihoo360's first-generation reasoning model, designed to excel in specific analytical domains. This 32-billion-parameter model is built on the DeepSeek-R1-Distill-Qwen-32B architecture and was fine-tuned using the 360-LLaMA-Factory framework.
Key Capabilities & Features
- Domain-Specific Optimization: Achieves strong performance in Mathematics, Coding, and Science through a unique training approach involving supervised fine-tuning (SFT) on specialized datasets and subsequent model merging.
- Competitive Reasoning Performance: Outperforms the larger DeepSeek-R1-Distill-Llama-70B in mathematics (78.1 on AIME 2024) and shows competitive results in coding (61.6 on LiveCodeBench) and science (65.0 on GPQA-Diamond).
- Training Data: Trained on Chain-of-Thought (CoT) trajectories from OpenR1-Math-220k (math), OpenThoughts-114k (coding and science), and simplescaling/data_ablation_full59K (science).
- Open-Sourced Resources: The training dataset and the full training and evaluation pipeline are open-sourced, and a technical report is available on arXiv.
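The SFT-then-merge approach above combines several domain-specific checkpoints in weight space. As a toy illustration only, the sketch below shows simple weighted averaging of parameter dicts, one common merging scheme; the parameter names, values, and uniform weighting are illustrative assumptions, not the exact recipe used for TinyR1-32B-Preview.

```python
# Toy sketch of weight-space model merging: plain dicts of floats stand
# in for real tensor state dicts. Illustrative only; not the exact
# merging method used for TinyR1-32B-Preview.

def merge_state_dicts(models, weights=None):
    """Weighted average of parameter dicts that share the same keys."""
    if weights is None:
        weights = [1.0 / len(models)] * len(models)  # uniform by default
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    merged = {}
    for key in models[0]:
        merged[key] = sum(w * m[key] for w, m in zip(weights, models))
    return merged

# Hypothetical single-parameter "checkpoints" from three domain SFT runs.
math_model    = {"layer.weight": 1.0}
code_model    = {"layer.weight": 2.0}
science_model = {"layer.weight": 3.0}

merged = merge_state_dicts([math_model, code_model, science_model])
print(merged["layer.weight"])  # uniform average of the three values (≈ 2.0)
```

Real merging toolkits operate the same way conceptually, but over full model state dicts and with more sophisticated schemes than a plain average.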
Good For
- Complex Reasoning Tasks: Ideal for applications requiring advanced problem-solving in mathematics, code generation, and scientific inquiry.
- Research and Development: Serves as an experimental research model for advancing AI reasoning capabilities, particularly for those interested in its unique distillation and merging methodology.
- Benchmarking: Useful for evaluating and comparing reasoning performance against other models in specialized domains.
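When benchmarking a reasoning model like this, scoring usually targets only the final answer, not the reasoning trace. A minimal sketch of separating the two, assuming the DeepSeek-R1-style `<think>…</think>` trace format inherited from the base distill models (an assumption about this model's output, not a documented guarantee):

```python
import re

def strip_reasoning(output: str) -> str:
    """Remove a <think>...</think> reasoning trace, keeping the final answer.

    Assumes the DeepSeek-R1-style trace format; adjust the pattern if the
    model emits a different delimiter.
    """
    return re.sub(r"<think>.*?</think>", "", output, flags=re.DOTALL).strip()

sample = "<think>2 + 2: add the numbers.</think>\nThe answer is 4."
print(strip_reasoning(sample))  # -> The answer is 4.
```

The `re.DOTALL` flag lets `.` match newlines, so multi-line reasoning traces are removed in one pass.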