Overview
TinyR1-32B-Preview: A Specialized Reasoning Model
TinyR1-32B-Preview is qihoo360's first-generation reasoning model, designed to excel in specific analytical domains. This 32-billion-parameter model is built on the DeepSeek-R1-Distill-Qwen-32B base and was fine-tuned using the 360-LLaMA-Factory framework.
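Because the model inherits the standard Hugging Face causal-LM interface from its DeepSeek-R1-Distill-Qwen-32B base, loading it with `transformers` should follow the usual pattern. The sketch below is illustrative only: the repo id, dtype, and generation settings are assumptions, so check the model card for the exact values.

```python
# A minimal usage sketch. The repo id "qihoo360/TinyR1-32B-Preview" and the
# generation settings are assumptions, not values stated in this document.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/TinyR1-32B-Preview"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 32B weights; bf16 halves memory vs. fp32
    device_map="auto",           # shard across available GPUs
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=2048)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```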
Key Capabilities & Features
- Domain-Specific Optimization: Achieves strong performance in Mathematics, Coding, and Science through a two-stage training approach: supervised fine-tuning (SFT) of separate domain specialists on specialized datasets, followed by merging them into a single model (see the sketch after this list).
- Competitive Reasoning Performance: Outperforms the 70B-parameter DeepSeek-R1-Distill-Llama-70B in mathematics (78.1 on AIME 2024) and posts competitive results in coding (61.6 on LiveCodeBench) and science (65.0 on GPQA-Diamond).
- Training Data: Uses Chain-of-Thought (CoT) trajectories drawn from datasets including OpenR1-Math-220k (math), OpenThoughts-114k (coding & science), and simplescaling/data_ablation_full59K (science).
- Open-Sourced Resources: The training data and the full training and evaluation pipeline are open-sourced, and a technical report is available on arXiv.
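The merging step in the training approach above can be pictured as weight-space combination of the domain specialists. The sketch below shows plain linear averaging of checkpoint state dicts; the released pipeline may use a different merge algorithm, and the file paths and weights here are hypothetical.

```python
# Illustrative sketch of merging domain-specialist checkpoints by linear
# weight averaging. This is an assumption about the general technique, not
# the exact algorithm from the TinyR1 report; paths below are hypothetical.
import torch

def merge_state_dicts(state_dicts, weights):
    """Weighted average of parameter tensors shared across all checkpoints."""
    assert abs(sum(weights) - 1.0) < 1e-6, "merge weights should sum to 1"
    merged = {}
    for name in state_dicts[0]:
        merged[name] = sum(
            w * sd[name].to(torch.float32) for w, sd in zip(weights, state_dicts)
        )
    return merged

# Hypothetical specialist checkpoints produced by domain-specific SFT.
specialists = [
    torch.load(path, map_location="cpu")
    for path in ("math_sft.pt", "code_sft.pt", "science_sft.pt")
]
merged = merge_state_dicts(specialists, weights=[1 / 3, 1 / 3, 1 / 3])
torch.save(merged, "tinyr1_merged.pt")
```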
Good For
- Complex Reasoning Tasks: Ideal for applications requiring advanced problem-solving in mathematics, code generation, and scientific inquiry.
- Research and Development: Serves as an experimental research model for advancing AI reasoning capabilities, particularly for those interested in its distillation-and-merging methodology.
- Benchmarking: Useful for evaluating and comparing reasoning performance against other models in specialized domains (a minimal scoring sketch follows).
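For benchmarking, reasoning models are typically scored by extracting a final answer from the chain-of-thought output and checking exact match against a gold answer. The sketch below illustrates that pattern; the `\boxed{}` convention, the sample responses, and the helper names are hypothetical stand-ins for a real harness.

```python
# A minimal scoring sketch: exact-match accuracy over extracted final answers.
# The answer-extraction convention and the example data are assumptions.
import re

def extract_final_answer(text: str) -> str | None:
    """Pull the last \\boxed{...} answer from a chain-of-thought response."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", text)
    return matches[-1].strip() if matches else None

def accuracy(responses: list[str], gold: list[str]) -> float:
    hits = sum(extract_final_answer(r) == g for r, g in zip(responses, gold))
    return hits / len(gold)

# Hypothetical model outputs and gold answers.
responses = ["... so the result is \\boxed{42}.", "... giving \\boxed{7}."]
print(accuracy(responses, gold=["42", "9"]))  # 0.5
```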