qihoo360/TinyR1-32B

TEXT GENERATIONConcurrency Cost:2Model Size:32.8BQuant:FP8Ctx Length:32kPublished:Sep 23, 2025License:apache-2.0Architecture:Transformer0.0K Open Weights Cold

TinyR1-32B by qihoo360 is a 32.8 billion parameter language model that utilizes a Control Token method to balance helpfulness and harmlessness. It achieves strong reasoning, instruction-following, and safety performance, surpassing Qwen3-32B and DeepSeek-R1-0528 in key metrics. This model is optimized for tasks requiring robust reasoning, precise instruction adherence, and constructive safety guidance, supporting a 131072 token context length.

Loading preview...

TinyR1-32B: Balanced Reasoning, Alignment, and Safety

TinyR1-32B, developed by qihoo360, is a 32.8 billion parameter language model designed to overcome the traditional trade-off between helpfulness and harmlessness in LLMs. It introduces a novel Control Token method that allows the model to dynamically adjust its behavior based on task type, ensuring a balanced coexistence of reasoning ability, safety, and alignment.

Key Capabilities & Performance

  • Enhanced Reasoning: Achieves 90.9 on AIME24 and 82.7 on AIME25, demonstrating 94% of DeepSeek-R1-0528's performance in mathematics, science, and coding reasoning tasks.
  • Superior Alignment: Scores 89.2 on IFEval (Prompt Strict), significantly outperforming DeepSeek-R1-0528's 80.9.
  • Constructive Safety: Attains a Constructive Safety score of nearly 90, far surpassing other large open-source models by providing positive safety guidance rather than simple refusal. This includes 89.5 on OpenSource and 86.5 on InHouse safety benchmarks.
  • Efficient Training: Achieved comprehensive improvements after only 20,000 high-quality fine-tuning samples and three rounds of SFT training.

Good For

  • Applications requiring a strong balance between complex reasoning and robust safety protocols.
  • Tasks where precise instruction-following is critical, such as IFEval questions using an "Adherence mode: Strict adherence" system prompt.
  • Scenarios demanding constructive and positive safety responses, activated via a "Safety Mode: Positive" system prompt.
  • Mathematical and scientific problem-solving, leveraging a "Please reason step by step, and put your final answer within \boxed{}" system prompt.