qihoo360/TinyR1-32B
TinyR1-32B by qihoo360 is a 32.8 billion parameter language model that utilizes a Control Token method to balance helpfulness and harmlessness. It achieves strong reasoning, instruction-following, and safety performance, surpassing Qwen3-32B and DeepSeek-R1-0528 in key metrics. This model is optimized for tasks requiring robust reasoning, precise instruction adherence, and constructive safety guidance, supporting a 131072 token context length.
Loading preview...
TinyR1-32B: Balanced Reasoning, Alignment, and Safety
TinyR1-32B, developed by qihoo360, is a 32.8 billion parameter language model designed to overcome the traditional trade-off between helpfulness and harmlessness in LLMs. It introduces a novel Control Token method that allows the model to dynamically adjust its behavior based on task type, ensuring a balanced coexistence of reasoning ability, safety, and alignment.
Key Capabilities & Performance
- Enhanced Reasoning: Achieves 90.9 on AIME24 and 82.7 on AIME25, demonstrating 94% of DeepSeek-R1-0528's performance in mathematics, science, and coding reasoning tasks.
- Superior Alignment: Scores 89.2 on IFEval (Prompt Strict), significantly outperforming DeepSeek-R1-0528's 80.9.
- Constructive Safety: Attains a Constructive Safety score of nearly 90, far surpassing other large open-source models by providing positive safety guidance rather than simple refusal. This includes 89.5 on OpenSource and 86.5 on InHouse safety benchmarks.
- Efficient Training: Achieved comprehensive improvements after only 20,000 high-quality fine-tuning samples and three rounds of SFT training.
Good For
- Applications requiring a strong balance between complex reasoning and robust safety protocols.
- Tasks where precise instruction-following is critical, such as IFEval questions using an "Adherence mode: Strict adherence" system prompt.
- Scenarios demanding constructive and positive safety responses, activated via a "Safety Mode: Positive" system prompt.
- Mathematical and scientific problem-solving, leveraging a "Please reason step by step, and put your final answer within \boxed{}" system prompt.