Name: qihoo360/TinyR1-32B API
Brand: Featherless.ai
Price: 25.00 USD
Availability: InStock
Author: qihoo360

TinyR1-32B: Balanced Reasoning, Alignment, and Safety

TinyR1-32B, developed by qihoo360, is a 32.8 billion parameter language model designed to overcome the traditional trade-off between helpfulness and harmlessness in LLMs. It introduces a novel Control Token method that allows the model to dynamically adjust its behavior based on task type, ensuring a balanced coexistence of reasoning ability, safety, and alignment.

Key Capabilities & Performance

Enhanced Reasoning: Achieves 90.9 on AIME24 and 82.7 on AIME25, demonstrating 94% of DeepSeek-R1-0528's performance in mathematics, science, and coding reasoning tasks.
Superior Alignment: Scores 89.2 on IFEval (Prompt Strict), significantly outperforming DeepSeek-R1-0528's 80.9.
Constructive Safety: Attains a Constructive Safety score of nearly 90, far surpassing other large open-source models by providing positive safety guidance rather than simple refusal. This includes 89.5 on OpenSource and 86.5 on InHouse safety benchmarks.
Efficient Training: Achieved comprehensive improvements after only 20,000 high-quality fine-tuning samples and three rounds of SFT training.

Good For

Applications requiring a strong balance between complex reasoning and robust safety protocols.
Tasks where precise instruction-following is critical, such as IFEval questions using an "Adherence mode: Strict adherence" system prompt.
Scenarios demanding constructive and positive safety responses, activated via a "Safety Mode: Positive" system prompt.
Mathematical and scientific problem-solving, leveraging a "Please reason step by step, and put your final answer within \boxed{}" system prompt.

Overview

TinyR1-32B: Balanced Reasoning, Alignment, and Safety

Key Capabilities & Performance

Good For

Full Model Card (README)