Overview
ZR1-1.5B: A Compact Reasoning Powerhouse
ZR1-1.5B is a 1.5-billion-parameter model from Zyphra, engineered for advanced reasoning in coding and mathematics. It was trained extensively on verified problem sets using PRIME (Process Reinforcement through IMplicit rEwards), an online RL algorithm with process rewards, combined with iterative context lengthening. The result is a model whose performance is exceptional for its size.
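For context, PRIME's central idea is that a reward model trained only on outcome labels implicitly defines token-level process rewards. A sketch of that reward in the PRIME paper's notation, where $\pi_\phi$ is the implicit process reward model, $\pi_{\mathrm{ref}}$ the frozen reference model, and $\beta$ a scaling coefficient (these symbols follow the paper, not details disclosed by Zyphra):

$$
r_t = \beta \log \frac{\pi_\phi(y_t \mid y_{<t}, x)}{\pi_{\mathrm{ref}}(y_t \mid y_{<t}, x)}
$$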
Key Capabilities & Performance
- Superior Coding Performance: ZR1-1.5B achieves 40% on LeetCode and 39.74% on LCB_generation, outperforming Llama-3.1-70B-Instruct on hard coding tasks and improving on its base model, R1-Distill-1.5B, by more than 50%.
- Strong Mathematical and Scientific Reasoning: The model performs robustly across math evaluations such as AIME, AMC, and MATH500, and scores 37.91% pass@1 on GPQA-Diamond, a graduate-level science benchmark.
- Efficient Training: Trained on a single 8xH100 node, using a recipe that combines PRIME with RLOO, token-level reward granularity, and iterative context lengthening up to 24k tokens (a sketch of the reward and advantage computations follows this list).
- High Token Efficiency: LiveBench evaluations show the model to be highly token-efficient, reaching its scores with comparatively short generations.
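The snippet below is a minimal, self-contained sketch of those two ingredients: the per-token implicit process reward from the formula above, and the RLOO leave-one-out advantage over K rollouts of the same prompt. Function names, the `beta` value, and the toy rewards are illustrative assumptions, not Zyphra's actual training code.

```python
import numpy as np

def implicit_process_rewards(logp_prm: np.ndarray, logp_ref: np.ndarray,
                             beta: float = 0.05) -> np.ndarray:
    """Token-level implicit process rewards (PRIME).

    logp_prm / logp_ref: per-token log-probs of one sampled response under
    the implicit PRM and the frozen reference model (shape: [seq_len]).
    beta is an illustrative coefficient, not Zyphra's actual setting.
    """
    return beta * (logp_prm - logp_ref)

def rloo_advantages(outcome_rewards) -> np.ndarray:
    """RLOO: each rollout is baselined by the mean reward of its K-1 siblings."""
    r = np.asarray(outcome_rewards, dtype=float)
    loo_baseline = (r.sum() - r) / (r.size - 1)  # leave-one-out mean
    return r - loo_baseline

# Toy example: 4 rollouts of one math problem scored 0/1 by a verifier.
print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))  # ≈ [ 0.67 -0.67 -0.67  0.67]
```

In PRIME these two signals are combined, so correct rollouts are credited at the sequence level while the implicit PRM shapes credit token by token.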
Ideal Use Cases
- Code Generation and Problem Solving: Excellent for tasks requiring accurate code generation and debugging, especially competitive programming and complex algorithmic challenges (see the inference sketch after this list).
- Mathematical Reasoning: Suitable for applications involving advanced mathematical problem-solving, proof generation, and quantitative analysis.
- Resource-Constrained Environments: Its small parameter count (1.5B) combined with high performance makes it well suited to deployments where compute is limited, delivering strong reasoning without the overhead of much larger models.
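As a usage example, here is a minimal inference sketch with Hugging Face Transformers. It assumes the model is published as `Zyphra/ZR1-1.5B` with a chat template, and the sampling settings are common defaults for small reasoning models rather than Zyphra's documented recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zyphra/ZR1-1.5B"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"  # device_map needs `accelerate`
)

messages = [{"role": "user",
             "content": "Write a Python function that returns the n-th Fibonacci number."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Reasoning models emit long chains of thought before the final answer,
# so leave generous headroom for generation.
outputs = model.generate(inputs, max_new_tokens=4096,
                         do_sample=True, temperature=0.6, top_p=0.95)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```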