Overview
The internlm/OREAL-DeepSeek-R1-Distill-Qwen-7B is a 7.6-billion-parameter language model in the OREAL series from InternLM. It is designed for advanced mathematical reasoning and is trained with Outcome REwArd-based reinforcement Learning (OREAL), a reinforcement learning framework tailored to settings where only binary outcome rewards are available. To cope with the resulting reward sparsity in long chain-of-thought reasoning, OREAL incorporates an on-policy token-level reward model for credit assignment.
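To make "binary outcome reward" concrete, here is a minimal sketch of how such a reward could be computed for a math problem: the verifier only checks whether the final answer matches, yielding 1 or 0 with no partial credit. The helper names (`extract_boxed`, `outcome_reward`) are illustrative, not part of the OREAL codebase, and the simple regex ignores nested braces.

```python
import re

def extract_boxed(text: str):
    """Pull the contents of the last \\boxed{...} in a model response.

    Illustrative helper; real verifiers normalize answers more carefully.
    """
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def outcome_reward(response: str, ground_truth: str) -> int:
    """Binary outcome reward: 1 if the final boxed answer matches, else 0."""
    answer = extract_boxed(response)
    return int(answer is not None and answer == ground_truth.strip())
```

Because the reward says nothing about *which* steps in a long solution were right or wrong, a token-level reward model is needed to distribute credit across the trajectory.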
Key Capabilities & Performance
- Exceptional Mathematical Reasoning: Achieves 94.0 pass@1 accuracy on MATH-500, demonstrating performance comparable to previous 32B models in mathematical problem-solving.
- Reinforcement Learning Optimization: Trains with best-of-N (BoN) sampling of reasoning trajectories and reshaped rewards for negative samples, keeping gradients consistent between successful and failed rollouts.
- Sparse Reward Handling: Employs an on-policy token-level reward model to identify key tokens in reasoning trajectories, which is crucial for credit assignment in long, complex solutions.
- Competitive Benchmarks: Outperforms many 7B and some 32B models on various mathematical benchmarks, including AIME2024, AIME2025-I, LiveMath, and Olympiad.
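The BoN sampling and negative-reward reshaping mentioned above can be sketched as follows. This is a simplified stand-in, not OREAL's actual shaping function: `best_of_n` keeps the first sampled response that earns a positive binary reward, and `reshape_negative_reward` scales the penalty on failed samples by an estimated success rate so that gradient magnitudes from positive and negative samples stay roughly balanced. All function names are hypothetical.

```python
def best_of_n(sample_fn, reward_fn, n: int = 8):
    """Draw up to n candidate responses; return (response, reward) for the
    first candidate with reward 1, or the last candidate if all fail."""
    best = None
    for _ in range(n):
        resp = sample_fn()
        best = (resp, reward_fn(resp))
        if best[1] == 1:
            break
    return best

def reshape_negative_reward(reward: int, success_rate: float) -> float:
    """Illustrative reshaping (not the paper's exact formula): leave positive
    rewards at 1.0 and weight failures by success_rate / (1 - success_rate),
    so easy problems (high success rate) penalize failures more heavily."""
    if reward == 1:
        return 1.0
    return -success_rate / max(1.0 - success_rate, 1e-6)
```

In OREAL the shaping is derived so that training on BoN-selected positives and reshaped negatives remains consistent with optimizing the underlying binary objective; the sketch above only conveys the shape of that idea.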
Use Cases
- Competitive Mathematics: Ideal for solving competition-style problems, such as those from AIME and Olympiad-level contests.
- Advanced Problem Solving: Suitable for applications requiring rigorous logical deduction and multi-step mathematical reasoning.
- Research in RL for Reasoning: Provides a strong baseline and methodology for further research into reinforcement learning for complex reasoning tasks.