internlm/OREAL-7B
internlm/OREAL-7B is a 7.6 billion parameter mathematical reasoning model developed by InternLM and trained with Outcome Reward-based reinforcement Learning (OREAL), an RL framework designed for tasks with binary outcome rewards. The model reaches 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models, and is strongest on complex, competition-style math problems.
OREAL-7B: Advanced Mathematical Reasoning Model
OREAL-7B is a 7.6 billion parameter model from InternLM designed for advanced mathematical reasoning. It is trained with Outcome Reward-based reinforcement Learning (OREAL), a reinforcement learning framework for tasks where the only available training signal is a binary outcome reward: the final answer is either correct or incorrect.
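To make the idea of a binary outcome reward concrete, here is a minimal sketch. It assumes the final answer is written inside `\boxed{...}` and that correctness is an exact string match against a reference; both are illustrative assumptions, not the verifier used to train OREAL-7B. The names `extract_boxed` and `outcome_reward` are hypothetical.

```python
def extract_boxed(text):
    """Return the contents of the last \\boxed{...} in text, or None.

    Assumption: answers are wrapped in \\boxed{...}. Braces are matched
    manually so nested LaTeX such as \\frac{1}{2} is handled.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    i = start + len(marker)
    depth = 1
    chars = []
    while i < len(text):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
        i += 1
    return None  # unbalanced braces: treat as no answer


def outcome_reward(response, reference):
    """Binary outcome reward: 1 if the extracted answer equals the reference, else 0.

    Exact string match is a simplification; a real verifier would likely
    normalize or symbolically compare the expressions.
    """
    answer = extract_boxed(response)
    return 1 if answer is not None and answer == reference.strip() else 0
```

With this reward, a long chain-of-thought response receives a single 0 or 1 at the very end, which is the sparse-reward setting the framework targets.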
Key Capabilities & Innovations
- High Mathematical Accuracy: OREAL-7B achieves 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models.
- Novel RL Framework: OREAL uses best-of-N (BoN) sampling to collect positive samples for behavior cloning and reshapes the rewards of negative samples to keep gradients consistent. It also incorporates an on-policy token-level reward model to address sparse rewards in long chain-of-thought reasoning.
- Competitive Benchmarking: The model shows strong performance across various mathematical benchmarks, including AIME2024, AIME2025-I, LiveMath, and Olympiad, often outperforming other 7B models and competing with 32B models.
- Systematic Reasoning: The model is guided by a detailed system prompt that encourages deep understanding, multi-angle analysis, systematic thinking, rigorous proof, and repeated verification, mimicking an expert mathematician's thought process.
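The two quantities mentioned above, pass@1 and best-of-N sampling, can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the OREAL training code: pass@1 is estimated here as the mean fraction of correct samples per problem, BoN is reduced to keeping the samples whose binary reward is 1, and the reward reshaping and token-level reward model are not shown. The function names are hypothetical.

```python
def pass_at_1(rewards_per_problem):
    """Estimate pass@1 (in percent) from binary rewards.

    rewards_per_problem: list of lists; each inner list holds the 0/1
    outcome rewards of the samples drawn for one problem. With a single
    sample per problem this reduces to plain accuracy.
    """
    per_problem = [sum(r) / len(r) for r in rewards_per_problem]
    return 100.0 * sum(per_problem) / len(per_problem)


def best_of_n_positives(samples, rewards):
    """Keep the sampled responses whose binary outcome reward is 1.

    In this sketch these positives stand in for the behavior-cloning
    targets; the remaining samples are the negatives.
    """
    return [s for s, r in zip(samples, rewards) if r == 1]


# Usage: two problems, four samples each.
rewards = [[1, 1, 0, 0], [1, 1, 1, 1]]
score = pass_at_1(rewards)  # 75.0
positives = best_of_n_positives(["a", "b", "c", "d"], rewards[0])  # ["a", "b"]
```

Because the reward is binary, a problem with no correct sample among its N draws contributes no positives at all, which is one reason how negative samples are handled matters in this setting.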
Ideal Use Cases
- Mathematical Problem Solving: Solves complex mathematical problems, particularly competition-style ones.
- Research in Mathematical AI: Useful for researchers exploring advanced reinforcement learning techniques for reasoning tasks.
- Educational Tools: Can be integrated into systems requiring precise mathematical explanations and step-by-step reasoning.