Name: internlm/OREAL-7B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: internlm

OREAL-7B: Advanced Mathematical Reasoning Model

OREAL-7B is a 7.6 billion parameter model from InternLM, specifically designed for advanced mathematical reasoning. It leverages a novel reinforcement learning framework called Outcome Reward-based reinforcement Learning (OREAL), which is optimized for tasks where only binary outcome rewards are available.

Key Capabilities & Innovations

High Mathematical Accuracy: OREAL-7B achieves a 94.0 pass@1 accuracy on MATH-500, demonstrating performance comparable to larger 32B models.
Novel RL Framework: The OREAL method utilizes best-of-N (BoN) sampling for behavior cloning and reshapes negative sample rewards for gradient consistency. It also incorporates an on-policy token-level reward model to address sparse rewards in long chain-of-thought reasoning.
Competitive Benchmarking: The model shows strong performance across various mathematical benchmarks, including AIME2024, AIME2025-I, LiveMath, and Olympiad, often outperforming other 7B models and competing with 32B models.
Systematic Reasoning: The model is guided by a detailed system prompt that encourages deep understanding, multi-angle analysis, systematic thinking, rigorous proof, and repeated verification, mimicking an expert mathematician's thought process.

Ideal Use Cases

Mathematical Problem Solving: Excels in solving complex mathematical problems, particularly those found in competitions.
Research in Mathematical AI: Useful for researchers exploring advanced reinforcement learning techniques for reasoning tasks.
Educational Tools: Can be integrated into systems requiring precise mathematical explanations and step-by-step reasoning.

Overview

OREAL-7B: Advanced Mathematical Reasoning Model

Key Capabilities & Innovations

Ideal Use Cases

Full Model Card (README)