internlm/OREAL-7B

Hugging Face
Text Generation · Concurrency Cost: 1 · Model Size: 7.6B · Quant: FP8 · Ctx Length: 32k · Published: Feb 10, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights

internlm/OREAL-7B is a 7.6-billion-parameter mathematical reasoning model from InternLM, trained with Outcome REwArd-based reinforcement Learning (OREAL). This RL framework targets tasks that provide only binary outcome rewards, and it enables the model to reach 94.0 pass@1 accuracy on MATH-500, matching the performance of previous 32B models. The model excels at complex mathematical problem solving, particularly on competition-level benchmarks.


OREAL-7B: Advanced Mathematical Reasoning Model

OREAL-7B is a 7.6-billion-parameter model from InternLM, designed specifically for advanced mathematical reasoning. It leverages a novel reinforcement learning framework, Outcome REwArd-based reinforcement Learning (OREAL), which is optimized for tasks where only binary outcome rewards are available.

Key Capabilities & Innovations

  • High Mathematical Accuracy: OREAL-7B achieves a 94.0 pass@1 accuracy on MATH-500, demonstrating performance comparable to larger 32B models.
  • Novel RL Framework: The OREAL method utilizes best-of-N (BoN) sampling for behavior cloning and reshapes negative sample rewards for gradient consistency. It also incorporates an on-policy token-level reward model to address sparse rewards in long chain-of-thought reasoning.
  • Competitive Benchmarking: The model shows strong performance across various mathematical benchmarks, including AIME2024, AIME2025-I, LiveMath, and Olympiad, often outperforming other 7B models and competing with 32B models.
  • Systematic Reasoning: The model is guided by a detailed system prompt that encourages deep understanding, multi-angle analysis, systematic thinking, rigorous proof, and repeated verification, mimicking an expert mathematician's thought process.
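The best-of-N selection with binary outcome rewards described above can be sketched in a few lines. This is a toy illustration only: the mock verifier (`verify`) and the simple `reshape_negative_reward` scaling are stand-ins for the paper's actual reward-shaping function, chosen here just to show how negative-sample rewards can be rebalanced against the per-problem success rate.

```python
def verify(answer: str, reference: str) -> int:
    """Mock binary outcome reward: 1 if the final answer matches, else 0."""
    return int(answer.strip() == reference.strip())

def best_of_n(samples: list[str], reference: str) -> tuple[list[str], list[str]]:
    """Split N sampled solutions into positive/negative sets by outcome reward."""
    positives = [s for s in samples if verify(s, reference) == 1]
    negatives = [s for s in samples if verify(s, reference) == 0]
    return positives, negatives

def reshape_negative_reward(p_success: float) -> float:
    """Toy reshaping: scale the negative reward so positive and negative
    gradient contributions stay balanced as the per-problem success rate
    varies. (Simplified stand-in for OREAL's reward-shaping function.)"""
    p = min(max(p_success, 1e-6), 1 - 1e-6)
    return -p / (1.0 - p)

# Four sampled solutions to one problem whose reference answer is "42".
samples = ["42", "41", "42", "17"]
pos, neg = best_of_n(samples, "42")
p = len(pos) / len(samples)          # empirical success rate: 0.5
w_neg = reshape_negative_reward(p)   # reward weight applied to negatives
```

In the full method, the positive set feeds behavior cloning while reshaped negatives supply a consistent gradient signal; a token-level reward model (not shown) further localizes credit within long chains of thought.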

Ideal Use Cases

  • Mathematical Problem Solving: Excels in solving complex mathematical problems, particularly those found in competitions.
  • Research in Mathematical AI: Useful for researchers exploring advanced reinforcement learning techniques for reasoning tasks.
  • Educational Tools: Can be integrated into systems requiring precise mathematical explanations and step-by-step reasoning.
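For the use cases above, a minimal inference sketch with Hugging Face `transformers` might look like the following. The system prompt here is an illustrative paraphrase of the expert-mathematician prompt described earlier, not the model's official prompt (the one shipped with the model should be preferred), and the generation settings are assumptions.

```python
MODEL_ID = "internlm/OREAL-7B"

# Illustrative system prompt paraphrasing the card's description; the model
# ships its own detailed official prompt, which should be used in practice.
SYSTEM_PROMPT = (
    "You are an expert mathematician. Analyze the problem from multiple "
    "angles, reason systematically and rigorously, and verify each step "
    "before giving the final answer."
)

def build_messages(problem: str) -> list[dict]:
    """Assemble a chat-template message list for a single math problem."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": problem},
    ]

def solve(problem: str, max_new_tokens: int = 2048) -> str:
    """Load the model and generate a solution (untested sketch; needs a GPU
    with enough memory for the 7.6B weights)."""
    # Imported lazily so prompt construction stays dependency-free.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, device_map="auto", trust_remote_code=True
    )
    inputs = tokenizer.apply_chat_template(
        build_messages(problem), add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

if __name__ == "__main__":
    print(solve("Compute the sum of the first 100 positive integers."))
```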

Popular Sampler Settings

The top parameter combinations used by Featherless users for this model tune the following samplers:

  • temperature
  • top_p
  • top_k
  • frequency_penalty
  • presence_penalty
  • repetition_penalty
  • min_p