internlm/OREAL-32B
Text generation · Model size: 32B · Quantization: FP8 · Context length: 32k · Published: Feb 10, 2025 · License: apache-2.0 · Architecture: Transformer

OREAL-32B is a 32-billion-parameter mathematical reasoning model developed by InternLM, trained with Outcome Reward-based Reinforcement Learning (OREAL). This RL framework is designed for tasks with binary outcome rewards, and it enables OREAL-32B to reach 95.0% pass@1 accuracy on MATH-500, surpassing previous distillation-trained 32B models. The model is optimized for complex mathematical problem-solving and rigorous step-by-step reasoning.
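The core idea of a binary outcome reward can be sketched in a few lines: a sampled solution earns reward 1 if its final answer matches the reference, and 0 otherwise. The function name and the string normalization below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a binary outcome reward for math answers, as used in
# outcome-reward-based RL (OREAL-style). The normalization here is a toy
# assumption; real verifiers compare answers symbolically.

def binary_outcome_reward(model_answer: str, reference_answer: str) -> int:
    """Return 1 when the model's final answer matches the reference, else 0."""
    normalize = lambda s: s.strip().lower().replace(" ", "")
    return 1 if normalize(model_answer) == normalize(reference_answer) else 0

print(binary_outcome_reward("  42 ", "42"))  # 1
print(binary_outcome_reward("41", "42"))     # 0
```

Under this scheme, pass@1 on a benchmark like MATH-500 is simply the mean of this reward over one sampled solution per problem.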
