internlm/OREAL-DeepSeek-R1-Distill-Qwen-7B
Text Generation | Concurrency Cost: 1 | Model Size: 7.6B | Quant: FP8 | Ctx Length: 32k | Published: Feb 10, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Cold

internlm/OREAL-DeepSeek-R1-Distill-Qwen-7B is a 7.6-billion-parameter mathematical reasoning model developed by InternLM and fine-tuned with Outcome Reward-based Reinforcement Learning (OREAL). It targets complex mathematical problem solving and reaches 94.0 pass@1 accuracy on MATH-500, on par with larger 32B models. OREAL is designed for settings where only binary outcome rewards are available, which makes the model well suited to competitive mathematics and rigorous logical reasoning.
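
To make the two terms in the description concrete, here is a minimal sketch of a binary outcome reward and a pass@1 score. It is an illustration under stated assumptions, not the OREAL training code or the MATH-500 grader: the names `outcome_reward` and `pass_at_1` are hypothetical, answers are compared by exact string match after trimming whitespace (real graders normalize mathematical expressions), and one answer is sampled per problem.

```python
from typing import Sequence


def outcome_reward(predicted: str, reference: str) -> int:
    """Binary outcome reward: 1 if the final answer matches, else 0.

    Assumption: exact string match after trimming whitespace. No credit
    is given for intermediate reasoning steps, only for the outcome.
    """
    return int(predicted.strip() == reference.strip())


def pass_at_1(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of problems whose single sampled answer earns reward 1."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one problem is required")
    rewards = [outcome_reward(p, r) for p, r in zip(predictions, references)]
    return 100.0 * sum(rewards) / len(rewards)


# Toy data (made up for illustration): three of four answers are correct.
preds = ["42", " 7", "3.14", "10"]
refs = ["42", "7", "2.71", "10"]
score = pass_at_1(preds, refs)
print(f"pass@1 = {score:.1f}")
```

Under this definition, the reported 94.0 pass@1 on MATH-500 would correspond to 470 of the 500 problems answered correctly on a single attempt, if the score comes from one sample per problem; evaluations that average over several samples per problem can yield the same figure from different per-run counts.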
