OrionLLM/GRM-7b: A Reasoning-Focused 7B Model
GRM-7b is a 7-billion-parameter model developed by OrionLLM with a primary focus on multi-domain reasoning. It targets mathematics, logic, coding, and general problem-solving, making it a versatile tool for complex analytical tasks.
Key Capabilities
- Dedicated Reasoning Behavior: Optimized for tasks that require stepwise problem-solving, with improved consistency in outputs.
- Strong 7B-Scale Performance: Offers practical performance suitable for local inference and experimentation, balancing capability with accessibility.
- Multi-Domain Mixture: Trained on a diverse dataset incorporating reasoning, code, math, and medical reasoning data, broadening its applicability.
- Fine-Tune Friendly: Designed as a starting point for custom Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), or Direct Preference Optimization (DPO) pipelines.
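For local inference, a minimal sketch with the `transformers` library might look like the following. It assumes the weights are published under the `OrionLLM/GRM-7b` ID on the Hugging Face Hub and that the tokenizer ships a chat template; the `format_chat` helper, the example question, and the sampling settings are illustrative, not part of this model card:

```python
def format_chat(question: str) -> list[dict]:
    """Wrap a user question in the message structure consumed by
    tokenizer.apply_chat_template. Purely illustrative."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy imports are kept local so the helper above can be used
    # without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OrionLLM/GRM-7b"  # assumed Hub ID
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = format_chat("If 3x + 5 = 20, what is x?")
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, not the prompt.
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                           skip_special_tokens=True))


if __name__ == "__main__":
    main()
```

At 7B scale with bfloat16 weights, this fits comfortably on a single 24 GB GPU; quantized variants would reduce the footprint further.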
Benchmarks
GRM-7b performs strongly across reasoning and coding benchmarks for its size class:
- AIME24: 69.0
- AIME25: 53.3
- AMC23: 93.5
- MATH500: 90.0
It also posts competitive results on coding benchmarks such as CodeElo and CodeForces, and strong scores on GPQA-D and JEEBench.
Good For
- Developers needing a reliable 7B model for general reasoning tasks.
- Researchers and practitioners looking for a solid base model for further fine-tuning on specific reasoning-intensive applications.
- Use cases requiring multi-domain problem-solving, including mathematical, logical, and coding challenges.
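For practitioners using GRM-7b as a base for further fine-tuning, a hedged SFT sketch with the TRL library is shown below. The `make_sft_records` helper, its prompt template, the toy dataset, and the output directory are all placeholders, and the `OrionLLM/GRM-7b` Hub ID is assumed:

```python
def make_sft_records(pairs: list[tuple[str, str]]) -> list[dict]:
    """Turn (question, answer) pairs into plain-text SFT records.
    The prompt template here is illustrative, not GRM-7b's official one."""
    return [{"text": f"Question: {q}\nAnswer: {a}"} for q, a in pairs]


def main() -> None:
    # Local imports: trl and datasets are only needed for actual training.
    from datasets import Dataset
    from trl import SFTConfig, SFTTrainer

    records = make_sft_records([
        ("If 3x + 5 = 20, what is x?", "x = 5"),
        # ... real reasoning-intensive training data goes here ...
    ])
    trainer = SFTTrainer(
        model="OrionLLM/GRM-7b",                 # assumed Hub ID
        train_dataset=Dataset.from_list(records),
        args=SFTConfig(output_dir="grm7b-sft"),  # placeholder path
    )
    trainer.train()


if __name__ == "__main__":
    main()
```

The same dataset-preparation step carries over to DPO or GRPO pipelines, which additionally require preference pairs or a reward signal rather than plain completions.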