Overview
LLM360/guru-7B: Enhanced Reasoning Model
LLM360/guru-7B is a 7.6-billion-parameter model built on the Qwen2.5-7B base and fine-tuned with reinforcement learning for advanced reasoning across multiple domains. It is a product of the research described in the paper "Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective" (arXiv:2506.14965).
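For a quick start, the sketch below loads the model with Hugging Face Transformers and generates a response; the prompt, bfloat16 precision, and sampling settings are illustrative assumptions rather than officially recommended values.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "LLM360/guru-7B"

# Load tokenizer and model; bfloat16 and device_map="auto" assume a recent GPU.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example reasoning prompt (illustrative only, not an official prompt format).
prompt = "Solve step by step: what is the sum of the first 20 positive odd integers?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
)

# Print only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```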
Key Capabilities & Performance
The Guru-7B model demonstrates significant gains across a wide array of reasoning benchmarks, outperforming several other 7B models. Its strengths include (see the avg@k sketch after this list):
- Mathematics: Achieves 17.50 on AIME24 (avg@32) and 77.25 on MATH500, indicating strong mathematical problem-solving abilities.
- Code Generation: Scores 16.49 on LiveCodeBench (avg@4) and 82.62 on HumanEval (avg@4), showcasing proficiency in coding tasks.
- Science & Logic: Performs well on scientific reasoning with 40.78 on GPQA-diamond and 31.80 on SuperGPQA, and achieves a notable 39.40 on Zebra Puzzle (avg@4) for logical deduction.
- Cross-Domain Reasoning: The model's average score across all evaluated domains is 43.29, highlighting its balanced and robust reasoning performance.
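The avg@k notation above presumably reports accuracy averaged over k sampled generations per problem (e.g., 32 samples per AIME24 question). A minimal sketch of that computation, using hypothetical correctness data:

```python
# Hypothetical correctness records: 1 = correct, 0 = incorrect, k = 4 samples each.
samples = {
    "problem_1": [1, 0, 1, 1],
    "problem_2": [0, 0, 1, 0],
}

# avg@k: mean correctness over the k samples of each problem, then averaged
# over all problems and reported as a percentage.
per_problem = [sum(runs) / len(runs) for runs in samples.values()]
avg_at_k = 100 * sum(per_problem) / len(per_problem)
print(f"avg@4 = {avg_at_k:.2f}")  # -> avg@4 = 50.00
```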
Use Cases
This model is particularly well-suited for applications requiring:
- Complex Problem Solving: Ideal for tasks that demand deep analytical and logical reasoning.
- Automated Code Generation & Review: Its strong performance on coding benchmarks makes it valuable for development workflows.
- Scientific Research & Analysis: Capable of assisting with scientific inquiry and data interpretation.
- Educational Tools: Can be leveraged for tutoring systems or platforms that require step-by-step reasoning explanations.