LLM360/guru-7B

7.6B parameters · FP8 · 131,072-token context
License: cc-by-nc-4.0
Overview

LLM360/guru-7B: Enhanced Reasoning Model

LLM360/guru-7B is a 7.6 billion parameter model built upon the Qwen2.5-7B base, distinguished by its specialized fine-tuning for advanced reasoning tasks. This model is a product of research detailed in the paper "Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective" (arXiv:2506.14965).

Key Capabilities & Performance

The Guru-7B model demonstrates significant improvements across a wide array of reasoning benchmarks, outperforming other 7B-scale models on many of them. Its strengths include:

  • Mathematics: Achieves 17.50 on AIME24 (avg@32) and 77.25 on MATH500, indicating strong mathematical problem-solving abilities.
  • Code Generation: Scores 16.49 on LiveCodeBench (avg@4) and 82.62 on HumanEval (avg@4), showcasing proficiency in coding tasks.
  • Science & Logic: Performs well on GPQA-diamond (40.78) and SuperGPQA (31.80) for scientific reasoning, and notably 39.40 on Zebra Puzzle (avg@4) for logical deduction.
  • Cross-Domain Reasoning: The model's average score across all evaluated domains is 43.29, reflecting balanced performance rather than narrow specialization in a single domain.

Use Cases

This model is particularly well-suited for applications requiring:

  • Complex Problem Solving: Ideal for tasks that demand deep analytical and logical reasoning.
  • Automated Code Generation & Review: Its strong performance on coding benchmarks makes it valuable for development workflows.
  • Scientific Research & Analysis: Capable of assisting with scientific inquiry and data interpretation.
  • Educational Tools: Can be leveraged for tutoring systems or platforms that require step-by-step reasoning explanations.