LLM360/guru-7B

Hugging Face
TEXT GENERATIONConcurrency Cost:1Model Size:7.6BQuant:FP8Ctx Length:32kPublished:May 14, 2025License:cc-by-nc-4.0Architecture:Transformer0.0K Open Weights Warm

LLM360/guru-7B is a 7.6 billion parameter language model based on the Qwen2.5-7B architecture, specifically fine-tuned for enhanced reasoning capabilities across various domains. It excels in mathematical problem-solving, code generation, scientific inquiry, and logical tasks, demonstrating superior performance compared to other 7B models in these areas. This model is particularly optimized for complex reasoning challenges, making it suitable for applications requiring robust analytical and problem-solving skills.

Loading preview...

LLM360/guru-7B: Enhanced Reasoning Model

LLM360/guru-7B is a 7.6 billion parameter model built upon the Qwen2.5-7B base, distinguished by its specialized fine-tuning for advanced reasoning tasks. This model is a product of research detailed in the paper "Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective" (arXiv:2506.14965).

Key Capabilities & Performance

The Guru-7B model demonstrates significant improvements in a wide array of reasoning benchmarks, outperforming several other 7B models. Its strengths include:

  • Mathematics: Achieves 17.50 on AIME24 (avg@32) and 77.25 on MATH500, indicating strong mathematical problem-solving abilities.
  • Code Generation: Scores 16.49 on LiveCodeBench (avg@4) and 82.62 on HumanEval (avg@4), showcasing proficiency in coding tasks.
  • Science & Logic: Performs well on GPQA-diamond (40.78) and SuperGPQA (31.80) for scientific reasoning, and notably 39.40 on Zebra Puzzle (avg@4) for logical deduction.
  • Cross-Domain Reasoning: The model's average score across all evaluated domains is 43.29, highlighting its balanced and robust reasoning performance.

Use Cases

This model is particularly well-suited for applications requiring:

  • Complex Problem Solving: Ideal for tasks that demand deep analytical and logical reasoning.
  • Automated Code Generation & Review: Its strong performance on coding benchmarks makes it valuable for development workflows.
  • Scientific Research & Analysis: Capable of assisting with scientific inquiry and data interpretation.
  • Educational Tools: Can be leveraged for tutoring systems or platforms that require step-by-step reasoning explanations.