OrionLLM/GRM-2.5

  • Modality: Vision
  • Concurrency Cost: 1
  • Model Size: 4.5B
  • Quant: BF16
  • Ctx Length: 32k
  • Tool Calling: Supported
  • Published: Apr 7, 2026
  • License: apache-2.0
  • Architecture: Transformer
  • Open Weights

GRM-2.5 is a 4.5 billion parameter reasoning model developed by OrionLLM, built on the Qwen3.5 architecture. Optimized for structured reasoning and efficient local deployment, it excels in complex problem-solving, code generation, and agentic workflows. This model is designed for general-purpose local AI, offering strong performance across diverse tasks while remaining accessible on consumer hardware.

GRM-2.5: A Compact Reasoning Model for Local AI

GRM-2.5 is a 4.5-billion-parameter model from OrionLLM, built on the Qwen3.5 architecture. It is optimized for structured reasoning and efficient local deployment, which makes it a practical choice for general-purpose AI on consumer hardware.

Key Capabilities

  • Strong Reasoning: Handles both everyday conversations and complex reasoning tasks with clarity and consistency.
  • Efficient Local Coding and Agentic Use: Well-suited for code generation, structured problem-solving, and local agent-style workflows despite its compact size.
  • Optimized for Local Deployment: Designed for accessible inference across a broad range of hardware, prioritizing practical usability.
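
To make the local-deployment claim concrete, the weight footprint can be estimated from the listed model size (4.5B parameters) and precision (BF16, 2 bytes per parameter). The sketch below is a rough back-of-the-envelope calculation, not a measurement: the helper name `weight_memory_gb` is hypothetical, the int8 and int4 rows are illustrative assumptions (this page lists only BF16), and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Approximate bytes used per parameter for common precisions.
# Only BF16 is listed for GRM-2.5; the other entries are illustrative assumptions.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Estimate memory for model weights alone, in GB (10^9 bytes).

    Excludes KV cache, activations, and framework overhead.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    grm_params = 4.5e9  # model size as listed on this page
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(grm_params, dtype):.2f} GB of weights")
```

Under these assumptions the BF16 weights alone come to roughly 9 GB, so actual memory use during inference will be higher once the context (up to 32k tokens) and runtime buffers are included.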

Performance Highlights

GRM-2.5 performs strongly on several benchmarks, including:

  • MMLU-Pro: Achieves 80.1, indicating robust knowledge and STEM capabilities.
  • IFEval: Scores 90.2 for instruction following.
  • LiveCodeBench v6: Reaches 56.9, showcasing its coding proficiency.
  • TAU2-Bench: Scores 80.2 for agentic tasks.

Good For

  • Developers seeking a capable AI model for local inference on consumer hardware.
  • Applications requiring strong structured reasoning and problem-solving abilities.
  • Use cases involving code generation and agent-style workflows.
  • Scenarios that call for a balance between performance and efficiency.