Accio-Lab/Metis-8B-RL

Vision · Concurrency Cost: 1 · Model Size: 8B · Quant: FP8 · Ctx Length: 32k · Published: Apr 9, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

Accio-Lab/Metis-8B-RL is an 8-billion-parameter multimodal reasoning agent, based on Qwen3-VL-8B-Instruct and fine-tuned with Hierarchical Decoupled Policy Optimization (HDPO). The model is trained for strategic tool use: it learns when to call code execution, text search, and image search, which significantly reduces blind tool invocation. It achieves state-of-the-art performance among open-source 8B agentic models across 13 benchmarks, particularly in perception, document understanding, and complex mathematical and logical reasoning.


Metis-8B-RL: A Strategic Multimodal Reasoning Agent

Metis-8B-RL, developed by Accio-Lab, is an 8-billion-parameter multimodal model built on Qwen3-VL-8B-Instruct. It is the final RL-trained checkpoint of the Metis framework and uses Hierarchical Decoupled Policy Optimization (HDPO) to cultivate meta-cognitive tool use.

Key Capabilities & Differentiators

  • Efficient Tool Use: Cuts blind tool invocation from 98% to 2% by learning when to call external tools (code execution, text search, image search), not just how to call them.
  • State-of-the-Art Performance: Achieves leading accuracy across 13 benchmarks among open-source 8B agentic models, demonstrating strong capabilities in perception, document understanding, and complex mathematical/logical reasoning.
  • HDPO Training: Employs a novel HDPO method with dual rewards and decoupled advantage estimation, allowing the model to first prioritize correctness and then optimize for tool efficiency.
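
The "dual rewards and decoupled advantage estimation" idea can be illustrated with a small sketch. This is not the Accio-Lab implementation. It assumes a group of rollouts for one prompt, a binary correctness reward, and a hypothetical efficiency reward of 1 / (1 + tool calls) that is normalized only among correct rollouts, so efficiency never outweighs correctness. The function names and the `efficiency_weight` parameter are illustrative assumptions.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Group-relative advantage: (value - group mean) / (group std + eps)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def decoupled_advantages(correct, tool_calls, efficiency_weight=0.5):
    """Combine two separately normalized advantages for one rollout group.

    correct:    list of bools, one per rollout (assumed accuracy signal)
    tool_calls: list of ints, tool invocations per rollout
    efficiency_weight: hypothetical mixing coefficient, not from the source
    """
    # Accuracy advantage is estimated over every rollout in the group.
    acc_adv = normalize([1.0 if c else 0.0 for c in correct])

    # Efficiency advantage is estimated only among correct rollouts,
    # so an incorrect answer is never rewarded for skipping tools.
    eff_adv = [0.0] * len(correct)
    correct_idx = [i for i, c in enumerate(correct) if c]
    if len(correct_idx) >= 2:
        eff_reward = [1.0 / (1.0 + tool_calls[i]) for i in correct_idx]
        for i, adv in zip(correct_idx, normalize(eff_reward)):
            eff_adv[i] = adv

    return [a + efficiency_weight * e for a, e in zip(acc_adv, eff_adv)]


if __name__ == "__main__":
    # Four rollouts: two correct (0 and 3 tool calls), two incorrect.
    print(decoupled_advantages([True, True, False, False], [0, 3, 0, 3]))
```

In this sketch the correct rollout with no tool calls ranks highest, the correct rollout with three calls ranks next, and both incorrect rollouts receive the same negative advantage regardless of how many tools they called.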

Ideal Use Cases

  • Complex Multimodal Reasoning: Suited for tasks requiring strategic integration of visual and textual information with external tools.
  • Agentic Applications: Suited to building agents that must decide when a tool call is actually needed rather than invoking tools by default.
  • Problem Solving: Particularly strong in mathematical and logical reasoning, making it valuable for applications requiring precise problem-solving.