LMIS-ORG/ToolOrchestra_Slime_Agentic_Qwen3_8B

Text Generation | Concurrency Cost: 1 | Model Size: 8B | Quant: FP8 | Ctx Length: 32k | Published: Apr 3, 2026 | Architecture: Transformer | Cold

LMIS-ORG's ToolOrchestra_Slime_Agentic_Qwen3_8B is an 8 billion parameter Qwen3-based model designed for multi-agent task orchestration. It implements the Orchestrator-Expert framework, where a central LLM routes tasks to specialized expert models and tools through multi-turn tool calls. Its tool-use and routing capabilities are improved through GRPO-based training on the Orchestrator's decision trajectory, making it suited to complex, multi-step problems that require dynamic tool and expert selection.

ToolOrchestra_Slime_Agentic_Qwen3_8B Overview

This model, developed by LMIS-ORG, is an 8 billion parameter Qwen3-based agentic LLM that implements the ToolOrchestra framework. It operates as an Orchestrator-Expert multi-agent system, specifically designed for reinforcement learning (RL) training environments. The core idea involves a central Orchestrator LLM learning to dynamically route tasks to the most suitable specialized expert models and their corresponding tools.
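
The routing idea above can be sketched as a simple control loop. This is a minimal illustration, not the ToolOrchestra implementation: `Step`, `run_orchestrator`, `scripted_policy`, and the `retriever` and `math_expert` handlers are all hypothetical names, and the scripted policy stands in for the Orchestrator LLM, which in practice would decide each step from the conversation history.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration; not part of any ToolOrchestra API.
Handler = Callable[[str], str]


@dataclass
class Step:
    target: str   # name of the chosen expert/tool, or "final" to stop
    payload: str  # argument for the target, or the final answer


def run_orchestrator(
    task: str,
    policy: Callable[[List[str]], Step],
    handlers: Dict[str, Handler],
    max_turns: int = 4,
) -> str:
    """Multi-turn loop: the policy picks a target, the observation is fed back."""
    history = [f"task: {task}"]
    for _ in range(max_turns):
        step = policy(history)
        if step.target == "final":
            return step.payload
        handler = handlers.get(step.target)
        if handler is None:
            observation = f"error: unknown target {step.target!r}"
        else:
            observation = handler(step.payload)
        history.append(f"{step.target}({step.payload}) -> {observation}")
    return "no answer within turn budget"


def scripted_policy(history: List[str]) -> Step:
    # Stand-in for the Orchestrator LLM (assumption): a fixed three-turn script.
    if len(history) == 1:
        return Step("retriever", "refund policy")
    if len(history) == 2:
        return Step("math_expert", "2 * 21")
    return Step("final", history[-1].split(" -> ")[-1])


# Toy handlers standing in for a retrieval service and an expert model.
handlers: Dict[str, Handler] = {
    "retriever": lambda query: f"doc about {query}",
    "math_expert": lambda expr: "42",
}

answer = run_orchestrator("demo", scripted_policy, handlers)
```

Keeping the policy and the handlers as plain callables makes it easy to swap the scripted policy for a real model call without touching the loop.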

Key Capabilities

  • Multi-Agent Orchestration: Manages interactions between a central Orchestrator and multiple expert LLMs.
  • Dynamic Tool-Use: The Orchestrator learns to make multi-turn tool calls, integrating external tools such as FAISS retrieval services.
  • Expert Routing: Intelligently directs tasks to specialized expert models based on the problem at hand.
  • GRPO-based Improvement: Uses Group Relative Policy Optimization (GRPO) to improve the Orchestrator's decision-making, tool-use, and routing abilities without requiring manual annotation of intermediate steps.
  • Enhanced Performance: Improves over the base Qwen3-8B model on the τ²-Bench benchmark, scoring 0.388 compared to the baseline's 0.278 (an absolute gain of 0.110, roughly 40% relative).
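
As a rough illustration of the GRPO step mentioned above: GRPO samples a group of trajectories for the same task and scores each one relative to the group, so no per-step labels or learned value model are needed. The sketch below shows only that advantage computation, assuming one scalar reward per trajectory. The function name, the example rewards, and the use of the population standard deviation are assumptions for illustration, not details taken from this model's training setup.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each trajectory's reward against its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    # eps avoids division by zero when every trajectory gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled Orchestrator trajectories on one task
# (1.0 = task solved, 0.0 = task failed).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Trajectories that beat the group average get a positive advantage and are reinforced; those below it are discouraged. When all trajectories in a group receive the same reward, every advantage is zero and the group contributes no learning signal.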

Good For

  • Complex Task Automation: Ideal for scenarios requiring sequential decision-making and dynamic resource allocation (tools/experts).
  • Agentic System Development: Serves as a trained Orchestrator for building and experimenting with agents that combine tool calls and expert models.
  • Research in RL and LLM Agents: Offers a practical implementation of the ToolOrchestra concept for further research and experimentation in agentic AI.