veggiebird/MATPO-single-agent-14b

TEXT GENERATION · Concurrency Cost: 1 · Model Size: 14B · Quant: FP8 · Ctx Length: 32k · License: apache-2.0 · Architecture: Transformer · Open Weights

veggiebird/MATPO-single-agent-14b is a 14 billion parameter language model developed by veggiebird, based on the Qwen3 architecture. It is fine-tuned with the MATPO (Multi-Agent Tool-Integrated Policy Optimization) framework, which trains a single LLM to operate as both planner and worker agents. The model targets multi-turn, tool-integrated planning tasks, mitigating context length bottlenecks and noisy tool responses, and achieves an 18.38% relative improvement over single-agent baselines on the GAIA-text, FRAMES, and WebWalkerQA benchmarks.


What the fuck is this model about?

veggiebird/MATPO-single-agent-14b is a 14 billion parameter model based on the Qwen3 architecture, fine-tuned with the MATPO (Multi-Agent Tool-Integrated Policy Optimization) framework. Unlike traditional multi-agent systems that deploy separate models, MATPO trains multiple specialized agent roles (planner and worker) within a single LLM using reinforcement learning. This approach addresses critical limitations of single-agent models in multi-turn, tool-integrated planning, such as context length bottlenecks and interference from noisy tool responses.
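The role split described above can be sketched as one model serving two isolated contexts. This is purely illustrative: `call_model` is a stub standing in for a request to the shared checkpoint, and the system prompts are placeholders, not the actual MATPO role prompts.

```python
# Sketch of MATPO-style role isolation: one model, two roles, separate contexts.
# Hypothetical names throughout; in a real deployment both roles would query
# the same veggiebird/MATPO-single-agent-14b endpoint.

PLANNER_SYSTEM = "You are the planner. Decompose the task and delegate subtasks."
WORKER_SYSTEM = "You are the worker. Solve the subtask with tools, then report."

def call_model(messages):
    # Stub for the shared LLM call: echoes the last message back.
    return f"[model reply to: {messages[-1]['content']}]"

def run_worker(subtask):
    # The worker gets a fresh context: the planner's history never enters it,
    # and the worker's (potentially long, noisy) tool trace never leaves it.
    worker_ctx = [{"role": "system", "content": WORKER_SYSTEM},
                  {"role": "user", "content": subtask}]
    return call_model(worker_ctx)  # only the final report is returned

def run_planner(task, subtasks):
    planner_ctx = [{"role": "system", "content": PLANNER_SYSTEM},
                   {"role": "user", "content": task}]
    for sub in subtasks:
        report = run_worker(sub)  # isolated rollout
        # The planner only sees the compact report, keeping its context short.
        planner_ctx.append({"role": "user", "content": f"Worker report: {report}"})
    return call_model(planner_ctx)

answer = run_planner("Find the paper's first author",
                     ["search the web", "read the top result"])
print(answer)
```

The key point is structural: the planner's context grows by one short report per subtask, regardless of how many tool calls the worker burned through.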

What makes THIS different from all the other models?

This model's primary differentiator is its "multi-agent-in-one-model" architecture. It allows a single LLM to act as both a high-level planner and specialized worker agents, each with isolated contexts, trained via role-specific prompts and a principled credit assignment mechanism. This results in:

  • Improved Performance: Achieves an 18.38% relative improvement over single-agent GRPO baselines on GAIA-text, WebWalkerQA, and FRAMES datasets.
  • Enhanced Stability: Exhibits more stable learning curves and robustness to noisy tool responses compared to single-agent training.
  • Infrastructure Efficiency: Eliminates the need for deploying separate models or additional rollout engines, simplifying maintenance and reducing complexity.
  • Model Agnostic: While trained on Qwen3-14B-base, the MATPO framework is compatible with any decoder-only LLM supporting tool calling and multi-turn conversations.
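As a rough illustration of the credit assignment mentioned above (a hedged reading only; the exact MATPO objective is not reproduced here), GRPO-style training normalizes each trajectory's reward within its sampling group, and planner and worker tokens from the same trajectory can then share that trajectory-level advantage:

```python
# Group-relative advantage in the style of GRPO. Each reward scores one full
# planner+worker trajectory; worker rollouts inherit the trajectory's advantage.
# This is a sketch of the general technique, not MATPO's actual loss.

def group_advantages(rewards):
    """Normalize rewards within one sampling group (mean 0, unit std)."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + 1e-8) for r in rewards]

# Four sampled trajectories for the same prompt: two succeeded, two failed.
advs = group_advantages([1.0, 0.0, 1.0, 0.0])
print(advs)
```

Because the normalization is group-relative, no learned value model is needed; successful trajectories get positive advantages and failed ones negative, in both the planner's and the worker's policy-gradient terms.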

Should I use this for my use case?

This model is particularly well-suited for:

  • Complex Multi-Turn Tasks: Ideal for scenarios requiring hierarchical planning and execution, such as advanced web search, data analysis, or scientific reasoning.
  • Tool-Integrated Applications: When your application heavily relies on external tools (e.g., web scraping, calculators, code execution) and needs to manage their outputs effectively.
  • Resource-Constrained Environments: If you need multi-agent capabilities but want to avoid the overhead of deploying and managing multiple separate LLMs.
  • Research in RL for LLMs: Provides a robust framework for exploring reinforcement learning in multi-agent contexts within a single model.
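For the tool-integrated case, a practical concern is keeping raw tool output from flooding the context. A minimal sketch, assuming a simple JSON tool-call format and an illustrative dispatch table (none of these names or helpers come from the MATPO release):

```python
import json

# Illustrative tool registry; real deployments would wire up web search,
# code execution, etc.
TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: "top result: ... (possibly noisy HTML snippet) ...",
}

def run_tool(name, arg):
    try:
        return TOOLS[name](arg)
    except Exception as e:
        return f"tool error: {e}"  # errors become observations, not crashes

def worker_turn(tool_call_json, max_chars=512):
    # Parse the model's tool call, execute it, and clip the raw output so a
    # noisy response cannot blow up the worker's context window.
    call = json.loads(tool_call_json)
    raw = run_tool(call["name"], call["arguments"])
    return raw[:max_chars]

obs = worker_turn('{"name": "calculator", "arguments": "6*7"}')
print(obs)  # → 42
```

Clipping (or summarizing) tool observations inside the worker's context, and returning only a final report to the planner, is exactly the bottleneck-mitigation pattern this model is trained for.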

Consider this model if your application struggles with context overflow, noisy tool interactions, or requires a more structured, hierarchical approach to problem-solving within a single LLM.