# MUA-RL-8B: Multi-Turn Agentic Tool Use Model
zzwkk/MUA-RL-8B is an 8-billion-parameter model engineered specifically for multi-turn user-interacting agent reinforcement learning (RL). Its core innovation is managing complex, multi-turn conversations while proficiently using external tools to achieve user goals. The model is distinguished as the first framework to incorporate LLM-simulated users directly into its RL training loop, allowing it to autonomously learn efficient communication strategies and tool utilization.
## Key Capabilities & Features
- Multi-Turn Context Management: Designed to maintain conversational context over extended interactions.
- Agentic Tool Use: Excels at integrating and utilizing various tools to solve practical problems.
- Autonomous Learning: Leverages LLM-simulated users (specifically GPT-4o-2024-11-20) within its RL training loop, optimized with Group Relative Policy Optimization (GRPO), to continuously improve its interaction and tool-use capabilities.
- Competitive Performance: Despite its 8B parameter size, MUA-RL-8B shows competitive performance on multi-turn tool-using benchmarks (e.g., TAU2, BFCL-V3, ACEBench Agent) when compared to larger open-source models like DeepSeek-V3-0324 and Qwen3-32B in non-thinking settings.
- 32K Context Length: Supports a substantial context window for processing longer interactions.
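To make the multi-turn tool-use pattern above concrete, here is a minimal sketch of an agent loop that dispatches tool calls until the model produces a final reply. The `call_model` function, the tool registry, and the message format are illustrative assumptions standing in for the real model API, not MUA-RL-8B's actual interface.

```python
# Minimal sketch of a multi-turn agent loop with tool dispatch.
# `call_model`, TOOLS, and the message schema are hypothetical
# stand-ins; a real implementation would query MUA-RL-8B.

TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def call_model(messages):
    # Stub: pretend the model requests a tool on the first pass,
    # then answers the user once a tool result is in context.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "args": {"order_id": "A123"}}
    return {"content": "Your order A123 has shipped."}

def run_turn(messages):
    """Run one agent step, dispatching tool calls until a final reply."""
    while True:
        reply = call_model(messages)
        if "tool" in reply:
            result = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": str(result)})
        else:
            messages.append({"role": "assistant", "content": reply["content"]})
            return reply["content"]

history = [{"role": "user", "content": "Where is my order A123?"}]
answer = run_turn(history)
```

The point of the sketch is the loop structure: the full message history (user turns, tool results, assistant replies) is carried across iterations, which is exactly what the 32K context window is sized for.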
## Good For
- Developing sophisticated conversational agents that require memory and tool-use capabilities.
- Applications needing autonomous problem-solving in dynamic, interactive environments.
- Research into reinforcement learning for agentic systems and user simulation in training.
- Building agents that can handle complex, multi-step tasks requiring external information or actions.