# MUA-RL-32B: Multi-Turn Agentic Tool Use Model
MUA-RL-32B is a 32-billion-parameter model engineered for agentic tool use in complex, multi-turn conversational scenarios. Developed by zzwkk, it introduces a framework that integrates LLM-simulated users directly into the reinforcement learning (RL) loop, so the model autonomously learns to communicate effectively with users and to invoke tools to solve their problems.
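The training setup described above can be sketched as a rollout loop in which a simulated user, the agent policy, and a tool environment take turns, with a terminal reward feeding the GRPO update. Everything below (function names, the canned user script, the reward rule) is an illustrative stand-in, not the authors' actual training code:

```python
# Illustrative sketch of a multi-turn RL rollout with an LLM-simulated user.
# All names (simulate_user, agent_policy, run_tool) are hypothetical stand-ins.

def simulate_user(history):
    # In MUA-RL this role is played by an LLM (e.g. GPT-4o); a canned
    # script keeps the sketch runnable without any model.
    script = ["I need to return order #123.", "Yes, please process it.", "DONE"]
    turn = sum(1 for m in history if m["role"] == "user")
    return script[min(turn, len(script) - 1)]

def agent_policy(history):
    # Stand-in for the policy model: either call a tool or reply in text.
    last = history[-1]["content"]
    if "return order" in last:
        return {"role": "assistant", "content": "",
                "tool_call": {"name": "lookup_order", "args": {"id": "123"}}}
    return {"role": "assistant", "content": "Your return has been processed."}

def run_tool(call):
    # Stand-in tool environment returning a JSON-like observation.
    return {"role": "tool", "name": call["name"], "content": '{"status": "eligible"}'}

def rollout(max_turns=6):
    history = [{"role": "user", "content": simulate_user([])}]
    while len(history) < max_turns:
        action = agent_policy(history)
        history.append(action)
        if "tool_call" in action:
            history.append(run_tool(action["tool_call"]))
        else:
            user_msg = simulate_user(history)
            if user_msg == "DONE":
                break
            history.append({"role": "user", "content": user_msg})
    # A terminal task-completion reward like this would drive GRPO's
    # group-relative advantage estimate across sampled rollouts.
    reward = 1.0 if any(m["role"] == "tool" for m in history) else 0.0
    return history, reward
```

Because the simulated user is inside the loop, the policy is rewarded for the whole conversation, not for single-turn tool calls in isolation.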
## Key Capabilities & Features
- Multi-Turn Interaction: Designed to maintain context and effectively utilize tools across extended conversations.
- Autonomous Learning: Employs Group Relative Policy Optimization (GRPO) with LLM-simulated users (e.g., GPT-4o) for self-improvement in tool-using tasks.
- Agentic Tool Use: Seamlessly handles tool calling and response processing to complete complex tasks.
- Competitive Performance: Achieves strong results on benchmarks like TAU2 Retail, TAU2 Airline, BFCL-V3 Multi Turn, and ACEBench Agent, often matching or exceeding the performance of larger open-source models such as DeepSeek-V3-0324 and Qwen3-235B-A22B in non-thinking settings.
- 32K Context Length: Supports extensive conversational history and complex task instructions.
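In practice, driving a tool-using model like this follows the standard agentic loop: send the conversation plus tool schemas, execute any tool the model requests, append the result as a `tool` message, and repeat until it answers in plain text. A minimal sketch with the model call stubbed out (a real deployment would send `call_model`'s request to an inference endpoint serving MUA-RL-32B; the tool and helper names are assumptions):

```python
import json

# OpenAI-style tool schema passed to the model alongside the conversation.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the shipping status of an order.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

def get_order_status(order_id: str) -> str:
    # Stub backend; a real agent would query an order system here.
    return json.dumps({"order_id": order_id, "status": "shipped"})

def call_model(messages, tools):
    # Stub standing in for a chat-completions request to a server hosting
    # MUA-RL-32B: it emits a tool call first, then answers once the tool
    # result is in context.
    if messages[-1]["role"] == "tool":
        status = json.loads(messages[-1]["content"])["status"]
        return {"role": "assistant", "content": f"Your order has {status}."}
    return {"role": "assistant", "content": "", "tool_calls": [{
        "id": "call_1",
        "function": {"name": "get_order_status",
                     "arguments": json.dumps({"order_id": "A-42"})},
    }]}

def run_turn(messages):
    # One agentic step: let the model act, executing tools until it answers.
    while True:
        reply = call_model(messages, TOOLS)
        messages.append(reply)
        if "tool_calls" not in reply:
            return messages
        for call in reply["tool_calls"]:
            args = json.loads(call["function"]["arguments"])
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": get_order_status(**args)})

messages = [{"role": "user", "content": "Where is my order A-42?"}]
run_turn(messages)
```

Across a multi-turn session, the same loop runs once per user message while `messages` accumulates, which is where the 32K context length matters.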
## Ideal Use Cases
- Customer Service Agents: Automating complex, multi-step customer interactions requiring tool access.
- Technical Support Bots: Resolving issues by interacting with various systems and maintaining conversation flow.
- Interactive Problem Solving: Applications where an agent needs to dynamically use tools based on user input over time.
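For any of these use cases, a model like this is typically served behind an OpenAI-compatible endpoint. A sketch of the request body such an agent would send (the model identifier, endpoint path, and `search_flights` tool are assumptions for illustration; the request is only constructed here, not sent):

```python
import json

# Hypothetical chat-completions request to an OpenAI-compatible server
# (e.g. vLLM) hosting the model; the model name is an assumed identifier.
payload = {
    "model": "MUA-RL-32B",
    "messages": [
        {"role": "system", "content": "You are an airline customer-service agent."},
        {"role": "user", "content": "I want to change my flight."},
    ],
    "tools": [{
        "type": "function",
        "function": {
            "name": "search_flights",
            "description": "Search alternative flights for a booking.",
            "parameters": {
                "type": "object",
                "properties": {"booking_id": {"type": "string"},
                               "date": {"type": "string"}},
                "required": ["booking_id"],
            },
        },
    }],
    "max_tokens": 1024,
}

# Serialize for an HTTP client, e.g. POST <server>/v1/chat/completions.
body = json.dumps(payload)
```

The server's reply would then be fed into the tool-execution loop shown earlier, one turn per user message.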