zzwkk/MUA-RL-32B
Text Generation · Concurrency Cost: 2 · Model Size: 32B · Quant: FP8 · Ctx Length: 32k · Published: Aug 25, 2025 · License: apache-2.0 · Architecture: Transformer · Open Weights · Cold

MUA-RL-32B is a 32-billion-parameter multi-turn user-interacting agent reinforcement learning model developed by zzwkk, designed for agentic tool use. It integrates LLM-simulated users into its reinforcement learning loop, allowing the model to autonomously learn efficient communication and tool use in dynamic multi-turn conversations. The model supports a 32K-token context length and demonstrates competitive performance on multi-turn tool-using benchmarks, often matching or outperforming larger open-source models.
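
Below is a minimal sketch of querying the model for a multi-turn, tool-using conversation with Hugging Face transformers. The repo id `zzwkk/MUA-RL-32B`, the chat-template behavior, the `get_order_status` tool schema, and the dtype choice are all assumptions for illustration, not confirmed details of this release.

```python
# Minimal sketch: multi-turn tool-using chat with MUA-RL-32B via transformers.
# Repo id, tool schema, and generation settings below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "zzwkk/MUA-RL-32B"  # assumed hub repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bf16 for simplicity; FP8 serving is typically done via an inference engine
    device_map="auto",
)

# A hypothetical tool the agent may decide to call during the conversation.
tools = [{
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": "Look up the status of a customer order by id.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful customer-support agent."},
    {"role": "user", "content": "Where is my order 8741?"},
]

# Render the conversation plus tool schemas into model input using the chat template.
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=512)

# Print only the newly generated assistant turn (which may contain a tool call).
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

In a full agent loop, the generated tool call would be parsed, executed, and appended back to `messages` as a tool result before the next generation step; the sketch above shows only a single turn.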
