zzwkk/MUA-RL-8B
TEXT GENERATION
Concurrency Cost: 1
Model Size: 8B
Quant: FP8
Ctx Length: 32k
Published: Aug 25, 2025
License: apache-2.0
Architecture: Transformer
Open Weights, Cold

zzwkk/MUA-RL-8B is an 8-billion-parameter model developed by zzwkk and trained with multi-turn user-interacting agent reinforcement learning (MUA-RL) for agentic tool use. It is designed to maintain context across multi-turn conversations and to use tools to complete complex tasks. LLM-simulated users are integrated into the reinforcement learning loop, so the model learns autonomously to communicate efficiently with users and to solve problems in dynamic interactions. It is reported to perform competitively with larger open-source models on multi-turn tool-use benchmarks.
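
To make the idea of a simulated user inside the reinforcement learning loop more concrete, here is a minimal toy sketch of a single multi-turn rollout. It is not the model's training code: the names simulated_user, toy_agent, TOOLS, and rollout are hypothetical, both the user and the agent are hard-coded stand-ins rather than LLMs, and the binary end-of-episode reward is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Turn:
    role: str      # "user", "agent", or "tool"
    content: str


def simulated_user(history: List[Turn]) -> str:
    """Hypothetical stand-in for an LLM-simulated user.

    Asks one question, then replies "DONE" once the agent's latest
    message contains the expected answer.
    """
    agent_msgs = [t for t in history if t.role == "agent"]
    if not agent_msgs:
        return "What is 6 times 7?"
    if "42" in agent_msgs[-1].content:
        return "DONE"
    return "That is not what I asked for."


def toy_agent(history: List[Turn]) -> Dict:
    """Hypothetical stand-in for the policy being trained.

    Calls a tool first, then reports the tool result to the user.
    """
    last = history[-1]
    if last.role == "tool":
        return {"type": "message", "content": f"The answer is {last.content}."}
    return {"type": "tool_call", "name": "multiply", "args": {"a": 6, "b": 7}}


# Hypothetical tool registry; a real setup would expose task-specific APIs.
TOOLS = {"multiply": lambda a, b: a * b}


def rollout(max_turns: int = 8) -> Tuple[List[Turn], float]:
    """Run one episode: user speaks, agent acts, tools execute, user replies."""
    history = [Turn("user", simulated_user([]))]
    for _ in range(max_turns):
        action = toy_agent(history)
        if action["type"] == "tool_call":
            result = TOOLS[action["name"]](**action["args"])
            history.append(Turn("tool", str(result)))
            continue
        history.append(Turn("agent", action["content"]))
        reply = simulated_user(history)
        history.append(Turn("user", reply))
        if reply == "DONE":
            return history, 1.0   # assumed outcome reward: task completed
    return history, 0.0           # assumed outcome reward: ran out of turns


if __name__ == "__main__":
    episode, reward = rollout()
    for turn in episode:
        print(f"{turn.role}: {turn.content}")
    print("reward:", reward)
```

In an actual reinforcement learning setup, the trajectory and reward returned by a rollout like this would be passed to a policy-optimization step; that step is omitted here because the description above does not specify it.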
