dongguanting/Qwen2.5-3B-ARPO
Text generation · Model size: 3.1B · Quantization: BF16 · Context length: 32k · Published: Jul 24, 2025 · License: MIT · Architecture: Transformer · Concurrency cost: 1

dongguanting/Qwen2.5-3B-ARPO is a 3.1 billion parameter Qwen2.5-based language model developed by Guanting Dong and others, fine-tuned with Agentic Reinforced Policy Optimization (ARPO). The model is trained as a multi-turn LLM-based agent, targeting complex reasoning tasks that involve external tool interactions. ARPO uses an entropy-based adaptive rollout mechanism to improve exploration and sample efficiency in tool-use scenarios, and the authors report strong performance on computational reasoning, knowledge, and deep-search benchmarks.
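Since the checkpoint is a standard Qwen2.5-based causal LM, it can presumably be loaded with the Hugging Face transformers library. A minimal sketch follows; the chat-template usage and generation settings are assumptions based on common Qwen2.5 conventions, not an official inference recipe from the model card.

```python
MODEL_ID = "dongguanting/Qwen2.5-3B-ARPO"

def generate_reply(prompt: str, max_new_tokens: int = 256) -> str:
    """Download the checkpoint (~6 GB in BF16) and generate one reply.

    Assumption: standard transformers causal-LM loading works for this
    repo; imports are kept inside the function because they pull in
    heavy dependencies (torch, transformers).
    """
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,  # matches the BF16 weights listed above
        device_map="auto",
    )
    # Build a single-turn chat prompt with the tokenizer's chat template.
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)

# Usage (downloads the weights on first call):
#   reply = generate_reply("Summarize ARPO in one sentence.")
```

Note that this sketch covers plain single-turn generation only; the multi-turn tool-calling loop that ARPO is trained for would additionally require a tool-call parser and an environment to execute the calls, which the card does not specify.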
