dongguanting/Qwen2.5-7B-ARPO
Task: Text generation
Model size: 7.6B parameters
Quantization: FP8
Context length: 32k
Published: Jul 24, 2025
License: MIT
Architecture: Transformer (open weights)

dongguanting/Qwen2.5-7B-ARPO is a 7.6-billion-parameter language model published by dongguanting and based on the Qwen2.5 architecture. It is trained with Agentic Reinforced Policy Optimization (ARPO), an RL algorithm designed for training multi-turn LLM-based agents. The model is built to balance long-horizon reasoning with multi-turn tool interactions, and reports improved performance in computational reasoning, knowledge reasoning, and deep-search tasks while operating under a reduced tool-use budget.
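Since this is a Qwen2.5-based checkpoint, a minimal inference sketch with the Hugging Face Transformers library would look like the following. This assumes the weights are published in the standard Transformers format and that a chat template is bundled with the tokenizer; neither is confirmed by the listing above, so treat it as a starting point rather than official usage instructions.

```python
# Hedged sketch: running dongguanting/Qwen2.5-7B-ARPO with Transformers.
# Assumes standard HF-format weights and an included chat template;
# device_map="auto" additionally requires the accelerate package.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "dongguanting/Qwen2.5-7B-ARPO"

def main():
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype="auto",   # pick the dtype stored in the checkpoint
        device_map="auto",    # spread layers across available devices
    )

    # Build a chat-formatted prompt and generate a reply.
    messages = [{"role": "user", "content": "What is 17 * 23?"}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=256)

    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

if __name__ == "__main__":
    main()
```

For agentic tool-use workloads of the kind ARPO targets, this single-turn call would sit inside a loop that feeds tool results back as additional messages.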
