dongguanting/Qwen3-8B-AEPO-DeepSearch
Text Generation · Model Size: 8B · Quantization: FP8 · Context Length: 32k · License: MIT · Architecture: Transformer · Open Weights

dongguanting/Qwen3-8B-AEPO-DeepSearch is an 8-billion-parameter Qwen3-based language model developed by Guanting Dong and collaborators, trained with the Agentic Entropy-Balanced Policy Optimization (AEPO) algorithm. It is optimized for multi-turn, long-horizon tool use in web agents and supports a 32,768-token context length. By balancing entropy during agentic reinforcement learning, AEPO prevents training collapse and improves rollout sampling diversity, making the model well suited to complex agentic tasks.
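The entropy-balancing idea behind AEPO can be illustrated with a toy sketch: measure the entropy of the policy's next-token distribution at each step, and treat high-entropy steps as candidate points for branching additional rollouts, so sampling effort concentrates where the policy is uncertain. This is a minimal, assumption-laden illustration of the general concept, not the authors' implementation; the function names and the threshold value are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def branch_points(step_distributions, threshold=1.0):
    """Flag steps whose entropy exceeds the threshold as candidates
    for branching extra rollouts (hypothetical helper, illustration only)."""
    return [i for i, probs in enumerate(step_distributions)
            if token_entropy(probs) > threshold]

# Toy distributions: a confident step and an uncertain (high-entropy) step.
confident = [0.97, 0.01, 0.01, 0.01]   # entropy ≈ 0.17 nats
uncertain = [0.25, 0.25, 0.25, 0.25]   # entropy = ln 4 ≈ 1.39 nats
print(branch_points([confident, uncertain]))  # → [1]
```

Only the uncertain step is flagged, mirroring the intuition that branching rollouts at low-entropy steps wastes samples on near-identical trajectories.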
