Name: dongguanting/Qwen3-14B-ARPO-DeepSearch API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: dongguanting

Overview

The dongguanting/Qwen3-14B-ARPO-DeepSearch model is a 14 billion parameter Qwen3-based large language model, fine-tuned using Agentic Reinforced Policy Optimization (ARPO). Developed by Guanting Dong and collaborators, ARPO is a novel agentic reinforcement learning algorithm specifically designed for training multi-turn LLM-based agents. It addresses the challenge of balancing an LLM's intrinsic long-horizon reasoning capabilities with its proficiency in multi-turn tool interactions.

Key Capabilities & Innovations

Entropy-based Adaptive Rollout: ARPO incorporates an adaptive rollout mechanism that dynamically balances global trajectory sampling and step-level sampling. This promotes exploration at steps with high uncertainty, particularly after tool usage, by adapting to the increased entropy distribution of generated tokens observed after external tool interactions.
Advantage Attribution Estimation: The model integrates an advantage attribution estimation, allowing LLMs to internalize advantage differences in stepwise tool-use interactions, thereby improving decision-making in complex sequences.
Efficient Tool Usage: A significant highlight is ARPO's ability to achieve superior performance across 13 challenging benchmarks in computational reasoning, knowledge reasoning, and deep search domains, while using only half the tool-use budget required by existing methods.

Use Cases

This model is particularly well-suited for applications requiring:

Multi-turn agentic reasoning: Where LLMs need to interact with external tools over multiple steps to solve complex problems.
Computational and knowledge reasoning: Excelling in tasks that demand logical deduction and access to external knowledge.
Deep search applications: Where efficient and effective use of search tools is critical for task completion.

Overview

Overview

Key Capabilities & Innovations

Use Cases

Full Model Card (README)