Model Overview
Jarrodbarnes/Qwen3-4B-tau2-grpo-v1 is a 4-billion-parameter model built on the Qwen3-4B-Instruct base and optimized specifically for multi-turn tool-use tasks. It achieves 59.0% Pass@4 on the challenging tau2-bench test split, roughly a 4x improvement over the base model's agentic performance.
Key Capabilities & Training
The model's performance stems from a progressive three-stage training pipeline:
- SFT (Supervised Fine-Tuning): Initial learning of tool schemas and interaction protocols.
- RFT (Rejection Fine-Tuning): Further training on sampled trajectories filtered to keep only high-quality interactions.
- GRPO (Group Relative Policy Optimization): Reinforcement learning with turn-level reward shaping for complex multi-step reasoning.
This methodology enables the model to effectively handle sequential function calls and complex agent workflows, as detailed in the tau2 training cookbook.
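The rejection fine-tuning stage can be illustrated with a minimal sketch. The trajectory format and the 0.9 reward threshold below are illustrative assumptions, not the actual training code:

```python
# Minimal sketch of rejection fine-tuning (RFT) data selection:
# sample rollouts, keep only high-reward ones, then fine-tune on them.
# The reward threshold and data shapes are illustrative assumptions.

def select_rft_data(trajectories, threshold=0.9):
    """Keep only high-reward trajectories for supervised fine-tuning."""
    return [t for t in trajectories if t["reward"] >= threshold]

# Example: sampled rollouts with scalar task rewards.
rollouts = [
    {"messages": ["..."], "reward": 1.0},   # task solved
    {"messages": ["..."], "reward": 0.3},   # partial / failed attempt
    {"messages": ["..."], "reward": 0.95},  # solved with minor issues
]

kept = select_rft_data(rollouts)
print(len(kept))  # 2 of the 3 rollouts pass the filter
```

The surviving trajectories then serve as the supervised dataset for the next fine-tuning pass, before GRPO refines the policy with turn-level rewards.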
Performance Highlights
On the tau2-bench test split, the model achieves:
- Overall Pass@4: 59.0%
- Overall Pass@1: 36.0%
This significantly surpasses the baseline Qwen3-4B-Instruct, which scored 14.3% Pass@4, showcasing the effectiveness of the GRPO fine-tuning for agentic tasks.
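Pass@k is presumably computed with the standard unbiased combinatorial estimator (this is an assumption; the evaluation harness is not shown here). A sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: probability that at least one of k
    samples, drawn without replacement from n total (c correct), succeeds."""
    if n - c < k:
        return 1.0  # too few failures to fill k slots: guaranteed success
    return 1.0 - comb(n - c, k) / comb(n, k)

# With n = 4 samples per task, Pass@4 reduces to "any sample succeeded":
print(pass_at_k(4, 1, 4))  # 1.0
print(pass_at_k(4, 0, 4))  # 0.0
```

With k equal to the number of samples, the estimator degenerates to a simple any-success check; the combinatorial form matters when n > k.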
Use Cases
This model is particularly well-suited for applications requiring:
- Multi-turn function calling: Executing a sequence of tool interactions to complete a complex task.
- Agentic workflows: Building AI agents that can reason and act over multiple steps.
- Automated task completion: Handling structured interactions in domains like retail and airline services, though telecom tasks remain more challenging.
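A typical multi-turn function-calling loop looks like the sketch below. Here `call_model` and the tool registry are stand-ins for a real inference stack (e.g. a transformers or vLLM deployment of this model parsing structured tool calls):

```python
import json

# Hypothetical tool registry; real deployments expose tools via JSON schemas.
TOOLS = {
    "get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def call_model(messages):
    """Stand-in for the model. A real agent would send `messages` (plus
    tool schemas) to the deployed model and parse its structured reply."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_order_status",
                              "arguments": {"order_id": "A123"}}}
    return {"content": "Your order A123 has shipped."}

def run_agent(user_message, max_turns=5):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = call_model(messages)
        if "tool_call" in reply:                 # model wants to act
            call = reply["tool_call"]
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": json.dumps(result)})
        else:                                    # model answers the user
            return reply["content"]
    return "Turn limit reached."

print(run_agent("Where is order A123?"))  # Your order A123 has shipped.
```

The loop alternates model calls and tool executions until the model produces a final answer, which is exactly the sequential pattern the training pipeline optimizes for.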
Limitations
Users should note weaker performance in the Telecom domain (40% Pass@4) and sensitivity of results to the user simulator used during evaluation. The reported Pass@k metric also differs from the Pass^k used on the official tau2-bench leaderboard.
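The metric distinction matters: Pass@k rewards any success among k trials, while Pass^k requires all k trials to succeed, so it penalizes inconsistency. Under a simple i.i.d. Bernoulli assumption (an illustrative model, not the official leaderboard scoring), the two diverge sharply:

```python
def pass_at_k(p: float, k: int) -> float:
    """At least one of k i.i.d. trials succeeds."""
    return 1.0 - (1.0 - p) ** k

def pass_hat_k(p: float, k: int) -> float:
    """All k i.i.d. trials succeed (a consistency metric)."""
    return p ** k

# Using this model's 36% Pass@1 as the per-trial success rate:
p = 0.36
print(round(pass_at_k(p, 4), 3))   # 0.832
print(round(pass_hat_k(p, 4), 3))  # 0.017
```

That the measured Pass@4 (59%) falls well below the i.i.d. prediction of 83% suggests per-task difficulty varies widely: some tasks fail on all four attempts, pulling the aggregate down.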