# Jarrodbarnes/Qwen3-4B-tau2-sft1: Tool-Use Fine-Tuned Model
This model is a 4 billion parameter supervised fine-tuned (SFT) checkpoint, built upon the Qwen/Qwen3-4B-Instruct-2507 base model. Its primary focus is on tool-use tasks, specifically within the context of the tau2-bench framework.
## Key Characteristics & Training
- **Base Model:** Qwen/Qwen3-4B-Instruct-2507
- **Fine-tuning:** Supervised fine-tuning (SFT) using the Slime `tau2` training cookbook
- **Training Data:** The `Jarrodbarnes/tau2-sft-seed-v3` dataset, which consists of filtered, rejection-sampled trajectories
- **Hyperparameters:** `num_epoch=2`, `global_batch_size=16`, and a learning rate of `1e-5` with cosine decay
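Since the model targets tool use, a conversation is typically paired with a tool schema in the OpenAI-style function-calling format that Qwen chat templates accept. A minimal sketch is below; the `get_flight_status` tool and its parameters are hypothetical examples, not part of the training data, and the actual model invocation is shown only as commented-out code:

```python
# Hypothetical tool schema in the function-calling format that
# Qwen chat templates accept; the tool name and fields are examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_status",
            "description": "Look up the status of a flight by number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "flight_number": {"type": "string"},
                },
                "required": ["flight_number"],
            },
        },
    }
]

# A single-turn conversation to render with the chat template.
messages = [
    {"role": "user", "content": "Is flight UA100 on time?"},
]

# With the checkpoint downloaded, prompting would look roughly like:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Jarrodbarnes/Qwen3-4B-tau2-sft1")
# model = AutoModelForCausalLM.from_pretrained("Jarrodbarnes/Qwen3-4B-tau2-sft1")
# prompt = tok.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False
# )
```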
## Performance on tau2-bench
The model was evaluated on the tau2-bench test split (100 tasks) using the pass@1 metric (any-success over 1 attempt):
- Overall pass@1: 0.40
- Domain-specific pass@1:
  - Airline: 0.20 (20 tasks)
  - Retail: 0.60 (40 tasks)
  - Telecom: 0.30 (40 tasks)
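As a sanity check, the overall pass@1 is the task-weighted average of the per-domain scores, which can be verified with a few lines:

```python
# Per-domain (pass@1 score, task count) from the results above.
domains = {
    "airline": (0.20, 20),
    "retail": (0.60, 40),
    "telecom": (0.30, 40),
}

total_tasks = sum(n for _, n in domains.values())
# Overall pass@1 = task-weighted average of the domain scores.
overall = sum(score * n for score, n in domains.values()) / total_tasks
print(round(overall, 2))  # → 0.4
```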
## Intended Use
This model is intended for research and for reproducing tau2-bench tool-use training. It is not recommended for deployment without further safety evaluation. The evaluation results are subject to small run-to-run variance because the user simulator is stochastic.