# Jarrodbarnes/Qwen3-4B-tau2-sft1: Tool-Use Fine-Tuned Model
This model is a 4 billion parameter supervised fine-tuned (SFT) checkpoint, built upon the Qwen/Qwen3-4B-Instruct-2507 base model. Its primary focus is on tool-use tasks, specifically within the context of the tau2-bench framework.
## Key Characteristics & Training
- **Base Model:** Qwen/Qwen3-4B-Instruct-2507
- **Fine-tuning:** Supervised fine-tuning (SFT) using the Slime `tau2` training cookbook
- **Training Data:** The `Jarrodbarnes/tau2-sft-seed-v3` dataset, which consists of filtered, rejection-sampled trajectories
- **Hyperparameters:** `num_epoch=2`, `global_batch_size=16`, and a learning rate of `1e-5` with cosine decay
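Since the model targets tool use, a conversation is typically paired with a tool schema in the OpenAI-style function-calling format that Qwen chat templates accept. A minimal sketch is below; the `get_flight_status` tool and its parameters are hypothetical examples, not part of the training data, and the actual model invocation is shown only as commented-out code:

```python
# Hypothetical tool schema in the function-calling format that
# Qwen chat templates accept; the tool name and fields are examples.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_flight_status",
            "description": "Look up the status of a flight by number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "flight_number": {"type": "string"},
                },
                "required": ["flight_number"],
            },
        },
    }
]

# A single-turn conversation to render with the chat template.
messages = [
    {"role": "user", "content": "Is flight UA100 on time?"},
]

# With the checkpoint downloaded, prompting would look roughly like:
# from transformers import AutoModelForCausalLM, AutoTokenizer
# tok = AutoTokenizer.from_pretrained("Jarrodbarnes/Qwen3-4B-tau2-sft1")
# model = AutoModelForCausalLM.from_pretrained("Jarrodbarnes/Qwen3-4B-tau2-sft1")
# prompt = tok.apply_chat_template(
#     messages, tools=tools, add_generation_prompt=True, tokenize=False
# )
```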
## Performance on tau2-bench
The model was evaluated on the tau2-bench test split (100 tasks) using the pass@1 metric (any-success over 1 attempt):
- Overall pass@1: 0.40
- Domain-specific pass@1:
  - Airline: 0.20 (20 tasks)
  - Retail: 0.60 (40 tasks)
  - Telecom: 0.30 (40 tasks)
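As a sanity check, the overall pass@1 is the task-weighted average of the per-domain scores, which can be verified with a few lines:

```python
# Per-domain (pass@1 score, task count) from the results above.
domains = {
    "airline": (0.20, 20),
    "retail": (0.60, 40),
    "telecom": (0.30, 40),
}

total_tasks = sum(n for _, n in domains.values())
# Overall pass@1 = task-weighted average of the domain scores.
overall = sum(score * n for score, n in domains.values()) / total_tasks
print(round(overall, 2))  # → 0.4
```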
## Intended Use
This model is intended for research and for reproducing tau2-bench tool-use training. It is not recommended for deployment without further safety evaluation. The evaluation results are subject to small run-to-run variance because the user simulator is stochastic.