distillabs/tft-benchmark-s4-tft-Qwen3-1.7B

Text generation · Concurrency cost: 1 · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Apr 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights · Cold

The distillabs/tft-benchmark-s4-tft-Qwen3-1.7B is a 1.7 billion parameter Qwen3 model fine-tuned by Distil Labs for multi-turn tool calling. It was developed as part of the TFT Benchmark to evaluate training methods for Small Language Models (SLMs) from production traces. This model excels in low-data scenarios, achieving a 0.852 LLM-as-a-judge score and 0.74 staged_tool_call score, demonstrating robust performance in tool-use tasks even with extreme data scarcity.


Overview

This model, tft-benchmark-s4-tft-Qwen3-1.7B, is a 1.7 billion parameter Qwen3 variant developed by Distil Labs. It is specifically fine-tuned for multi-turn tool calling within the context of the TFT (Training from Traces) Benchmark. The benchmark evaluates two distinct approaches for training SLMs from production traces: the TFT Pipeline and Direct Training.

Key Capabilities & Performance

  • Multi-turn Tool Calling: Optimized for complex conversational interactions requiring tool use, such as restaurant search and reservation based on the Schema-Guided Dialogue (SGD) dataset.
  • Robustness in Low-Data Scenarios: This specific model was trained under the 'S4 Low Data' scenario, utilizing only 5 clean traces. Despite extreme data scarcity, it achieved an LLM-as-a-judge score of 0.852 and a staged_tool_call score of 0.74.
  • TFT Pipeline Advantage: The TFT pipeline, which combines trace filtering, committee relabeling, and synthetic data generation, significantly outperforms Direct Training when traces are corrupted or scarce, with a +20.3 percentage point gain in the S4 Low Data scenario.

Training Methodology

The model was trained using the TFT pipeline, where production traces are filtered, relabeled by a committee of LLMs (openai.gpt-oss-120b + zai.glm-5), and then used to seed synthetic data generation. The student model is subsequently fine-tuned on this synthetic dataset using LoRA. The teacher/synthetic generation model used was zai.glm-5, and the judge model was openai.gpt-oss-120b.
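The filter-and-relabel stage described above can be sketched roughly as follows. This is a minimal illustration with hypothetical helper names (`is_clean`, `committee_relabel`); the actual Distil Labs pipeline implementation is not public, and the committee models (openai.gpt-oss-120b and zai.glm-5) are represented here by plain callables.

```python
from collections import Counter

def committee_relabel(trace, committee):
    """Ask each committee member for a label and keep the majority vote.

    `committee` is a list of callables standing in for the LLM relabelers
    (in the TFT pipeline, openai.gpt-oss-120b + zai.glm-5).
    """
    votes = [model(trace) for model in committee]
    label, _count = Counter(votes).most_common(1)[0]
    return label

def tft_filter_and_relabel(traces, committee, is_clean):
    """Drop traces failing the filter, then relabel the survivors.

    `is_clean` is a hypothetical predicate (e.g. "all tool calls are
    schema-valid"). The relabeled traces would then seed synthetic
    data generation before LoRA fine-tuning of the student model.
    """
    kept = [t for t in traces if is_clean(t)]
    return [(t, committee_relabel(t, committee)) for t in kept]
```

The majority vote is the simplest committee-aggregation choice; a production pipeline could equally use unanimity or a judge model to break ties.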

Target Tools

The model targets three tools: FindRestaurants for restaurant search, ReserveRestaurant for making reservations, and a general respond_to_user function.
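Declared in OpenAI-style function-calling JSON, the three tools might look like the sketch below. The parameter names are illustrative guesses only; the canonical slots come from the Schema-Guided Dialogue (SGD) dataset and are not reproduced in this card.

```python
# Hypothetical tool schemas; parameter names are assumptions, not the
# model's actual SGD-derived schema.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "FindRestaurants",
            "description": "Search for restaurants matching the user's criteria.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "cuisine": {"type": "string"},
                },
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ReserveRestaurant",
            "description": "Reserve a table at a specific restaurant.",
            "parameters": {
                "type": "object",
                "properties": {
                    "restaurant_name": {"type": "string"},
                    "time": {"type": "string"},
                    "party_size": {"type": "integer"},
                },
                "required": ["restaurant_name", "time"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "respond_to_user",
            "description": "Send a plain-text reply to the user.",
            "parameters": {
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
            },
        },
    },
]
```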

When to Use This Model

This model is particularly well-suited for applications requiring reliable multi-turn tool calling, especially when working with limited or noisy production trace data for fine-tuning. Its strong performance in low-data conditions makes it valuable for developing robust conversational AI agents where data collection is challenging.
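A multi-turn tool-calling exchange of the kind this model is tuned for alternates user turns, assistant tool calls, and tool results. The hypothetical transcript below shows the common OpenAI-style message shape; the exact chat format the model expects is defined by its Qwen3 chat template, not reproduced here.

```python
import json

# Illustrative multi-turn transcript; restaurant data is made up.
messages = [
    {"role": "user",
     "content": "Book me an Italian place in San Jose tonight."},
    # The assistant first calls a search tool rather than answering directly.
    {"role": "assistant",
     "tool_calls": [{
         "id": "call_1",
         "type": "function",
         "function": {
             "name": "FindRestaurants",
             "arguments": json.dumps({"city": "San Jose",
                                      "cuisine": "Italian"}),
         },
     }]},
    # The tool result is fed back, keyed to the call id.
    {"role": "tool",
     "tool_call_id": "call_1",
     "content": json.dumps([{"name": "Vesuvio", "rating": 4.6}])},
    # The assistant then responds in natural language.
    {"role": "assistant",
     "content": "Vesuvio looks good. Shall I reserve it?"},
]
```

In a real deployment these messages, together with the tool schemas, would be rendered through the model's chat template and the assistant turns generated by the model itself.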