distillabs/tft-benchmark-s1-direct-Qwen3-1.7B

Text Generation · Concurrency Cost: 1 · Model Size: 2B · Quant: BF16 · Ctx Length: 32k · Published: Apr 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open Weights

distillabs/tft-benchmark-s1-direct-Qwen3-1.7B is a 1.7-billion-parameter Qwen3-based model fine-tuned by Distil Labs for multi-turn tool calling. It was trained with a direct approach on clean production traces as part of the TFT Benchmark, achieving an LLM-as-a-judge score of 0.864. The model targets structured tool use, such as restaurant search and reservation, and performs on par with more complex training pipelines when the training data is uncorrupted.


Model Overview

distillabs/tft-benchmark-s1-direct-Qwen3-1.7B is a 1.7-billion-parameter Qwen3 variant fine-tuned by Distil Labs for multi-turn tool calling. It is part of the TFT (Training from Traces) Benchmark, which compares methods for training Small Language Models (SLMs) from production traces.

Key Characteristics

  • Base Model: Qwen3-1.7B, a 1.7-billion-parameter language model.
  • Training Method: Utilizes "Direct Training," where the model is fine-tuned directly on raw production traces without additional filtering, relabeling, or synthetic data generation.
  • Benchmark Scenario: Evaluated in the S1 Baseline scenario, which uses 327 clean Restaurants_1 traces, representing a high-quality data environment.
  • Performance: Achieved an LLM-as-a-judge score of 0.864 and a staged_tool_call score of 0.787 on the S1 Baseline, indicating strong performance on clean data.
  • Target Tools: Designed to interact with tools for restaurant search (FindRestaurants), reservation (ReserveRestaurant), and user responses (respond_to_user), based on the Schema-Guided Dialogue (SGD) dataset.
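The three tools listed above can be exposed to the model as JSON schemas in the common OpenAI-style function-calling format that Hugging Face chat templates accept. A minimal sketch follows; the slot names (`city`, `cuisine`, `time`, etc.) are illustrative guesses in the spirit of the SGD Restaurants_1 schema, not values confirmed by this model card:

```python
# Hypothetical tool schemas for FindRestaurants, ReserveRestaurant, and
# respond_to_user. Parameter names are assumptions for illustration only.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "FindRestaurants",
            "description": "Search for restaurants matching the user's criteria.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "cuisine": {"type": "string"},
                },
                "required": ["city", "cuisine"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ReserveRestaurant",
            "description": "Book a table at a specific restaurant.",
            "parameters": {
                "type": "object",
                "properties": {
                    "restaurant_name": {"type": "string"},
                    "city": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["restaurant_name", "city", "time"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "respond_to_user",
            "description": "Send a natural-language reply to the user.",
            "parameters": {
                "type": "object",
                "properties": {"message": {"type": "string"}},
                "required": ["message"],
            },
        },
    },
]

print([t["function"]["name"] for t in TOOLS])
```

A list like `TOOLS` would typically be passed via the `tools` argument of the tokenizer's `apply_chat_template` so the schemas are rendered into the model's system prompt.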

Use Case and Differentiation

This model is well suited to applications requiring multi-turn tool calling when the training data is clean and uncorrupted. Its direct training approach makes it the baseline for comparison against more complex pipelines such as the TFT Pipeline, which adds trace filtering and synthetic data generation. While it performs comparably to the TFT Pipeline on clean data (S1 Baseline), the TFT Pipeline shows sizable advantages (12–26 percentage points) under noisy labels, schema drift, or low data availability. This model is therefore best for use cases where training-data quality is consistently high, offering a straightforward and effective solution for structured dialogue and tool interaction.
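In a deployed multi-turn loop, the application must parse the model's tool-call output and route it to the right backend function. Qwen-family chat templates conventionally wrap tool calls in `<tool_call>` tags containing a JSON object; the sketch below assumes that format, and the stub tool and its return value are hypothetical:

```python
import json
import re

# Stub implementation; a real version would query a restaurant search backend.
def find_restaurants(city, cuisine):
    return [{"restaurant_name": "Demo Bistro", "city": city, "cuisine": cuisine}]

TOOL_REGISTRY = {"FindRestaurants": find_restaurants}

def dispatch(model_output: str):
    """Extract the first <tool_call> block and invoke the named tool.

    Returns None when the model produced a plain-text answer instead.
    """
    match = re.search(r"<tool_call>\s*(\{.*?\})\s*</tool_call>",
                      model_output, re.DOTALL)
    if match is None:
        return None
    call = json.loads(match.group(1))
    return TOOL_REGISTRY[call["name"]](**call["arguments"])

# Simulated model output in the assumed Qwen-style tool-call format.
output = (
    "<tool_call>\n"
    '{"name": "FindRestaurants", "arguments": '
    '{"city": "San Jose", "cuisine": "Italian"}}\n'
    "</tool_call>"
)
print(dispatch(output))
```

The tool's return value would then be appended to the conversation as a tool-role message before the next generation turn.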