distillabs/tft-benchmark-s2-direct-Qwen3-1.7B
distillabs/tft-benchmark-s2-direct-Qwen3-1.7B is a 1.7-billion-parameter Qwen3 model fine-tuned by distillabs for multi-turn tool calling. It was trained directly on raw, noisy production traces as part of the TFT Benchmark, specifically for the S2 Noisy Labels scenario, and is intended to evaluate direct training approaches for small language models in tool-use contexts. It achieves an LLM-as-a-judge score of 0.721 and a staged_tool_call score of 0.731.
Model Overview
This model, tft-benchmark-s2-direct-Qwen3-1.7B, is a 1.7 billion parameter Qwen3 base model fine-tuned by distillabs for multi-turn tool calling. It is a component of the TFT (Training from Traces) Benchmark, which evaluates different approaches to training Small Language Models (SLMs) from production traces.
Key Characteristics
- Base Model: Qwen3-1.7B
- Training Pipeline: Direct Training, meaning it was fine-tuned directly on raw/corrupted traces without filtering, relabeling, or synthetic data generation.
- Scenario: Specifically trained for the S2 Noisy Labels scenario, which comprises 327 Restaurants_1 traces with 50% of assistant tool calls corrupted, focusing on tool timing errors.
- Performance: Achieved an LLM-as-a-judge score of 0.721 and a staged_tool_call score of 0.731 in this benchmark scenario.
- Target Tools: Designed to handle the `respond_to_user`, `FindRestaurants`, and `ReserveRestaurant` tools, based on the Schema-Guided Dialogue (SGD) dataset.
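To illustrate the structure a tool call to these three tools implies, the sketch below validates a model-emitted JSON tool call against minimal per-tool schemas. The required-argument lists here are assumptions for illustration, not the actual SGD Restaurants_1 schemas.

```python
import json

# Assumed required arguments for each target tool; the real SGD
# Restaurants_1 schemas define more fields than shown here.
TOOLS = {
    "respond_to_user": [],
    "FindRestaurants": ["city", "cuisine"],
    "ReserveRestaurant": ["restaurant_name", "city", "time"],
}


def validate_tool_call(raw: str) -> bool:
    """Return True if a JSON tool call names a known tool and
    supplies all of that tool's required arguments."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if call.get("name") not in TOOLS:
        return False
    args = call.get("arguments", {})
    return all(a in args for a in TOOLS[call["name"]])
```

For example, `validate_tool_call('{"name": "FindRestaurants", "arguments": {"city": "San Jose", "cuisine": "Italian"}}')` passes, while a call that omits `cuisine` or names an unknown tool does not.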
When to Consider This Model
This model is primarily a benchmark artifact, demonstrating the performance of direct training on noisy data for tool-calling tasks. It serves as a baseline for comparison against more sophisticated training pipelines like the TFT Pipeline, which significantly outperforms direct training in corrupted scenarios (e.g., +12.3 percentage points in S2 Noisy Labels). Developers interested in understanding the challenges of training SLMs on noisy production data for tool use, or evaluating alternative training methodologies, will find this model and its associated benchmark valuable.
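The S2 corruption setup described above, in which half the traces carry a mistimed assistant tool call, can be sketched roughly as follows. The trace format and the exact corruption mechanics are assumptions for illustration; the benchmark's actual preprocessing may differ.

```python
import random


def corrupt_traces(traces, rate=0.5, seed=0):
    """Sketch of an S2-style 'tool timing' corruption: in a given
    fraction of traces, swap the first assistant tool-call turn one
    position earlier so it fires at the wrong point in the dialogue."""
    rng = random.Random(seed)
    corrupted = []
    for trace in traces:
        trace = list(trace)  # shallow copy so the input list is untouched
        if rng.random() < rate:
            for i, turn in enumerate(trace):
                if turn["role"] == "assistant" and "tool_call" in turn:
                    if i > 0:
                        trace[i - 1], trace[i] = trace[i], trace[i - 1]
                    break
        corrupted.append(trace)
    return corrupted
```

With `rate=0.5` roughly half the traces are corrupted, mirroring the S2 scenario; direct training consumes the result as-is, with no filtering or relabeling.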