distillabs/tft-benchmark-s4-direct-Qwen3-1.7B

Text generation · Concurrency cost: 1 · Model size: 2B · Quantization: BF16 · Context length: 32k · Published: Apr 15, 2026 · License: apache-2.0 · Architecture: Transformer · Open weights · Cold

distillabs/tft-benchmark-s4-direct-Qwen3-1.7B is a 1.7-billion-parameter Qwen3 model, fine-tuned for multi-turn tool calling as part of the TFT (Training from Traces) Benchmark. It was trained directly on raw production traces under a low-data scenario (S4), achieving an LLM-as-a-judge score of 0.649. The model is designed for evaluating direct training approaches in tool-calling tasks, particularly restaurant search and reservation functions.


Model Overview

This model, tft-benchmark-s4-direct-Qwen3-1.7B, is a Qwen3-1.7B variant specifically fine-tuned for multi-turn tool calling. It is a component of the TFT (Training from Traces) Benchmark, which evaluates different approaches to training Small Language Models (SLMs) from production traces.
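As a rough usage sketch, the model can be queried through the standard Qwen3 chat interface in Hugging Face `transformers`. This is an illustrative assumption based on the Qwen3 base model, not an interface documented in the model card; generation settings are likewise illustrative.

```python
# Hedged sketch: assumes the standard Qwen3 chat template works unchanged
# for this fine-tune; the model card does not document an inference recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "distillabs/tft-benchmark-s4-direct-Qwen3-1.7B"


def build_messages(user_turn: str) -> list[dict]:
    # Minimal single-turn conversation; the benchmark itself evaluates
    # multi-turn tool-calling traces.
    return [{"role": "user", "content": user_turn}]


def generate_reply(user_turn: str, max_new_tokens: int = 256) -> str:
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    prompt = tokenizer.apply_chat_template(
        build_messages(user_turn), tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )


if __name__ == "__main__":
    print(build_messages("Find me an Italian restaurant in San Jose."))
```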

Key Characteristics

  • Base Model: Qwen3-1.7B, a 1.7 billion parameter model.
  • Training Pipeline: Utilizes a "Direct Training" approach, meaning it was fine-tuned directly on raw, expanded production traces without filtering, relabeling, or synthetic data generation.
  • Scenario: Trained under the "S4 Low Data" scenario, using only 5 clean Restaurants_1 traces, representing extreme data scarcity.
  • Performance: Achieved an LLM-as-a-judge score of 0.649 and a staged_tool_call score of 0.66 on a held-out test set of 34 multi-turn conversations.
  • Target Tools: Designed to interact with tools for restaurant search (FindRestaurants) and reservation (ReserveRestaurant), based on the Schema-Guided Dialogue (SGD) dataset.
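The two target tools above can be sketched as OpenAI-style JSON function schemas. The parameter names and types below (`city`, `cuisine`, `time`, and so on) are illustrative assumptions modeled on the SGD Restaurants_1 service, not the exact schemas used in the benchmark.

```python
import json

# Hypothetical schemas for the model's two target tools; field names are
# assumptions based on the SGD Restaurants_1 domain, not the training data.
FIND_RESTAURANTS = {
    "type": "function",
    "function": {
        "name": "FindRestaurants",
        "description": "Search for restaurants by location and cuisine.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City to search in."},
                "cuisine": {"type": "string", "description": "Type of food."},
            },
            "required": ["city", "cuisine"],
        },
    },
}

RESERVE_RESTAURANT = {
    "type": "function",
    "function": {
        "name": "ReserveRestaurant",
        "description": "Book a table at a specific restaurant.",
        "parameters": {
            "type": "object",
            "properties": {
                "restaurant_name": {"type": "string"},
                "city": {"type": "string"},
                "time": {"type": "string", "description": "e.g. '19:00'."},
                "party_size": {"type": "integer"},
            },
            "required": ["restaurant_name", "city", "time"],
        },
    },
}

TOOLS = [FIND_RESTAURANTS, RESERVE_RESTAURANT]
print(json.dumps([t["function"]["name"] for t in TOOLS]))
```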

When to Use This Model

This model is particularly relevant for researchers and developers interested in:

  • Benchmarking Direct Training: Understanding the performance of direct fine-tuning on raw, potentially noisy, and scarce production traces for tool-calling tasks.
  • Low-Data Scenarios: Evaluating model behavior and limitations when trained with very limited clean data.
  • Comparison with the TFT Pipeline: Contrasting its performance against models trained with the full TFT pipeline (trace filtering, relabeling, and synthetic data generation), which significantly outperforms direct training in corrupted scenarios (e.g., +20.3pp in S4 Low Data).
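To make the "+20.3pp" figure concrete: only the direct-training S4 judge score (0.649) and the percentage-point gap are stated here, so the TFT-pipeline score implied below is derived arithmetic, not a number reported in the card.

```python
# "pp" means percentage points: an absolute difference between two scores.
direct_s4_judge = 0.649          # reported LLM-as-a-judge score (direct, S4)
tft_gap_pp = 20.3                # reported gap, TFT pipeline over direct in S4

# Derived, not reported: the judge score the gap implies for the TFT pipeline.
implied_tft_s4_judge = direct_s4_judge + tft_gap_pp / 100
print(f"{implied_tft_s4_judge:.3f}")  # prints 0.852
```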