Name: distillabs/tft-benchmark-s2-tft-Qwen3-1.7B API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: distillabs

Model Overview

The distillabs/tft-benchmark-s2-tft-Qwen3-1.7B is a 2 billion parameter Qwen3 model developed by Distil Labs. It is specifically fine-tuned for multi-turn tool calling within the context of the TFT (Training from Traces) Benchmark. This model addresses the challenge of training Small Language Models (SLMs) from production traces, particularly in scenarios with noisy or corrupted data.

Key Capabilities & Differentiators

Robust Tool Calling: Achieves an LLM-as-a-judge score of 0.844 and a staged_tool_call score of 0.758 in the S2 Noisy Labels scenario, where 50% of assistant tool calls are corrupted.
TFT Pipeline Advantage: Trained using the advanced TFT pipeline (trace filtering, committee relabeling, synthetic data generation, and fine-tuning), which significantly outperforms direct training on raw traces, especially with corrupted data. For instance, it shows a +12.3 percentage point improvement over direct training in the S2 Noisy Labels scenario.
Targeted Tool Use: Optimized for restaurant search and reservation tools, including respond_to_user, FindRestaurants, and ReserveRestaurant, based on the Schema-Guided Dialogue (SGD) dataset.

When to Use This Model

This model is ideal for applications requiring reliable multi-turn tool calling, particularly in environments where training data may contain noise or corruption. Its strength lies in its ability to handle imperfect production traces, making it suitable for developing robust conversational AI agents that interact with external tools for tasks like restaurant booking or information retrieval.

Overview

Model Overview

Key Capabilities & Differentiators

When to Use This Model

Full Model Card (README)