distillabs/tft-benchmark-s1-tft-Qwen3-1.7B
distillabs/tft-benchmark-s1-tft-Qwen3-1.7B is a 1.7-billion-parameter Qwen3 model fine-tuned by Distil Labs for multi-turn tool calling. It was developed as part of the TFT (Training from Traces) Benchmark, specifically for the S1 Baseline scenario, which trains on clean production traces. The model executes tool-use instructions accurately across conversational turns, with strong performance on restaurant search and reservation tasks. Its training pipeline, which combines trace filtering, committee relabeling, and synthetic data generation, is designed for robust tool calling.
Model Overview
The distillabs/tft-benchmark-s1-tft-Qwen3-1.7B is a 1.7-billion-parameter Qwen3 model developed by Distil Labs. It has been fine-tuned for multi-turn tool calling, a critical capability for conversational AI agents. The model is part of the TFT (Training from Traces) Benchmark, which evaluates approaches to training Small Language Models (SLMs) from production traces.
Key Capabilities
- Multi-turn Tool Calling: Specialized in understanding and executing tool-use instructions across multiple conversational turns.
- TFT Pipeline Training: Trained with a pipeline that combines trace filtering, committee relabeling by multiple LLMs, and synthetic data generation (the relabeling step is sketched after this list).
- Benchmark Performance: Achieved an LLM-as-a-judge score of 0.866 and a `staged_tool_call` score of 0.765 in the S1 Baseline scenario of the TFT benchmark, which uses clean production traces.
- Target Tools: Proficient with the restaurant search (`FindRestaurants`) and reservation (`ReserveRestaurant`) tools from the Schema-Guided Dialogue (SGD) dataset; see the usage sketch after this list.
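The card does not publish the internals of the TFT pipeline, but the committee-relabeling step can be pictured as a majority vote among several judge LLMs over each production trace. The sketch below is a minimal illustration under that assumption; `judges` is a hypothetical stand-in for whatever judge models the pipeline actually uses.

```python
from collections import Counter
from typing import Callable, Iterable, Iterator

def committee_relabel(trace: str, judges: list[Callable[[str], str]]) -> str | None:
    """Return the committee's majority label for a trace, or None if the vote splits."""
    votes = Counter(judge(trace) for judge in judges)
    label, count = votes.most_common(1)[0]
    # Accept the relabel only on a strict majority of judges.
    return label if count > len(judges) // 2 else None

def filter_and_relabel(
    traces: Iterable[str], judges: list[Callable[[str], str]]
) -> Iterator[tuple[str, str]]:
    """Drop traces the committee cannot agree on; keep the rest with new labels."""
    for trace in traces:
        label = committee_relabel(trace, judges)
        if label is not None:
            yield trace, label
```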
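For inference, the model should work with the standard Hugging Face transformers tool-calling flow, passing JSON-schema tool definitions through the chat template. This is a minimal sketch: the `FindRestaurants` parameter names below are illustrative assumptions, not the exact schemas used in training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "distillabs/tft-benchmark-s1-tft-Qwen3-1.7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# JSON-schema tool definitions in the standard transformers format.
# Parameter names are assumptions for illustration.
tools = [
    {
        "type": "function",
        "function": {
            "name": "FindRestaurants",
            "description": "Search for restaurants by city and cuisine.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "cuisine": {"type": "string"},
                },
                "required": ["city", "cuisine"],
            },
        },
    },
    # ReserveRestaurant would be declared the same way.
]

messages = [{"role": "user", "content": "Find me an Italian place in San Jose."}]
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens (the model's tool call or reply).
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```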
Good For
- Developing Tool-Calling Agents: Ideal for applications that need an SLM to accurately interpret and execute tool functions in multi-turn dialogues; a multi-turn continuation sketch follows this list.
- Benchmarking Tool-Use Performance: Serves as a strong baseline model for evaluating tool-calling capabilities, particularly when trained with the TFT pipeline.
- Research into SLM Training: Useful for researchers exploring advanced fine-tuning techniques for SLMs using production traces and synthetic data generation.
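To continue a dialogue across turns, the assistant's tool call and the executed tool's result are appended to the message history before generating again. The sketch below extends the usage example above; the message shapes follow transformers' chat-template tool-calling conventions, and the tool result payload is invented purely for illustration.

```python
import json

# Record the assistant's tool call in the transformers tool-calling format,
# then attach the executed tool's result as a "tool" message.
messages.append({
    "role": "assistant",
    "content": "",
    "tool_calls": [{
        "type": "function",
        "function": {
            "name": "FindRestaurants",
            "arguments": {"city": "San Jose", "cuisine": "Italian"},
        },
    }],
})
messages.append({
    "role": "tool",
    "name": "FindRestaurants",
    "content": json.dumps([{"name": "Trattoria Roma", "rating": 4.5}]),  # invented result
})

# Re-render the full history and generate the next assistant turn.
inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```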