distil-labs/distil-qwen3-0.6b-voice-assistant-banking

Task: Text generation | Model size: 0.8B | Quantization: BF16 | Context length: 32k | Published: Feb 13, 2026 | License: apache-2.0 | Architecture: Transformer | Open weights

Distil Labs' Distil-Qwen3-0.6B-Voice-Assistant-Banking is a 0.6-billion-parameter Qwen3-based model fine-tuned for multi-turn intent classification and slot extraction in banking voice assistants. It was trained by knowledge distillation from a 120B teacher model and reaches 90.9% tool call accuracy, surpassing its teacher, with an inference latency of about 40 ms. The model is optimized for real-time voice pipelines and handles 14 distinct banking operations.


Model Overview

Distil-Qwen3-0.6B-Voice-Assistant-Banking is a compact, 0.6-billion-parameter model built on the Qwen3 architecture and fine-tuned by Distil Labs for banking voice assistant applications. It performs multi-turn intent classification (deciding which operation the user wants) and slot extraction (pulling out the parameters that operation needs, e.g. an amount or an account). Both tasks are core to conversational AI in financial services.
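In practice, the intent and slots arrive together as one structured tool call: the tool name is the intent, and its arguments are the slots. The following is a minimal sketch of how an application might parse and sanity-check that output. It rests on three assumptions:

- The model wraps calls in Qwen3-style `<tool_call>{...}</tool_call>` tags. The parser falls back to bare JSON if no tags are found.
- `TOOL_SCHEMAS` holds three hypothetical tool names with hypothetical slot sets. This card does not list the real 14 operations or their schemas.
- `parse_tool_call` and `ParsedCall` are illustrative names.

```python
import json
import re
from dataclasses import dataclass, field

# Hypothetical subset of banking tools and their required slots (not the model's real schema).
TOOL_SCHEMAS: dict[str, set[str]] = {
    "check_balance": {"account_type"},
    "transfer_money": {"amount", "from_account", "to_account"},
    "block_card": {"card_type"},
}

# Assumes Qwen3-style wrapping: <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


@dataclass
class ParsedCall:
    intent: str                                      # tool name = classified intent
    slots: dict = field(default_factory=dict)        # tool arguments = extracted slots
    missing_slots: set = field(default_factory=set)  # required slots the model left empty


def parse_tool_call(raw: str) -> ParsedCall:
    """Extract the first tool call from raw model output and check it against TOOL_SCHEMAS."""
    match = TOOL_CALL_RE.search(raw)
    payload = match.group(1) if match else raw.strip()  # fall back to bare JSON
    call = json.loads(payload)
    name = call["name"]
    if name not in TOOL_SCHEMAS:
        raise ValueError(f"unknown tool: {name}")
    slots = call.get("arguments", {})
    # A dialogue manager could ask a follow-up question for any missing slot.
    missing = TOOL_SCHEMAS[name] - set(slots)
    return ParsedCall(intent=name, slots=slots, missing_slots=missing)


raw = ('<tool_call>{"name": "transfer_money", '
       '"arguments": {"amount": 50, "to_account": "savings"}}</tool_call>')
call = parse_tool_call(raw)
print(call.intent, sorted(call.missing_slots))  # transfer_money ['from_account']
```

Reporting missing slots instead of rejecting the call fits the multi-turn setting. The next user turn can supply the missing value, and the conversation history then carries it forward.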

Key Capabilities & Performance

  • High Accuracy: Achieves 90.9% tool call accuracy, outperforming both its 120B-parameter teacher model (87.5%) and the base Qwen3-0.6B model (48.7%).
  • High Efficiency: Thanks to its small size, inference takes approximately 40 ms. This leaves ample headroom in real-time voice pipelines that target end-to-end latency under 400 ms.
  • Specialized Functionality: Designed to act as a function caller, parsing user utterances (including those with ASR errors) and conversation history to output structured tool calls for 14 specific banking operations.
  • Knowledge Distillation: Trained using knowledge distillation from a much larger 120B teacher model, allowing it to retain high performance in a significantly smaller footprint.
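The card does not define exactly how tool call accuracy is scored. The sketch below assumes a per-turn exact match on both the tool name and its arguments, ignoring argument order. The function names are illustrative, not part of any Distil Labs tooling.

```python
import json


def canonical(call: dict) -> tuple[str, str]:
    """Tool name plus arguments serialized with sorted keys, so argument order is ignored."""
    return call["name"], json.dumps(call.get("arguments", {}), sort_keys=True)


def tool_call_accuracy(predictions: list[dict], references: list[dict]) -> float:
    """Share of turns whose predicted call exactly matches the reference call."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    hits = sum(canonical(p) == canonical(r) for p, r in zip(predictions, references))
    return hits / len(references)


preds = [
    {"name": "check_balance", "arguments": {"account_type": "checking"}},
    {"name": "transfer_money", "arguments": {"amount": 50, "to_account": "savings"}},
]
refs = [
    {"name": "check_balance", "arguments": {"account_type": "checking"}},
    {"name": "transfer_money", "arguments": {"to_account": "savings", "amount": 100}},
]
print(tool_call_accuracy(preds, refs))  # 0.5
```

Under a metric like this, 90.9% would mean that roughly 91 of every 100 evaluated turns produced exactly the right call. The card's own evaluation protocol may differ, for example in how it handles slot-value normalization.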

Ideal Use Cases

  • Real-time Banking Voice Assistants: Serves as the small language model (SLM) stage of ASR -> SLM -> TTS pipelines, enabling low-latency spoken responses.
  • Text-based Banking Chatbots: Provides structured intent routing for automated customer service.
  • Edge Deployment: Suitable for on-device voice processing due to its small size and high efficiency.
  • Multi-turn Tool Calling: Effective for any bounded intent taxonomy requiring accurate function calling based on conversational context.
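For real-time voice pipelines, the card's figures can be read as a latency budget. About 40 ms for the SLM within an end-to-end target under 400 ms is roughly 10% of the budget. The sketch below assumes the ASR, SLM, and TTS stages run one after another, so their latencies simply add. The ASR and TTS values are hypothetical placeholders, not measurements, and `StageLatencies` and `remaining_budget` are illustrative names.

```python
from dataclasses import dataclass

TOTAL_BUDGET_MS = 400.0  # end-to-end target stated in the model card
SLM_MS = 40.0            # approximate SLM inference time stated in the model card


@dataclass
class StageLatencies:
    asr_ms: float  # speech-to-text
    slm_ms: float  # intent classification + slot extraction (this model)
    tts_ms: float  # text-to-speech

    def total(self) -> float:
        # Stages are assumed to run sequentially, so their latencies add up.
        return self.asr_ms + self.slm_ms + self.tts_ms


def remaining_budget(stages: StageLatencies, budget_ms: float = TOTAL_BUDGET_MS) -> float:
    """Milliseconds of headroom left; a negative value means the pipeline overruns the budget."""
    return budget_ms - stages.total()


# ASR and TTS figures are hypothetical placeholders, not measurements.
example = StageLatencies(asr_ms=200.0, slm_ms=SLM_MS, tts_ms=120.0)
print(example.total(), remaining_budget(example))  # 360.0 40.0
```

Because the SLM consumes only a small slice of the budget, most of the 400 ms remains available for ASR and TTS. Those two stages usually dominate end-to-end latency in a setup like this.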