distil-labs/distil-qwen3-0.6b-voice-assistant-banking
Distil Labs' Distil-Qwen3-0.6B-Voice-Assistant-Banking is a 0.6 billion parameter Qwen3-based model fine-tuned for multi-turn intent classification and slot extraction in banking voice assistants. Utilizing knowledge distillation from a 120B teacher model, it achieves 90.9% tool call accuracy, surpassing its teacher, while maintaining ~40ms inference speed. This model is specifically optimized for real-time voice pipelines, enabling efficient and accurate handling of 14 distinct banking operations.
Loading preview...
Model Overview
Distil-Qwen3-0.6B-Voice-Assistant-Banking is a compact, 0.6 billion parameter model built on the Qwen3 architecture, specifically fine-tuned by Distil Labs for banking voice assistant applications. It excels at multi-turn intent classification and slot extraction, crucial for robust conversational AI in financial services.
Key Capabilities & Performance
- High Accuracy: Achieves an impressive 90.9% tool call accuracy, notably outperforming its 120B parameter teacher model (87.5%) and the base Qwen3-0.6B model (48.7%).
- Extreme Efficiency: Despite its small size, it delivers approximately 40ms inference time, making it suitable for real-time voice pipelines with total latencies under 400ms.
- Specialized Functionality: Designed to act as a function caller, parsing user utterances (including those with ASR errors) and conversation history to output structured tool calls for 14 specific banking operations.
- Knowledge Distillation: Trained using knowledge distillation from a much larger 120B teacher model, allowing it to retain high performance in a significantly smaller footprint.
Ideal Use Cases
- Real-time Banking Voice Assistants: Powers full ASR -> SLM -> TTS pipelines for immediate responses.
- Text-based Banking Chatbots: Provides structured intent routing for automated customer service.
- Edge Deployment: Suitable for on-device voice processing due to its small size and high efficiency.
- Multi-turn Tool Calling: Effective for any bounded intent taxonomy requiring accurate function calling based on conversational context.