doodod/Turn-Detector-Qwen3-0.6B
The doodod/Turn-Detector-Qwen3-0.6B is a 0.8 billion parameter Transformer-based language model, instruction fine-tuned from Qwen3-0.6B. Developed by doodod, this model is specifically designed for semantic-level turn detection in voice chat pipelines, predicting the probability of an end-of-turn token to prevent premature interruptions caused by VAD pauses. It excels in both Chinese and English dialogue scenarios, addressing the challenge of accurately determining user input completion.
Loading preview...
Turn-Detector-Qwen3-0.6B: Semantic End-of-Turn Detection
The doodod/Turn-Detector-Qwen3-0.6B is a specialized 0.8 billion parameter language model, fine-tuned from Qwen3-0.6B, designed to enhance voice chat pipelines by providing semantic-level turn detection. Unlike traditional Voice Activity Detection (VAD), which can prematurely end a user's turn during natural pauses, this model analyzes transcribed text to accurately determine if a user's input has semantically concluded.
Key Capabilities
- Semantic Turn Recognition: Predicts the probability of the
<|im_end|>token, indicating the completion of a user's utterance at a semantic level. - Improved Voice Chat Flow: Reduces inaccurate interruptions in voice dialogue systems, especially when users pause while thinking.
- Small Parameter Footprint: Utilizes a 0.8B parameter Transformer architecture, making it efficient for deployment.
- Multilingual Support: Optimized for both Chinese and English dialogue scenarios.
- Robust Training Data: Trained on public datasets like Alpaca, MagicData, and ShareChatX, with specific optimizations for ASR-transcribed text characteristics (e.g., handling missing punctuation, filler words).
Good For
- Developers building voice assistants, chatbots, or conversational AI systems that require precise end-of-turn detection.
- Integrating into existing Voice Chat Pipelines to complement or replace VAD for more natural interaction.
- Applications where accurate understanding of user intent during pauses is critical for a smooth user experience.