doodod/Turn-Detector-Qwen3-0.6B

Parameters: 0.8B
Tensor type: BF16
Context length: 40960
Aug 4, 2025
License: mit
Overview

Turn-Detector-Qwen3-0.6B: Semantic End-of-Turn Detection

doodod/Turn-Detector-Qwen3-0.6B is a specialized 0.8 billion parameter language model, fine-tuned from Qwen3-0.6B, that adds semantic-level turn detection to voice chat pipelines. Traditional Voice Activity Detection (VAD) relies on silence alone, so it can end a user's turn prematurely during a natural pause. This model instead analyzes the transcribed text to judge whether the user's utterance is semantically complete.

Key Capabilities

  • Semantic Turn Recognition: Predicts the probability of the <|im_end|> token as the next token; a high probability indicates that the user's utterance is semantically complete.
  • Improved Voice Chat Flow: Reduces premature interruptions in voice dialogue systems, especially when users pause to think.
  • Small Parameter Footprint: Uses a 0.8B parameter Transformer architecture, which keeps it lightweight to deploy.
  • Bilingual Support: Optimized for both Chinese and English dialogue scenarios.
  • Robust Training Data: Trained on public datasets such as Alpaca, MagicData, and ShareChatX, with specific optimizations for the characteristics of ASR-transcribed text (e.g., missing punctuation and filler words).
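
The <|im_end|> scoring described in the list above can be sketched roughly as follows. This is a minimal illustration, not the author's reference code: the ChatML-style prompt layout, the 0.5 threshold, and the helper names (eot_probability, is_turn_complete, score_transcript) are all assumptions made for this example. Only the model ID and the <|im_end|> token come from the description above.

```python
import math
from typing import Sequence

EOT_TOKEN = "<|im_end|>"  # end-of-turn token named in the model description


def eot_probability(next_token_logits: Sequence[float], eot_id: int) -> float:
    """Softmax probability of the end-of-turn token among next-token logits."""
    m = max(next_token_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in next_token_logits]
    return exps[eot_id] / sum(exps)


def is_turn_complete(p_eot: float, threshold: float = 0.5) -> bool:
    """Decide end-of-turn; the 0.5 default is an assumed value to be tuned."""
    return p_eot >= threshold


def score_transcript(text: str,
                     model_id: str = "doodod/Turn-Detector-Qwen3-0.6B") -> float:
    """Hypothetical helper: score one ASR transcript with the model."""
    # Imported lazily so the pure-Python helpers above work without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
    model.eval()

    # Assumption: an open ChatML user turn, so the model's next-token
    # distribution says how likely the turn is to close right here.
    prompt = f"<|im_start|>user\n{text}"
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token

    eot_id = tokenizer.convert_tokens_to_ids(EOT_TOKEN)
    return eot_probability(logits.float().tolist(), eot_id)
```

A caller would typically run something like is_turn_complete(score_transcript("I'd like to book a flight to")) each time the ASR emits a new partial transcript. The prompt format actually used in training may differ, so it should be checked against the model's own documentation before relying on the scores.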

Good For

  • Developers building voice assistants, chatbots, or conversational AI systems that require precise end-of-turn detection.
  • Existing voice chat pipelines that need a complement to, or replacement for, VAD to make interaction more natural.
  • Applications where accurate understanding of user intent during pauses is critical for a smooth user experience.
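
One way to use the detector alongside VAD is sketched below. It is an assumed design, not part of the model: the EndpointPolicy fields, their default values, and the should_end_turn function are hypothetical. The idea is to ignore very short pauses, consult the semantic end-of-turn probability during ordinary pauses, and fall back to a silence timeout so the system never waits forever.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EndpointPolicy:
    """Hypothetical tuning knobs; every default here is an assumption."""
    threshold: float = 0.5      # minimum end-of-turn probability to end the turn
    min_silence_s: float = 0.3  # pauses shorter than this never end the turn
    max_silence_s: float = 2.0  # pauses this long always end the turn


def should_end_turn(silence_s: float, p_eot: float,
                    policy: EndpointPolicy = EndpointPolicy()) -> bool:
    """Combine VAD silence duration with the semantic end-of-turn probability."""
    if silence_s < policy.min_silence_s:
        return False  # micro-pause: the user is most likely still speaking
    if silence_s >= policy.max_silence_s:
        return True   # timeout fallback, independent of the model's opinion
    return p_eot >= policy.threshold  # in between, defer to the semantic signal


# A 0.5 s pause after an unfinished sentence (low probability) keeps listening,
# while the same pause after a complete request ends the turn.
print(should_end_turn(0.5, 0.1))  # False
print(should_end_turn(0.5, 0.9))  # True
```

Replacing VAD entirely would mean dropping the silence checks and relying on the probability alone. Keeping the timeout is a conservative choice for cases where the transcript is misleading.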