Turn Detector Qwen3-4B: Real-time Turn-End Detection

This model is a fine-tuned Qwen3-4B language model, specifically optimized for real-time turn-end detection in multilingual call center conversations. Its primary function is to predict the probability that a speaker has finished their turn (P(<|im_end|>)), enabling low-latency voice agent pipelines (e.g., LiveKit) to determine the appropriate moment to respond.

Key Capabilities

Real-time Turn Detection: Outputs a probability score for turn completion, with P(<|im_end|>) > 0.5 indicating a complete turn.
Multilingual Support: Evaluated across 12 language pairs, including Chinese-Tamil and Malay-English.
High Precision: Reaches 100% precision on its 238-sample synthetic test set (no false positives), at 76.47% recall.
Optimized for Voice Agents: Designed to integrate into voice agent systems where timely and accurate turn-taking is crucial.
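
The thresholding rule above can be sketched as follows. This is a minimal illustration rather than the model's actual inference code: the function names, the toy four-token vocabulary, and the token index are assumptions made for the example. In a real pipeline the logits would be the model's next-token logits at the last position, and the index would be the tokenizer's ID for <|im_end|>.

```python
import math

def turn_end_probability(logits, im_end_id):
    """Softmax probability of the token at im_end_id, given next-token logits."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    return exps[im_end_id] / sum(exps)

def is_turn_complete(probability, threshold=0.5):
    """Apply the decision rule P(<|im_end|>) > threshold."""
    return probability > threshold

# Toy 4-token vocabulary (hypothetical); index 3 stands in for <|im_end|>.
toy_logits = [0.1, -1.2, 0.3, 2.5]
p = turn_end_probability(toy_logits, im_end_id=3)
print(f"P(<|im_end|>) = {p:.3f}, complete = {is_turn_complete(p)}")
```

Only the probability of a single token is needed, so no text has to be generated: one forward pass over the conversation so far is enough, which is what keeps the check cheap enough for a low-latency voice pipeline.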

Performance Highlights

On a synthetic test set of 238 samples (119 positive, 119 negative) across 12 language pairs, the model achieved:

Accuracy: 88.24%
Precision: 100.00%
Recall: 76.47%
F1 Score: 86.67%

Notably, all 119 negative cases (speaker still talking) were classified correctly, so every error on this test set is a missed turn end rather than a premature one. Evaluation covered language pairs including Chinese-Tamil and Malay-English.
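
As a consistency check, the reported figures can be reproduced from a confusion matrix. The counts below are not stated in the source; they are inferred from the reported percentages and the 119/119 class split (100% precision implies zero false positives, and 76.47% recall on 119 positives implies 91 true positives).

```python
def binary_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, recall, and F1 as fractions."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Inferred counts (assumption): 91 turn ends detected, 28 missed,
# 119 "still talking" cases correct, 0 false alarms.
accuracy, precision, recall, f1 = binary_metrics(tp=91, fp=0, fn=28, tn=119)

for name, value in [("Accuracy", accuracy), ("Precision", precision),
                    ("Recall", recall), ("F1 Score", f1)]:
    print(f"{name}: {value * 100:.2f}%")
```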

Training Details

The model was fine-tuned from the base Qwen/Qwen3-4B model on positive samples (complete conversations ending with <|im_end|>). Training used Liger Fused Linear Cross Entropy loss, FA4 attention, and bfloat16 precision, with a block size of 8192 and a constant learning rate of 2e-5 for 1 epoch. Training data included datasets such as Call Center Language Switching and Malaysian Multiturn Chat Assistant.
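
To make the role of <|im_end|> concrete, the sketch below renders a conversation in the ChatML layout used by Qwen chat models, leaving the current turn open so that the next token the model must predict is either a continuation or <|im_end|>. The helper name, the role labels, and the example utterances are hypothetical, and the exact prompt layout used for this model is an assumption; the full model card is the authority on that.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"

def format_open_turn(history, current_role, current_text):
    """Render closed prior turns, then the current turn without IM_END.

    history is a list of (role, text) pairs for turns that already ended.
    """
    parts = [f"{IM_START}{role}\n{text}{IM_END}\n" for role, text in history]
    # The current turn is left open: no IM_END is appended, so the
    # probability of IM_END as the next token measures turn completion.
    parts.append(f"{IM_START}{current_role}\n{current_text}")
    return "".join(parts)

prompt = format_open_turn(
    history=[("assistant", "Hello, how can I help?")],
    current_role="user",
    current_text="I want to check my",
)
print(prompt)
```

Under this layout, a training sample as described above would be the same rendering with <|im_end|> appended to the final turn.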
