anyreach-ai/semantic-turn-taking

Hugging Face
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Feb 5, 2026 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm

The anyreach-ai/semantic-turn-taking model, developed by Shangeth Rajaa, is a fine-tuned Qwen2.5-0.5B-Instruct model (494M parameters) that predicts turn-taking actions for conversational AI. Unlike acoustic methods, it uses the semantic content of the conversation to decide when a voice agent should start speaking, keep listening, stop and listen, or keep speaking. It predicts one of four actions: start_speaking, continue_listening, start_listening, or continue_speaking, which makes it suited to building responsive, natural-sounding voice AI agents.

Semantic Turn-Taking Model Overview

The anyreach-ai/semantic-turn-taking model is a specialized language model, fine-tuned from Qwen2.5-0.5B-Instruct, designed to predict optimal turn-taking actions for voice AI agents in real-time conversations. Its core innovation lies in using the semantic content of the dialogue, rather than just acoustic cues like silence detection, to make these predictions.
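To make the contrast with silence detection concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the 700 ms threshold, the function names, and especially `toy_predictor`, which is a crude stand-in for a text-based predictor and does not reflect how the model actually works. In practice `predict` would be a wrapper around the model that takes the transcript so far and returns one of its action labels.

```python
from typing import Callable


def silence_only_decision(silence_ms: int, threshold_ms: int = 700) -> str:
    """Acoustic-style rule: respond once the pause exceeds a fixed threshold.

    The 700 ms default is a hypothetical value, not taken from the model card.
    """
    return "start_speaking" if silence_ms >= threshold_ms else "continue_listening"


def semantic_decision(transcript: str, predict: Callable[[str], str]) -> str:
    """Defer to a text-based predictor (hypothetical wrapper around the model)."""
    return predict(transcript)


def toy_predictor(transcript: str) -> str:
    """Stand-in for the real model, NOT its logic.

    Treats an utterance ending in a dangling function word as unfinished.
    """
    dangling = {"to", "and", "the", "a", "for", "with", "but", "or"}
    words = transcript.lower().rstrip(".?!").split()
    if not words or words[-1] in dangling:
        return "continue_listening"
    return "start_speaking"


# The user pauses for 800 ms in the middle of a sentence.
partial = "I want to book a flight to"
acoustic = silence_only_decision(silence_ms=800)      # fires on the pause alone
semantic = semantic_decision(partial, toy_predictor)  # looks at the words instead
print(acoustic, semantic)
```

The silence rule sees only the length of the pause, so it would respond mid-sentence, while a predictor that reads the transcript can notice that the utterance is incomplete.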

Key Capabilities

  • Semantic-based Turn Prediction: Determines agent actions based on the meaning and flow of the conversation.
  • Four Action Classes: Predicts one of four distinct actions:
    • start_speaking: User has finished, agent should respond.
    • continue_listening: User is still speaking.
    • start_listening: User interrupted the agent, agent should stop talking.
    • continue_speaking: User provided a backchannel, agent should continue speaking.
  • Efficient Inference: Single-example latency of 26-34 ms on GPU and 128-191 ms on CPU (ONNX q8).
  • Benchmarked Performance: Achieves up to 91.82% accuracy on binary end-of-utterance (EOU vs Not-EOU) turn-taking prediction on the TEN dataset.
  • Flexible Deployment: Available in PyTorch (fp16/fp32) and ONNX (q8 quantized) formats.
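The four action classes above can be consumed by a small state machine on the agent side. The sketch below is one possible way to do that in Python, not part of the model or its published code. The two-state `AgentState`, the transition and validity tables, and the helper names are all assumptions inferred from the action descriptions. `is_end_of_utterance` additionally assumes that `start_speaking` is the class that counts as EOU when collapsing to a binary decision; the card does not state how the benchmark maps the four classes.

```python
from enum import Enum


class Action(str, Enum):
    """The four action labels listed above."""
    START_SPEAKING = "start_speaking"
    CONTINUE_LISTENING = "continue_listening"
    START_LISTENING = "start_listening"
    CONTINUE_SPEAKING = "continue_speaking"


class AgentState(str, Enum):
    """Hypothetical two-state view of the agent (not defined by the model card)."""
    LISTENING = "listening"
    SPEAKING = "speaking"


# Assumed: the state the agent ends up in after acting on each label.
NEXT_STATE = {
    Action.START_SPEAKING: AgentState.SPEAKING,
    Action.CONTINUE_LISTENING: AgentState.LISTENING,
    Action.START_LISTENING: AgentState.LISTENING,
    Action.CONTINUE_SPEAKING: AgentState.SPEAKING,
}

# Assumed: which labels are meaningful from each state.
VALID_ACTIONS = {
    AgentState.LISTENING: {Action.START_SPEAKING, Action.CONTINUE_LISTENING},
    AgentState.SPEAKING: {Action.START_LISTENING, Action.CONTINUE_SPEAKING},
}


def apply_action(state: AgentState, label: str) -> AgentState:
    """Return the next agent state for a predicted label.

    Unknown labels raise ValueError; labels that do not fit the current
    state are ignored and the state is left unchanged.
    """
    action = Action(label)
    if action not in VALID_ACTIONS[state]:
        return state
    return NEXT_STATE[action]


def is_end_of_utterance(label: str) -> bool:
    """Collapse the four classes to binary (assumption: only start_speaking is EOU)."""
    return Action(label) is Action.START_SPEAKING


# Replay a short sequence of predicted labels.
state = AgentState.LISTENING
for label in ["continue_listening", "start_speaking", "continue_speaking", "start_listening"]:
    state = apply_action(state, label)
    print(label, "->", state.value)
```

Ignoring labels that do not fit the current state is a design choice for this sketch; a production agent might instead log them or treat them as a signal that its own state tracking has drifted.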

Good For

  • Voice AI Agents: Enhancing the naturalness and responsiveness of conversational AI systems.
  • Real-time Interaction: Applications requiring precise, context-aware turn-taking decisions in live dialogue.
  • Dialogue Management: Integrating semantic understanding into the flow control of spoken interactions.