rishuXori/gemma-3-1b-FT

Task: Text generation
Concurrency cost: 1
Model size: 1B
Quantization: BF16
Context length: 32k
Published: Jun 10, 2025
License: apache-2.0
Architecture: Transformer
Open weights

rishuXori/gemma-3-1b-FT is a 1-billion-parameter Gemma 3 model fine-tuned by rishuXori. It is specialized for detecting when a user has finished speaking, working from Speech-to-Text (STT) output that may contain real-world noise. It supports English, Hinglish, and Hindi, and acts as the 'turn detector' that lets a voice bot respond without cutting the user off.


Overview

rishuXori/gemma-3-1b-FT is a specialized 1-billion-parameter Large Language Model (LLM) fine-tuned from Google's Gemma 3 1B-IT. Its core function is to detect when a user's speech is complete, based on the transcript produced by a Speech-to-Text (STT) system. The model is designed to handle the 'noisy' text that real-time speech recognition often produces, which is intended to make it robust in real-world conversational AI applications.

Key Capabilities

  • Intelligent Turn Detection: Analyzes STT output to predict whether the user has finished their message, even when the transcript reflects real-world speech nuances such as pauses, fillers, and recognition noise.
  • Multilingual Support: Processes conversations in English, Hinglish, and Hindi (Devanagari script).
  • Seamless Voice Bot Integration: Acts as the 'turn detector' component, positioned between the STT and Text-to-Speech (TTS) models, so the bot replies only once the user has finished speaking.
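The placement described above can be sketched as a small gate between the STT stream and the response stage. This is a minimal Python sketch under stated assumptions: `TurnGate` and `on_stt_fragment` are hypothetical names, and `detector` stands in for a call to this model. The demo substitutes a toy punctuation rule for the model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TurnGate:
    """Hypothetical helper sitting between STT and the response/TTS stage.

    STT fragments are buffered; every new fragment triggers a turn-detector
    check, and the full utterance is released only once the detector reports
    that the user has finished speaking.
    """
    detector: Callable[[str], bool]  # stand-in for the model: True if the turn is complete
    _buffer: List[str] = field(default_factory=list, init=False)

    def on_stt_fragment(self, fragment: str) -> Optional[str]:
        """Feed one STT fragment; return the finished utterance, or None to keep listening."""
        self._buffer.append(fragment.strip())
        transcript = " ".join(part for part in self._buffer if part)
        if self.detector(transcript):
            self._buffer.clear()
            return transcript  # hand the complete turn to the LLM/TTS stage
        return None  # user is mid-sentence: do not interrupt


if __name__ == "__main__":
    # Toy detector (assumption, NOT the model): a question mark ends the turn.
    gate = TurnGate(detector=lambda text: text.endswith("?"))
    print(gate.on_stt_fragment("mujhe ek"))  # None
    print(gate.on_stt_fragment("ticket book karna hai, kya aap help karoge?"))
    # mujhe ek ticket book karna hai, kya aap help karoge?
```

Re-checking the whole buffered transcript on each fragment, rather than only the newest piece, gives the detector the full context it needs to judge completion.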

How it Works

The model looks for the <end_of_turn> token or similar semantic cues in the incoming message that signal the user is done. During inference, generation is capped at a single output token (max_tokens=1), and the fine-tuning trains the model so that this one token is a quick, decisive prediction of whether the user's turn has ended. Making this decision reliably reduces accidental interruptions and helps the bot respond at the appropriate moment.
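The single-token check can be sketched as follows. It assumes the prompt leaves a Gemma-style user turn open (`<start_of_turn>user\n` followed by the transcript, with no closing marker) so that the model's next token signals completion. The model card does not document the exact prompt format used in fine-tuning, so `build_prompt`, `is_turn_complete`, and the `next_token` callable are all hypothetical. With Hugging Face transformers, `next_token` could be backed by `model.generate(..., max_new_tokens=1)`, decoding with `skip_special_tokens=False` so that `<end_of_turn>` is not stripped from the output.

```python
from typing import Callable

# Assumption: Gemma-style chat markers. The card names <end_of_turn>, but the
# exact prompt layout used during fine-tuning is not documented.
START_USER = "<start_of_turn>user\n"
END_OF_TURN = "<end_of_turn>"


def build_prompt(transcript: str) -> str:
    """Wrap a (possibly partial) STT transcript as an *open* user turn.

    The turn is deliberately left unclosed so the model's next token can
    indicate whether the user has finished.
    """
    return START_USER + transcript.strip()


def is_turn_complete(transcript: str, next_token: Callable[[str], str]) -> bool:
    """Return True if the single predicted token is <end_of_turn>.

    `next_token` is a hypothetical callable that runs the model with
    generation capped at one token (max_tokens=1) and returns that token's text.
    """
    if not transcript.strip():
        return False  # nothing said yet, so there is no turn to end
    return next_token(build_prompt(transcript)).strip() == END_OF_TURN


def fake_next_token(prompt: str) -> str:
    """Toy stand-in for the model (assumption): a trailing '?' ends the turn."""
    return END_OF_TURN if prompt.rstrip().endswith("?") else " the"


if __name__ == "__main__":
    print(is_turn_complete("can you help me?", fake_next_token))  # True
    print(is_turn_complete("I want to", fake_next_token))         # False
```

Comparing one decoded token against a fixed marker keeps the decision a cheap string check, which fits the low-latency goal of capping output at a single token.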