rishuXori/gemma-3-1b-FT
rishuXori/gemma-3-1b-FT is a 1 billion parameter Gemma 3 model fine-tuned by rishuXori. It is specialized for detecting, from Speech-to-Text (STT) output, when a user has finished speaking, even when the transcript carries real-world noise. It supports English, Hinglish, and Hindi, and acts as a 'turn detector' for voice bot interactions.
Overview
rishuXori/gemma-3-1b-FT is a specialized 1 billion parameter Large Language Model (LLM) fine-tuned from Google's Gemma 3 1B-IT. Its core function is to decide, from the text a Speech-to-Text (STT) system produces, whether the user has finished speaking. The model is designed to handle the 'noisy' text that real-time speech recognition often produces, which makes it suitable for real-world conversational AI applications.
Key Capabilities
- Intelligent Turn Detection: Analyzes STT output to predict whether the user's message is complete, even when the transcript reflects real-world speech nuances.
- Multilingual Support: Processes conversations in English, Hinglish, and Hindi (Devanagari script).
- Seamless Voice Bot Integration: Acts as the 'turn detector' component, positioned between the STT and Text-to-Speech (TTS) models to support natural conversational flow.
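To illustrate where a turn detector could sit in a voice bot, here is a minimal sketch that buffers STT fragments until a detector reports that the turn is complete. The names `collect_turns` and `toy_detector` are hypothetical and not part of this model's release. The toy detector is a punctuation rule standing in for the model, and flushing the leftover buffer when the stream ends is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator, List


def collect_turns(
    stt_fragments: Iterable[str],
    is_turn_complete: Callable[[str], bool],
) -> Iterator[str]:
    """Yield complete user utterances assembled from a stream of STT fragments."""
    buffer: List[str] = []
    for fragment in stt_fragments:
        fragment = fragment.strip()
        if not fragment:
            continue  # ignore empty STT results
        buffer.append(fragment)
        utterance = " ".join(buffer)
        # Ask the turn detector whether the user has finished speaking.
        if is_turn_complete(utterance):
            yield utterance
            buffer.clear()
    # Assumption: when the stream ends, flush whatever is left as a final turn.
    if buffer:
        yield " ".join(buffer)


def toy_detector(text: str) -> bool:
    """Placeholder rule standing in for the model; NOT the real detector."""
    return text.endswith((".", "?", "!"))


turns = list(collect_turns(["I want to", "book a ticket.", "kal ke liye", ""], toy_detector))
print(turns)
```

In a real pipeline, `is_turn_complete` would wrap a call to the fine-tuned model, and each yielded utterance would be handed to the component that produces the bot's reply.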
How it Works
The model looks for an <end_of_turn> token or similar semantic cues in the incoming message. It is fine-tuned so that a single output token carries the decision, so inference can run with max_tokens=1 and return a fast prediction of whether the user's turn has ended. This improves the user experience by reducing accidental interruptions and letting the bot respond at the appropriate moment.
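The single-token decision described above can be sketched as follows. This is an illustration under stated assumptions, not the author's inference code: it assumes the model emits the literal string `<end_of_turn>` as its one output token when the turn is complete, and it takes the generation call as an injected function so that any inference client can be plugged in. `is_turn_complete`, `generate`, and `fake_generate` are hypothetical names.

```python
from typing import Callable

# Assumption: a completed turn is signalled by this literal token string.
END_OF_TURN = "<end_of_turn>"


def is_turn_complete(stt_text: str, generate: Callable[[str, int], str]) -> bool:
    """Return True if the single generated token marks the end of the user's turn.

    `generate(prompt, max_tokens)` is a hypothetical wrapper around whichever
    inference client serves the model; it must return the generated text.
    """
    token = generate(stt_text, 1)  # max_tokens=1: only one token is generated
    return token.strip() == END_OF_TURN


def fake_generate(prompt: str, max_tokens: int) -> str:
    """Stand-in for the model so the sketch runs offline; NOT real model output."""
    assert max_tokens == 1
    finished = prompt.rstrip().endswith((".", "?", "।"))
    return END_OF_TURN if finished else "and"


print(is_turn_complete("mujhe ticket book karni hai.", fake_generate))
print(is_turn_complete("I want to", fake_generate))
```

Keeping the generation call behind a plain function makes it easy to swap the stand-in for a real client and to check the prompt format the fine-tuned model actually expects, which the description above does not specify.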