pyon0024/tinyllama-katakana-converter
The pyon0024/tinyllama-katakana-converter is a 1.1 billion parameter model, fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0, designed to generate rhythm-optimized Katakana and connected phoneme sequences. It specializes in capturing real-world auditory phenomena like liaison, reduction, and flapping in English speech. This model acts as a "Phonetic Bridge" for continuous speech, making it ideal for improving Text-to-Speech prosody and aiding Japanese learners with phonetic mapping.
TinyLlama-1.1B-Phonetic-Liaison-Katakana-Generator Overview
This model, developed by pyon0024, is a specialized fine-tune of TinyLlama/TinyLlama-1.1B-Chat-v1.0 with 1.1 billion parameters. Its core function is to predict connected phoneme sequences (ARPAbet) and rhythm-optimized Katakana from English phrases. Unlike traditional Grapheme-to-Phoneme (G2P) converters that process words in isolation, this model focuses on how sounds change in continuous speech, handling phenomena like liaison, reduction, and flapping.
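As a minimal usage sketch, the model can be loaded through the standard transformers API. The instruction text below is illustrative, and it assumes the fine-tune keeps the base model's chat template; adjust the prompt to whatever format the fine-tune was actually trained on.

```python
# Minimal inference sketch with Hugging Face transformers.
# The instruction wording and use of the chat template are assumptions;
# follow the fine-tune's actual prompt format if it differs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pyon0024/tinyllama-katakana-converter"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

messages = [
    {"role": "user",
     "content": "Convert to connected ARPAbet phonemes and Katakana: a little bit"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```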
Key Capabilities
- Connected Phonemes (ARPAbet): Generates precise phonetic strings that account for word-to-word connections (e.g., "a little bit" becomes AH0 L IH1 D AH0 L B IH1 T).
- Liaison & Flapping: Accurately models sound changes such as 'T' to 'D' transformations and inter-word connections.
- Silent Letters: Intelligently identifies and ignores non-vocalized consonants.
- Supportive Katakana: Provides a phonetic map in Katakana that mimics native English rhythm, serving as a learning aid.
- High-Speed Inference: Optimized for mobile deployment, with strong compatibility with GGUF for on-device applications (see the sketch after this list).
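For on-device use, a GGUF export of the model can be run through llama-cpp-python. The file name below is a placeholder, not a confirmed artifact; it assumes you have downloaded or converted a quantized GGUF version yourself.

```python
# Hedged on-device sketch via llama-cpp-python; the GGUF file name is a
# placeholder and assumes a quantized export of this model is available locally.
from llama_cpp import Llama

llm = Llama(model_path="tinyllama-katakana-converter.Q4_K_M.gguf", n_ctx=512)

result = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Convert to connected ARPAbet phonemes and Katakana: a little bit"}],
    max_tokens=64,
    temperature=0.0,
)
print(result["choices"][0]["message"]["content"])
```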
Good For
- TTS Frontend Development: Enhancing the prosody and naturalness of Text-to-Speech engines by providing linked phoneme outputs (a parsing sketch follows this list).
- ESL Tools: Visualizing phonetic changes in continuous speech for English language learners.
- Japanese Learners: Offering a "Phonetic Bridge" to understand English pronunciation and rhythm through Katakana, acting as "training wheels" for auditory learning.
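For a TTS frontend, the generated text still needs to be split into phoneme tokens plus the supporting Katakana line. The parser below is a minimal sketch that assumes a two-line output (ARPAbet first, Katakana second); that layout is an assumption, not a documented output contract.

```python
# Illustrative post-processing for a TTS frontend. The two-line layout
# (ARPAbet line, then Katakana line) is an assumption; adapt the parsing
# to the format the model actually emits.
def parse_output(text: str) -> dict:
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    phonemes = lines[0].split() if lines else []   # e.g. ["AH0", "L", "IH1", "D", ...]
    katakana = lines[1] if len(lines) > 1 else ""  # rhythm-optimized Katakana aid
    return {"phonemes": phonemes, "katakana": katakana}

# Placeholder Katakana line; real model output would go here.
example = "AH0 L IH1 D AH0 L B IH1 T\n<カタカナ行>"
print(parse_output(example))
```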
Limitations
- Model Size: At only 1.1B parameters, the model may occasionally hallucinate phoneme or Katakana renderings for rare proper nouns.
- Accent: Primarily optimized for General American English (GenAm).