ViiTor-Voice: A Lightweight, Real-time LLM-based TTS Engine
ViiTor-Voice by ZzWater is a 0.5-billion-parameter text-to-speech (TTS) model engineered for efficiency and low latency. It supports Chinese and English and offers zero-shot voice cloning, which replicates a voice from minimal samples. The design prioritizes low compute and memory use, so the model can be deployed in a range of environments, including mobile and edge devices.
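To give a rough sense of why a 0.5B-parameter model fits on constrained hardware, the sketch below estimates weight storage alone at a few common numeric precisions. The precisions listed are illustrative assumptions, not formats the model is confirmed to ship in, and the figures exclude activations, KV cache, and any audio decoder.

```python
# Rough weight-only memory estimate for a 0.5B-parameter model.
# Assumption: the precisions below are examples only; the source does not
# state which formats ViiTor-Voice is distributed in.
PARAMS = 0.5e9

# Standard storage cost per parameter for each numeric format.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

if __name__ == "__main__":
    for name, gb in estimates.items():
        print(f"{name}: {gb:.2f} GB")
```

Actual runtime memory will be higher than these numbers, since inference also needs working buffers.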
Key Capabilities
- Lightweight Design: At only 0.5B parameters, the model is efficient and compatible with most LLM inference engines, which makes it suitable for diverse deployment scenarios.
- Real-time Streaming Output: Reports a first-frame latency of 200 ms on an NVIDIA Tesla T4 GPU, described as industry-leading, so interactive applications receive audio almost immediately.
- Rich Voice Library: Provides over 300 distinct voice options to match various content requirements and preferences.
- Flexible Speech Rate Adjustment: Allows natural variations in speech rate for enhanced emotional depth or efficient information delivery.
- Zero-shot Voice Cloning: Clones a voice from minimal voice samples, with no per-speaker training, for personalized output.
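First-frame latency, as cited in the streaming bullet above, is the time from issuing a request to receiving the first audio chunk. A minimal sketch of how one might measure it for any streaming generator follows. `fake_tts_stream` is a hypothetical stand-in that only simulates delay; it is not the ViiTor-Voice API, and its chunk size and timing are arbitrary.

```python
import time
from typing import Callable, Iterator, Tuple


def fake_tts_stream(
    text: str, first_delay_s: float = 0.05, n_chunks: int = 3
) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS call (not the real API)."""
    time.sleep(first_delay_s)  # simulated time until the first chunk is ready
    for _ in range(n_chunks):
        yield b"\x00" * 320  # placeholder audio bytes


def first_frame_latency(
    stream_fn: Callable[[str], Iterator[bytes]], text: str
) -> Tuple[float, int]:
    """Return (seconds until the first chunk, total number of chunks)."""
    start = time.perf_counter()
    stream = iter(stream_fn(text))
    next(stream)  # block until the first audio chunk arrives
    latency = time.perf_counter() - start
    remaining = sum(1 for _ in stream)  # drain the rest of the stream
    return latency, 1 + remaining


if __name__ == "__main__":
    latency, chunks = first_frame_latency(fake_tts_stream, "hello")
    print(f"first-frame latency: {latency * 1000:.0f} ms over {chunks} chunks")
```

To benchmark a real engine, replace `fake_tts_stream` with the engine's own streaming call and average over several requests, since a single measurement can be skewed by warm-up.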
Good For
- Applications requiring low-latency, real-time speech generation.
- Deployments on resource-constrained devices like mobile phones or edge computing environments.
- Projects needing diverse voice options and flexible speech rate control.
- Use cases benefiting from zero-shot voice cloning for personalized audio output.