MYZY-AI/Muyan-TTS-SFT
Text Generation | Concurrency Cost: 1 | Model Size: 3.2B | Quant: BF16 | Ctx Length: 32k | Published: Apr 22, 2025 | License: apache-2.0 | Architecture: Transformer | Open Weights | Warm
MYZY-AI's Muyan-TTS-SFT is a 3.2 billion parameter trainable Text-to-Speech (TTS) model specifically designed for podcast applications. Pre-trained on over 100,000 hours of podcast audio, it offers high-quality zero-shot TTS synthesis and supports speaker adaptation with minimal target speech. This model excels at generating customizable voices, making it suitable for personalized audio content creation.
Muyan-TTS-SFT: Trainable TTS for Podcasts
Muyan-TTS-SFT is a 3.2 billion parameter Text-to-Speech (TTS) model developed by MYZY-AI, built for podcast production on a limited budget. It is pre-trained on over 100,000 hours of podcast audio to deliver high-quality voice generation.
Key Capabilities
- Zero-Shot TTS Synthesis: Generates high-quality speech from text without speaker-specific training, conditioning only on a reference audio clip.
- Speaker Adaptation: Supports customization to individual voices with as little as "dozens of minutes" of target speech, enabling fine-tuning for specific speakers.
- SFT Model for Specific Voices: The sft model type is trained on a specific voice (e.g., Claire's voice in the examples) for consistent output, while the base model type allows for arbitrary speaker prompts.
- API Support: Includes an API for easy integration and deployment, with vLLM acceleration enabled by default for efficient inference.
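The difference between the two model types can be sketched as a small request builder. This is a minimal illustration, not the project's actual API: the class `TTSRequest`, the function `build_payload`, and every field name are hypothetical. It encodes only what the list above states, namely that the base model relies on a reference audio clip for zero-shot synthesis. Treating the clip as optional for the sft model is an assumption.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TTSRequest:
    """Hypothetical request shape; field names are illustrative only."""
    text: str                             # text to synthesize
    model_type: str = "base"              # "base" or "sft"
    ref_audio_path: Optional[str] = None  # reference clip for zero-shot use


def build_payload(req: TTSRequest) -> dict:
    """Validate a request and return a JSON-serializable dict."""
    if req.model_type not in ("base", "sft"):
        raise ValueError(f"unknown model_type: {req.model_type!r}")
    if not req.text.strip():
        raise ValueError("text must not be empty")
    if req.model_type == "base" and not req.ref_audio_path:
        # Zero-shot synthesis takes its speaker identity from a reference clip.
        raise ValueError("the base model needs a reference audio clip")
    # Assumption: the sft model has a fixed voice, so the clip may be omitted.
    # Unset optional fields are dropped from the payload.
    return {k: v for k, v in asdict(req).items() if v is not None}


if __name__ == "__main__":
    print(build_payload(TTSRequest(text="Welcome to the show.", model_type="sft")))
    print(build_payload(TTSRequest(text="Welcome to the show.",
                                   ref_audio_path="speaker.wav")))
```

Validating before sending keeps a missing reference clip from surfacing later as a server-side error. The real request schema should be taken from the project's own quickstart guide.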
Good For
- Podcast Production: Ideal for creating and customizing voices for podcast content.
- Personalized Audio Content: Generating speech in a specific speaker's voice with minimal adaptation data.
- Developers: Provides a trainable framework for building custom TTS solutions, with clear installation and quickstart guides.