Name: MYZY-AI/Muyan-TTS-SFT API
Brand: Featherless.ai
Price: 10.00 USD
Availability: InStock
Author: MYZY-AI

Muyan-TTS-SFT: Trainable TTS for Podcasts

Muyan-TTS-SFT is a 3.2-billion-parameter Text-to-Speech (TTS) model developed by MYZY-AI and aimed at podcast production on a limited budget. It is pre-trained on more than 100,000 hours of podcast audio to produce high-quality speech.

Key Capabilities

Zero-Shot TTS Synthesis: Generates high-quality speech from text using only a reference audio clip, with no speaker-specific training.
Speaker Adaptation: Can be fine-tuned to an individual voice with as little as "dozens of minutes" of target speech.
SFT Model for Specific Voices: The sft model type is trained on a specific voice (e.g., Claire's voice in the examples) for consistent output, while the base model allows for arbitrary speaker prompts.
API Support: Includes an API for easy integration and deployment, with vLLM acceleration enabled by default for efficient inference.
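
The difference between the base and sft model types can be sketched as a small request builder. This is only an illustration, not the project's actual API: the function name `build_tts_request` and the field names `text`, `model_type`, `ref_audio_path`, and `ref_text` are hypothetical, and the sketch assumes the sft model needs no reference clip, which the description above implies but does not state. Consult the full model card for the real endpoint and parameters.

```python
from typing import Optional


def build_tts_request(
    text: str,
    model_type: str = "sft",
    ref_audio_path: Optional[str] = None,
    ref_text: Optional[str] = None,
) -> dict:
    """Assemble a request body for a HYPOTHETICAL Muyan-TTS endpoint.

    All field names here are assumptions made for illustration.
    """
    if model_type not in ("base", "sft"):
        raise ValueError("model_type must be 'base' or 'sft'")
    if not text.strip():
        raise ValueError("text must not be empty")

    payload = {"text": text, "model_type": model_type}

    if model_type == "base":
        # Zero-shot synthesis: the voice comes from a reference audio clip.
        if ref_audio_path is None:
            raise ValueError("the base model needs a reference audio clip")
        payload["ref_audio_path"] = ref_audio_path
        if ref_text is not None:
            # Optional transcript of the reference clip (assumed field).
            payload["ref_text"] = ref_text
    # Assumption: the sft model already carries its trained voice,
    # so no reference clip is attached.
    return payload
```

For example, `build_tts_request("Welcome to the show.")` yields an sft payload with only the text and model type, while a base request must also name a reference clip such as `ref_audio_path="speaker.wav"`.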

Good For

Podcast Production: Creating and customizing voices for podcast content.
Personalized Audio Content: Generating speech in a specific speaker's voice with minimal adaptation data.
Developers: Building custom TTS solutions on a trainable framework with installation and quickstart guides.

