FunAudioLLM's InspireMusic-Base is a 0.5-billion-parameter unified framework for music, song, and audio generation, built on an autoregressive transformer with a Qwen2.5 backbone and a super-resolution flow-matching model. It integrates audio tokenization to produce high-quality, long-form audio and supports tasks such as text-to-music and music continuation. The model is aimed at soundscape creation and generative audio research.
InspireMusic-Base: Unified Music, Song, and Audio Generation
InspireMusic-Base is a 0.5-billion-parameter model developed by FunAudioLLM, designed as a unified toolkit for generating music, songs, and general audio. It combines an autoregressive transformer with a Qwen2.5 backbone and a super-resolution flow-matching model to produce high-quality, long-form audio.
Key Capabilities
- Unified Framework: Integrates audio tokenization with an autoregressive transformer and flow-matching for comprehensive audio generation.
- High-Quality Audio: Focuses on generating music with high audio fidelity.
- Long-Form Generation: Capable of producing extended music pieces; some models support several minutes of audio.
- Text and Audio Prompts: Supports controllable generation using both text descriptions and audio prompts.
- Diverse Tasks: Currently supports text-to-music and music continuation, with future plans for song and general audio generation.
- Hardware Efficiency: Runs in 'fast mode' with 12GB of GPU memory; 'normal mode' (with flow matching) works best with 24GB.
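The memory guidance in the last bullet can be turned into a small pre-flight check. The following is a minimal sketch, not part of the InspireMusic toolkit: the function name `choose_mode` and the two constants are hypothetical, and the thresholds are simply the 12GB and 24GB figures quoted above.

```python
from typing import Optional

# Thresholds taken from the figures quoted in the list above.
# The constant and function names are hypothetical, for illustration only.
FAST_MODE_MIN_GB = 12.0    # 'fast mode'
NORMAL_MODE_MIN_GB = 24.0  # 'normal mode' (with flow matching)


def choose_mode(gpu_memory_gb: float) -> Optional[str]:
    """Return the most capable mode the given GPU memory allows, or None."""
    if gpu_memory_gb >= NORMAL_MODE_MIN_GB:
        return "normal"
    if gpu_memory_gb >= FAST_MODE_MIN_GB:
        return "fast"
    # Below the quoted fast-mode figure: no recommendation from this sketch.
    return None


if __name__ == "__main__":
    for gb in (8, 12, 16, 24, 48):
        print(f"{gb} GB -> {choose_mode(gb)}")
```

The available memory would have to come from whatever GPU query your environment provides; this sketch takes it as a plain number so it stays independent of any particular framework.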
Good For
- Developers and researchers focused on music generation from text or audio prompts.
- Creating long-form musical compositions.
- Designing soundscapes and supporting audio research.
- Experimenting with a unified framework for various audio generation tasks.