FunAudioLLM/InspireMusic-Base
Text Generation | Concurrency Cost: 1 | Model Size: 0.5B | Quant: BF16 | Ctx Length: 32k | Published: Dec 2, 2024 | Architecture: Transformer

FunAudioLLM's InspireMusic-Base is a 0.5-billion-parameter unified framework for music, song, and audio generation. It combines an autoregressive transformer with a Qwen2.5 backbone and a super-resolution flow-matching model, and uses audio tokenization to produce high-quality, long-form audio. Supported tasks include text-to-music and music continuation. The model is intended for creating soundscapes and for research on generative audio.


InspireMusic-Base: Unified Music, Song, and Audio Generation

InspireMusic-Base is a 0.5 billion parameter model developed by FunAudioLLM, designed as a unified toolkit for generating music, songs, and general audio. It leverages an autoregressive transformer, specifically using a Qwen2.5 backbone, combined with a super-resolution flow-matching model to produce high-quality, long-form audio.
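To make the two-stage design concrete, here is a toy sketch of the general pattern described above: an autoregressive stage samples discrete audio tokens one at a time, the tokens are mapped to coarse feature frames, and a flow-matching stage integrates an ODE from Gaussian noise toward a higher-rate output conditioned on those frames. This is not InspireMusic's code. The bigram table standing in for the transformer, the codebook, the sizes, and every function name here are hypothetical, and the velocity field is the closed-form ideal for a linear noise-to-target path rather than a learned network.

```python
import numpy as np

# Toy sketch of a "tokens -> coarse frames -> refined frames" pipeline.
# All names, sizes, and the bigram "model" are hypothetical stand-ins,
# not InspireMusic's actual components.

VOCAB_SIZE = 16   # hypothetical audio-token vocabulary size
FEATURE_DIM = 4   # hypothetical per-frame feature size
UPSAMPLE = 2      # hypothetical super-resolution factor (frames per token)


def sample_tokens(prompt, n_new, rng):
    """Autoregressive sampling: each new token is drawn conditioned on the previous one."""
    # A fixed random bigram table stands in for the transformer's next-token distribution.
    table = rng.random((VOCAB_SIZE, VOCAB_SIZE))
    table /= table.sum(axis=1, keepdims=True)  # each row is now a probability distribution
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(int(rng.choice(VOCAB_SIZE, p=table[tokens[-1]])))
    return tokens


def detokenize(tokens, codebook):
    """Map discrete tokens to coarse (low-rate) feature frames via a codebook lookup."""
    return codebook[np.asarray(tokens)]


def flow_matching_upsample(coarse, rng, steps=8):
    """Euler-integrate dx/dt = v(x, t | cond) from noise at t=0 to the target at t=1.

    The conditioning is the coarse sequence repeated UPSAMPLE times along time.
    For the linear path x_t = (1 - t) * x0 + t * cond, the ideal velocity is
    (cond - x) / (1 - t); a real model learns v with a neural network instead.
    """
    cond = np.repeat(coarse, UPSAMPLE, axis=0)
    x = rng.standard_normal(cond.shape)  # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt  # t stays below 1, so the division is safe
        v = (cond - x) / (1.0 - t)
        x = x + dt * v
    return x


rng = np.random.default_rng(0)
codebook = rng.standard_normal((VOCAB_SIZE, FEATURE_DIM))
tokens = sample_tokens(prompt=[3, 7], n_new=10, rng=rng)
coarse = detokenize(tokens, codebook)
refined = flow_matching_upsample(coarse, rng)
print(len(tokens), coarse.shape, refined.shape)  # 12 (12, 4) (24, 4)
```

Because the toy velocity is exact for a straight-line path, Euler integration lands on the conditioning target. In a trained system, the learned velocity adds the fine detail that the coarse tokens do not carry, which is the role the super-resolution flow-matching model plays here.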

Key Capabilities

  • Unified Framework: Combines audio tokenization, an autoregressive transformer, and a flow-matching model in a single pipeline for music, song, and audio generation.
  • High-Quality Audio: Focuses on generating music with high audio fidelity.
  • Long-Form Generation: Produces extended music pieces; some models in the InspireMusic family can generate several minutes of audio.
  • Text and Audio Prompts: Supports controllable generation using both text descriptions and audio prompts.
  • Diverse Tasks: Currently supports text-to-music and music continuation, with future plans for song and general audio generation.
  • Hardware Efficiency: Runs in 'fast mode' with 12 GB of GPU memory; 'normal mode' (with flow matching) is recommended with 24 GB for the best results.
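The memory guidance above can be turned into a simple mode-selection rule. The sketch below is a hypothetical helper, not part of InspireMusic: the 12 GB and 24 GB thresholds come from this card, the assumption that fast mode skips flow matching is inferred from the card calling normal mode the one "with flow matching", and treating anything under 12 GB as unsupported is an assumption, since the card gives no figure below that.

```python
from dataclasses import dataclass

# Thresholds from the model card: fast mode runs with 12 GB of GPU memory,
# normal mode (with flow matching) is recommended with 24 GB.
FAST_MODE_MIN_GB = 12
NORMAL_MODE_RECOMMENDED_GB = 24


@dataclass
class ModeChoice:
    mode: str                # "normal", "fast", or "unsupported" (hypothetical labels)
    use_flow_matching: bool  # assumption: only normal mode runs the flow-matching stage


def choose_mode(gpu_memory_gb: float) -> ModeChoice:
    """Pick a generation mode from available GPU memory (hypothetical helper)."""
    if gpu_memory_gb >= NORMAL_MODE_RECOMMENDED_GB:
        return ModeChoice("normal", use_flow_matching=True)
    if gpu_memory_gb >= FAST_MODE_MIN_GB:
        return ModeChoice("fast", use_flow_matching=False)
    # Assumption: the card gives no guidance below 12 GB, so flag it explicitly.
    return ModeChoice("unsupported", use_flow_matching=False)


for gb in (8, 16, 24, 40):
    print(gb, choose_mode(gb))
```

In PyTorch, total device memory in bytes can be read with `torch.cuda.get_device_properties(0).total_memory` and divided by 1024**3 before passing it to a helper like this. Keep in mind that total memory overstates what is actually free if other processes share the GPU.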

Good For

  • Developers and researchers focused on music generation from text or audio prompts.
  • Creating long-form musical compositions.
  • Designing new soundscapes and supporting audio research.
  • Experimenting with a unified framework for various audio generation tasks.