FunAudioLLM/InspireMusic-1.5B-Long

Warm
Public
1.5B
BF16
131072
Hugging Face

InspireMusic-1.5B-Long Overview

InspireMusic-1.5B-Long is a 1.5-billion-parameter model developed by FunAudioLLM for music, song, and audio generation. It uses a unified framework with three components: audio tokenizers, an autoregressive transformer built on the Qwen2.5 backbone, and a super-resolution flow-matching model. This architecture enables high-quality, long-form audio generation, which distinguishes it from models that focus primarily on text or on shorter audio segments.

Key Capabilities

  • Long-form Music Generation: Capable of generating coherent music pieces lasting several minutes.
  • High Audio Quality: Utilizes a super-resolution flow-matching model to enhance acoustic details and fidelity.
  • Unified Framework: Integrates audio tokenizers, an autoregressive transformer, and flow-matching for comprehensive audio generation.
  • Text-to-Music: Generates music from English text prompts.
  • Music Continuation: Extends existing audio prompts to create longer musical sequences.
  • Flexible Inference: Supports a 'normal' mode, which uses flow matching for higher quality, and a 'fast' mode, which skips flow matching for quicker generation. GPU memory requirements differ: 24GB for normal mode, 12GB for fast mode.
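
The trade-off between the two inference modes can be sketched as a small selection helper. This is a minimal illustration, not part of the InspireMusic codebase: the function names `choose_mode` and `stages_for_mode` and the stage labels are hypothetical, and treating the 24GB and 12GB figures quoted above as hard minimums is an assumption.

```python
from typing import List, Optional

# GPU memory figures quoted in the capability list (in GB).
# Treating them as hard minimums is an assumption of this sketch.
NORMAL_MODE_GB = 24
FAST_MODE_GB = 12


def choose_mode(available_gb: float) -> Optional[str]:
    """Pick the highest-quality mode that fits in the given GPU memory.

    Returns 'normal', 'fast', or None when neither mode fits.
    """
    if available_gb >= NORMAL_MODE_GB:
        return "normal"
    if available_gb >= FAST_MODE_GB:
        return "fast"
    return None


def stages_for_mode(mode: str) -> List[str]:
    """List the framework components a mode exercises.

    Stage names are illustrative labels, not real module names.
    """
    if mode not in ("normal", "fast"):
        raise ValueError(f"unknown mode: {mode!r}")
    stages = ["audio_tokenizer", "autoregressive_transformer"]
    if mode == "normal":
        # Only normal mode runs the super-resolution flow-matching step.
        stages.append("flow_matching")
    return stages
```

For example, a 16GB card would fall through to fast mode under these assumptions, while a card with less than 12GB would get `None` and the caller would need to handle that case.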

Good For

  • Developers and researchers focused on creating extended, high-fidelity musical compositions.
  • Applications requiring text-to-music synthesis or music continuation.
  • Experimentation with a unified framework for diverse audio generation tasks; support for song and general audio generation is planned.