FunAudioLLM/InspireMusic-1.5B-24kHz

1.5B
BF16
131072
Overview

InspireMusic-1.5B-24kHz: High-Fidelity Music Generation

InspireMusic-1.5B-24kHz is a 1.5-billion-parameter model from FunAudioLLM and part of a unified framework for music, song, and audio generation. It combines an autoregressive transformer based on the Qwen2.5 architecture with audio tokenizers and a super-resolution flow-matching model. Together, these components enable high-quality, long-form audio generation.

Key Capabilities

  • Text-to-Music Generation: Generates music from textual prompts.
  • Music Continuation: Extends existing audio prompts to create longer musical pieces.
  • High Audio Quality: Uses a super-resolution flow-matching model to enhance acoustic detail and fidelity.
  • Long-Form Generation: Supports music lasting several minutes, which many audio synthesis systems struggle to produce coherently.
  • Unified Framework: Integrates audio tokenization, an autoregressive transformer, and flow matching in one pipeline for music, song, and audio generation.
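
The way these components fit together can be sketched as a simple data flow. This is a minimal illustration only: the class, the stage names, and the ordering of stages are assumptions made for this example, not the InspireMusic API, and the toy stages at the bottom stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[int]
Waveform = List[float]


@dataclass
class MusicPipeline:
    """Hypothetical wiring of the three component types named above.

    None of these attribute names come from InspireMusic; they are
    placeholders for illustration.
    """

    tokenize_audio: Callable[[Waveform], Tokens]       # audio tokenizer
    generate_tokens: Callable[[str, Tokens], Tokens]   # autoregressive transformer
    tokens_to_audio: Callable[[Tokens], Waveform]      # where flow matching would sit

    def text_to_music(self, prompt: str) -> Waveform:
        # Text-to-music: generate tokens from the text prompt alone.
        return self.tokens_to_audio(self.generate_tokens(prompt, []))

    def continue_music(self, prompt: str, audio: Waveform) -> Waveform:
        # Continuation: tokenize the audio prompt and use it as a prefix.
        prefix = self.tokenize_audio(audio)
        return self.tokens_to_audio(self.generate_tokens(prompt, prefix))


# Toy stand-ins so the sketch runs; real stages would be neural models.
toy = MusicPipeline(
    tokenize_audio=lambda wav: [int(x) for x in wav],
    generate_tokens=lambda text, prefix: prefix + [len(text)] * 3,
    tokens_to_audio=lambda toks: [float(t) for t in toks],
)

print(toy.text_to_music("abc"))
print(toy.continue_music("abc", [1.0, 2.0]))
```

The point of the sketch is that text-to-music and continuation share the same generator and decoder; in this arrangement, continuation differs only in supplying a tokenized audio prefix.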

Good For

  • Developers and researchers focused on generative AI for music.
  • Applications requiring high-quality, text-prompted music creation.
  • Use cases demanding the continuation or extension of musical segments.
  • Projects that need extended musical compositions as 24 kHz mono output.
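
To give a sense of what 24 kHz mono output means in practice, the sketch below computes sample counts for a given duration and writes a mono 24 kHz WAV file using only the Python standard library. The 16-bit PCM sample width and the helper names are assumptions for this example; the model card does not specify the output bit depth or file format.

```python
import math
import struct
import wave

SAMPLE_RATE = 24_000   # 24 kHz, as stated for this model
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # bytes per sample; 16-bit PCM is an assumption


def num_samples(seconds: float) -> int:
    """Number of audio samples in a mono clip of the given length."""
    return int(round(seconds * SAMPLE_RATE))


def pcm_size_bytes(seconds: float) -> int:
    """Raw PCM payload size, excluding the WAV header."""
    return num_samples(seconds) * CHANNELS * SAMPLE_WIDTH


def write_mono_wav(path: str, samples) -> None:
    """Write floats in [-1.0, 1.0] as a 16-bit mono WAV at 24 kHz."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(frames)


def sine_tone(freq_hz: float, seconds: float):
    """Placeholder waveform standing in for generated music."""
    n = num_samples(seconds)
    return [0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]
```

Under these assumptions, a three-minute clip is `num_samples(180)` = 4,320,000 samples, or about 8.6 MB of raw 16-bit PCM.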