FunAudioLLM/InspireMusic-1.5B

Text Generation | Concurrency Cost: 1 | Model Size: 1.5B | Quant: BF16 | Ctx Length: 32k | Tool Calling: Supported | Published: Jan 8, 2025 | Architecture: Transformer

FunAudioLLM/InspireMusic-1.5B is a 1.5 billion parameter autoregressive transformer model developed by FunAudioLLM, built on the Qwen2.5 backbone. It is designed for high-quality music, song, and audio generation, supporting tasks like text-to-music and music continuation. This model excels at generating long-form audio, integrating audio tokenization with a super-resolution flow-matching model for enhanced acoustic detail.


InspireMusic-1.5B: Unified Music, Song, and Audio Generation

InspireMusic-1.5B is a 1.5 billion parameter model from FunAudioLLM, built upon the Qwen2.5 autoregressive transformer architecture. It provides a unified framework for generating high-quality music, songs, and audio, integrating advanced audio tokenization with a super-resolution flow-matching model to produce detailed and coherent long-form audio.

Key Capabilities

  • High-Quality Audio Generation: Produces music, songs, and audio with an emphasis on acoustic quality.
  • Long-Form Music Generation: Specifically designed to support the creation of extended musical pieces, up to several minutes in length.
  • Text-to-Music: Generates music from textual prompts.
  • Music Continuation: Extends existing audio prompts with new musical content.
  • Unified Framework: Combines audio tokenizers, an autoregressive transformer, and a flow-matching model for comprehensive generative tasks.
  • Controllable Generation: Supports generation guided by both text and audio prompts.
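
The unified framework listed above can be pictured as three stages chained together: an audio tokenizer, an autoregressive generator, and a final stage that restores acoustic detail. The sketch below is a toy illustration of that data flow only. Every name and number in it (`tokenize_audio`, `generate_tokens`, `upsample`, `VOCAB_SIZE`, the token counts) is a hypothetical stand-in and not part of the InspireMusic API. Random sampling replaces the transformer, and plain linear interpolation merely marks where the flow-matching super-resolution model would sit.

```python
import random
from typing import List, Optional

VOCAB_SIZE = 1024  # hypothetical codebook size, chosen only for this sketch


def tokenize_audio(samples: List[float], hop: int = 4) -> List[int]:
    """Toy tokenizer: quantize the mean of each `hop`-sample frame to one token."""
    tokens = []
    for i in range(0, len(samples) - hop + 1, hop):
        mean = sum(samples[i:i + hop]) / hop
        clipped = max(-1.0, min(1.0, mean))
        tokens.append(int((clipped + 1.0) / 2.0 * (VOCAB_SIZE - 1)))
    return tokens


def generate_tokens(text_prompt: str,
                    audio_prompt_tokens: Optional[List[int]] = None,
                    n_new: int = 16,
                    seed: int = 0) -> List[int]:
    """Toy autoregressive loop: append one token at a time, conditioned on
    the text prompt and on every token produced so far."""
    rng = random.Random(seed)
    sequence = list(audio_prompt_tokens or [])
    for _ in range(n_new):
        # Placeholder for a real model's next-token sampling step.
        context = (len(text_prompt) + sum(sequence)) % VOCAB_SIZE
        sequence.append((context + rng.randrange(VOCAB_SIZE)) % VOCAB_SIZE)
    return sequence


def upsample(tokens: List[int], factor: int = 2) -> List[float]:
    """Placeholder for the detail-restoring stage: map tokens back to values
    in [-1, 1] and linearly interpolate to `factor` times the rate."""
    values = [t / (VOCAB_SIZE - 1) * 2.0 - 1.0 for t in tokens]
    out = []
    for a, b in zip(values, values[1:] + values[-1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    return out


# Text-to-music: generation starts from the text prompt alone.
text_only = generate_tokens("calm piano", n_new=16)
audio_a = upsample(text_only)

# Music continuation: an audio prompt is tokenized and then extended.
prompt_tokens = tokenize_audio([0.0] * 32, hop=4)
continued = generate_tokens("calm piano", prompt_tokens, n_new=16)
audio_b = upsample(continued)
```

In this sketch the two tasks differ only in whether the generator starts from an empty sequence or from tokenized prompt audio, which is one way to read the "unified" and "controllable" claims above.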

Good For

  • Developers and researchers focused on generative audio applications.
  • Creating custom music tracks from text descriptions.
  • Extending or completing existing musical segments.
  • Experimenting with AI-generated soundscapes and audio quality enhancement.