m-a-p/ChatMusician-Base

Text Generation · Concurrency Cost: 1 · Model Size: 7B · Quant: FP8 · Ctx Length: 4k · Published: Feb 27, 2024 · License: MIT · Architecture: Transformer · Open Weights

ChatMusician-Base is a 7 billion parameter LLaMA2-based large language model developed by m-a-p, specifically designed for intrinsic musical understanding and generation. It acquires musical abilities through continual pre-training and fine-tuning on text-compatible ABC notation, treating music as a second language. The model composes structured, full-length music conditioned on inputs such as text, chords, melodies, and musical forms, without requiring external multi-modal neural structures.


ChatMusician-Base: Music-Integrated LLM

ChatMusician-Base is a 7 billion parameter LLaMA2-based model developed by m-a-p, uniquely designed to integrate intrinsic musical abilities within a large language model. Unlike typical LLMs, it processes music using ABC notation as a text-compatible representation, enabling it to understand and generate music without external multi-modal components.

Key Capabilities

  • Music Generation: Composes well-structured, full-length music conditioned on text, chords, melodies, motifs, and musical forms.
  • Music Understanding: Demonstrates strong performance on the MusicTheoryBench for college-level music understanding, surpassing LLaMA2 and GPT-3.5 in zero-shot settings.
  • Language Preservation: Acquiring musical abilities does not degrade general language capabilities; the model even achieves a slightly higher MMLU score than its LLaMA2 base.
  • Pure Text Tokenization: Utilizes a pure text tokenizer for both language and music, treating music as a second language.
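Because music is represented as plain text in ABC notation, standard text tooling applies to it directly. A minimal sketch of what such a representation looks like (the tune and the `parse_abc_headers` helper below are illustrative, not model output or part of the model's API):

```python
# ABC notation encodes a score as plain text: header fields
# (X: index, T: title, M: meter, L: unit note length, K: key)
# followed by the tune body. This is why a pure text tokenizer
# can handle both language and music.
ABC_TUNE = """X:1
T:Example Reel
M:4/4
L:1/8
K:D
|: d2fd Adfd | d2fd e2dc :|
"""

def parse_abc_headers(abc: str) -> dict:
    """Collect single-letter ABC header fields, e.g. 'K' -> 'D'."""
    headers = {}
    for line in abc.splitlines():
        if len(line) > 1 and line[1] == ":" and line[0].isalpha():
            headers[line[0]] = line[2:].strip()
    return headers

headers = parse_abc_headers(ABC_TUNE)
print(headers["K"], headers["M"])  # D 4/4
```

A generated ABC string like this can be rendered to a score or audio with external tools (e.g. abcjs or abc2midi); the model itself only ever produces and consumes the text.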

Training and Evaluation

The model was continually pre-trained on the MusicPile dataset, the first pretraining corpus for developing musical abilities in LLMs, and supervised fine-tuned on 1.1 million samples. Evaluation includes both music-specific benchmarks like MusicTheoryBench and general language benchmarks like MMLU.

Limitations

Currently, the model primarily supports strict-format, close-ended instructions for music tasks and may exhibit hallucinations. Its in-context learning and chain-of-thought abilities are weak, and a large portion of its training data is in the style of Irish music, which biases its generations toward that idiom.