ChatMusician-Base: Music-Integrated LLM
ChatMusician-Base is a 7-billion-parameter LLaMA2-based model developed by m-a-p, designed to integrate intrinsic musical abilities directly into a large language model. Unlike typical LLMs, it processes music using ABC notation as a text-compatible representation, enabling it to understand and generate music without any external multi-modal components.
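For illustration, here is what ABC notation looks like: a complete tune rendered as plain text, with a header (index, title, meter, key) followed by the melody in letter-and-number form. This particular tune is a made-up example, not drawn from the model or its training data:

```abc
X:1
T:Illustrative Tune
M:6/8
K:G
|: GAB d2B | c2A B2G | ABc d2c | B3 G3 :|
```

Because the entire score is ordinary ASCII text, a standard LLM tokenizer can consume and emit it with no audio frontend or symbolic-music encoder.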
Key Capabilities
- Music Generation: Composes well-structured, full-length music conditioned on text, chords, melodies, motifs, and musical forms.
- Music Understanding: Demonstrates strong performance on MusicTheoryBench, a benchmark of college-level music understanding, surpassing LLaMA2 and GPT-3.5 in zero-shot settings.
- Language Preservation: Acquiring musical abilities does not degrade general language capabilities; the model even shows a slightly higher MMLU score than its LLaMA2 baseline.
- Pure Text Tokenization: Utilizes a pure text tokenizer for both language and music, treating music as a second language.
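Because ABC notation is plain text, the "second language" framing can be made concrete with ordinary string tooling. The toy sketch below (the tune and the parsing are illustrative, not part of the model's pipeline) shows that the pitch content of a score is directly accessible as characters:

```python
import re

# A short melody in ABC notation -- plain text, so an LLM's ordinary
# text tokenizer can consume it with no special music frontend.
# (Illustrative tune, not from the model's training data.)
abc_tune = """X:1
T:Example Reel
M:4/4
K:D
|: D2FA d2fd | e2ge c2ec :|
"""

# Toy demonstration: pitch letters are ordinary characters, so even
# simple string tools can inspect the "music" directly. We take the
# tune body (everything after the key line) and pull out the pitches.
body = abc_tune.split("K:D")[1]
notes = re.findall(r"[A-Ga-g]", body)
print(len(notes))  # -> 12 pitch letters across the two bars
```

The same property is what lets a text-only LLM learn musical structure: bars, repeats, and pitches are just token sequences, no different in kind from words in a sentence.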
Training and Evaluation
The model was continually pre-trained on MusicPile, the first pretraining corpus designed to develop musical abilities in LLMs, and then supervised fine-tuned on 1.1 million samples. Evaluation covers both music-specific benchmarks such as MusicTheoryBench and general language benchmarks such as MMLU.
Limitations
Currently, the model primarily supports strictly formatted, closed-ended instructions for music tasks and may exhibit hallucinations. Its in-context learning and chain-of-thought abilities are weak, and a large portion of its training data is in the style of Irish music.