dx2102/llama-midi: Text-to-MIDI Music Generation
llama-midi, developed by dx2102, is a 1-billion-parameter language model fine-tuned from Llama-3.2-1B. Its primary function is to generate MIDI music scores from text prompts.
Key Capabilities
- Text-to-MIDI Generation: Converts textual input into a structured text representation of a MIDI score.
- Prompt-driven Composition: Can use score titles or other text as prompts to guide music generation.
- Integration with symusic: Provides Python code examples for converting the generated text representation into a .mid file and vice versa using the symusic library.
How it Works
The model outputs a sequence of `pitch duration wait velocity instrument` tokens. Each group of tokens defines one musical note: its pitch, how long it sounds (duration), the delay before the next note starts (wait), how hard it is struck (velocity), and the instrument playing it. The provided postprocess function converts this text format into a standard MIDI file, while the preprocess function converts existing MIDI files into the model's input format.
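To make the token format concrete, here is a minimal, hypothetical parser that turns the model's text output into note events with absolute start times. This is a sketch, not the repository's actual preprocess/postprocess code: it assumes one note per line with five space-separated integer fields (`pitch duration wait velocity instrument`), and a real conversion to a .mid file would hand these events to the symusic library.

```python
def parse_score(text):
    """Convert the model's note text into dicts with absolute start times.

    Assumed serialization (an illustration, not the official spec):
    one note per line, fields: pitch duration wait velocity instrument.
    """
    notes = []
    time = 0  # running clock, advanced by each note's `wait`
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # skip title/prompt lines or malformed output
        pitch, duration, wait, velocity, instrument = map(int, fields)
        notes.append({
            "start": time,        # absolute onset time
            "pitch": pitch,       # MIDI note number
            "duration": duration, # how long the note sounds
            "velocity": velocity, # how hard the note is struck
            "program": instrument # MIDI program (instrument) number
        })
        time += wait  # `wait` is the delay before the next note
    return notes

# A small hand-written example (a C-major arpeggio, made up for illustration):
sample = """60 480 240 100 0
64 480 240 100 0
67 480 480 100 0"""
events = parse_score(sample)
```

Because `wait` encodes the gap to the *next* note, accumulating it yields each note's absolute onset: the three notes above start at times 0, 240, and 480.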
Good For
- Developers and musicians looking to programmatically generate music scores.
- Experimenting with AI-driven music composition from textual descriptions.
- Integrating music generation capabilities into applications using a familiar language model interface.