Text2midi at AAAI
I’m thrilled to introduce text2midi, an end-to-end trained AI model designed to bridge the gap between textual descriptions and MIDI file generation! Our paper has been accepted to the Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) and will be presented in Philadelphia next month.
Our approach leverages the power of large language models (LLMs) to enable users to generate symbolic music intuitively through text prompts. Whether you’re describing chords, keys, tempo, or just the vibe you want, text2midi transforms your words into music.
Key highlights
- Large Multimodal Model: Combines a pretrained LLM encoder for text processing with an autoregressive transformer decoder for MIDI generation (see the sketch after this list). The final model has 272M parameters.
- Text-to-music precision: High-quality, controllable music outputs, validated through empirical and objective evaluation. >> Listen to examples here.
- Open Source: We encourage you to use text2midi as a base model and improve upon it!
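
To make the encoder-decoder idea from the first highlight concrete, here is a minimal sketch of a text-conditioned, autoregressive MIDI-token generator. This is not the released text2midi code: all class names, layer sizes, and the toy vocabularies are illustrative assumptions, and the real model uses a pretrained LLM text encoder rather than the small stand-in encoder below.

```python
# Illustrative sketch only, not the text2midi implementation.
# Shows the general pattern: encode a text caption, then autoregressively
# decode MIDI event tokens with a transformer decoder.
import torch
import torch.nn as nn

TEXT_VOCAB, MIDI_VOCAB, D_MODEL = 1000, 512, 256  # toy sizes, far from the 272M-parameter model


class ToyText2Midi(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the pretrained LLM text encoder used in the paper.
        self.text_emb = nn.Embedding(TEXT_VOCAB, D_MODEL)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
        # Autoregressive transformer decoder over MIDI event tokens.
        self.midi_emb = nn.Embedding(MIDI_VOCAB, D_MODEL)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(D_MODEL, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(D_MODEL, MIDI_VOCAB)

    @torch.no_grad()
    def generate(self, text_ids, max_len=64, bos_id=0):
        memory = self.encoder(self.text_emb(text_ids))  # encode the caption once
        out = torch.full((text_ids.size(0), 1), bos_id, dtype=torch.long)
        for _ in range(max_len):
            tgt = self.midi_emb(out)
            mask = nn.Transformer.generate_square_subsequent_mask(out.size(1))
            h = self.decoder(tgt, memory, tgt_mask=mask)
            next_tok = self.head(h[:, -1]).argmax(-1, keepdim=True)  # greedy decoding
            out = torch.cat([out, next_tok], dim=1)
        return out  # MIDI event-token ids, to be detokenized into a .mid file


# A caption such as "an upbeat pop track in C major at 120 BPM" would normally
# be tokenized into ids; random ids stand in for that step here.
model = ToyText2Midi()
tokens = model.generate(torch.randint(0, TEXT_VOCAB, (1, 12)))
print(tokens.shape)
```

In the released model the text encoder is a frozen pretrained LLM and the decoder is trained on paired caption/MIDI data, but the overall flow (encode text, then greedily or stochastically decode event tokens) follows this shape.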
Paper: https://www.arxiv.org/abs/2412.16526
Authors: Keshav Bhandari, Abhinaba Roy, Ph.D., Kyra Wang, Geeta Puri, Simon Colton, Dorien Herremans
This project was enabled by the MidiCaps dataset, which we released earlier this year and presented at the ISMIR conference. Pretrained on SymphonyNet and trained on MidiCaps, the new large text2midi model will, we hope, provide a stepping stone for much-needed advances in AI for MIDI.
We can’t wait to see how this model inspires musicians, creators, developers, and MIR researchers worldwide. Dive in, test it out, and let us know your feedback!