Can AI really compose band-quality music, complete with structure, harmony, and creative control?

That’s the question we set out to explore in our latest work, BandCondiNet, now accepted in Expert Systems with Applications!

Conditional music generation promises more user control, but current systems often struggle with three things:
- low-fidelity input conditions,
- weak structural coherence, and
- poor harmony across instruments.

BandCondiNet tackles these challenges head-on with a parallel Transformer-based architecture designed for multitrack music. It introduces:
- Multi-view features that capture richer musical context,
- Structure Enhanced Attention (SEA) for better musical form, and
- Cross-track Transformer (CTT) to model inter-track harmony.
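For readers who like code: here is a toy, stdlib-only sketch of the *idea* behind cross-track attention, where each instrument's representation attends to every other track so harmony is modeled jointly. This is my illustrative pseudocode under simplified assumptions (plain dot-product attention, no learned projections), not the paper's actual CTT implementation.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of attention scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_track_attention(tracks):
    """Blend each track's embedding with all other tracks' embeddings.

    `tracks` is a list of equal-length vectors, one per instrument.
    Each output vector is an attention-weighted mix of every track,
    so each instrument "sees" the others before generating.
    """
    fused = []
    for query in tracks:
        # Dot-product similarity between this track and every track.
        scores = [sum(q * k for q, k in zip(query, key)) for key in tracks]
        weights = softmax(scores)
        # Weighted sum of all track embeddings, dimension by dimension.
        fused.append([
            sum(w * track[d] for w, track in zip(weights, tracks))
            for d in range(len(query))
        ])
    return fused

# Three "instruments" with 4-dim embeddings (made-up numbers).
tracks = [[1.0, 0.0, 0.0, 0.5],
          [0.0, 1.0, 0.0, 0.5],
          [0.0, 0.0, 1.0, 0.5]]
fused = cross_track_attention(tracks)
```

Since every toy track shares the same value in the last dimension, any attention-weighted average preserves it, which is an easy sanity check on the blending.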

Across two datasets, BandCondiNet outperforms competing models on almost every metric, in both objective fidelity and listener preference tests.

If you’re curious to hear how it sounds, check out the links below:

- Read paper (or preprint: https://arxiv.org/abs/2407.10462)
- GitHub
- Audio examples

Big thanks to my co-authors (Jing Luo and Xinyu Yang from Xi'an Jiaotong University) for pushing this field forward!

#ESWA #Elsevier #ismir #musicAI #genAI #generative #AI #music