Publications

Conference Paper
Melechovsky J, Guo Z, Ghosal D, Majumder N, Herremans D, Poria S. 2024. Mustango: Toward Controllable Text-to-Music Generation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pages 8293–8316.
Tan H.H., Herremans D. 2020. Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling. ISMIR.
Guo R, Simpson I., Kiefer C., Magnusson T, Herremans D. 2022. MusIAC: An extensible generative framework for Music Infilling Application with multi-level Control. EvoMUSART.
Herremans D., Chuan C.-H. 2017. A multi-modal platform for semantic music analysis: visualizing audio- and score-based tension. 11th International Conference on Semantic Computing IEEE ICSC 2017.
T. Phuong HThi, Herremans D., Roig G. 2019. Multimodal Deep Models for Predicting Affective Responses Evoked by Movies. The 2nd International Workshop on Computer Vision for Physiological Measurement as part of ICCV. Seoul, South Korea. 2019.
Herremans D., Chew E. 2016. MorpheuS: Automatic music generation with recurrent pattern constraints and tension profiles. IEEE TENCON.
Chuan C.-H., Herremans D. 2018. Modeling temporal tonal relations in polyphonic music through deep networks with a novel image-based representation. The Thirty-Second AAAI Conference on Artificial Intelligence.
Chopra A., Roy A., Herremans D. 2024. MIRFLEX: Music Information Retrieval Feature Library for Extraction. ISMIR, Late Breaking Demos.
Melechovsky J., Roy A., Herremans D. 2024. MidiCaps: A large-scale MIDI dataset with text captions. ISMIR.
Guo R, Herremans D, Magnusson T. 2019. Midi Miner – A Python library for tonal tension and track classification. ISMIR - Late Breaking Demo.
Song M., Pala T.D, Jin W., Zadeh A., Li C., Herremans D., Poria S. 2026. Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions. Proceedings of ICLR.
Herremans D., Weisser S., Sörensen K., Conklin D. 2014. Markov Based Quality Metrics For Generating Structured Music With Optimization Techniques. Digital Music Research Network (DMRN+9).
Kaliakatsos-Papakostas N., Bastas G., Makris D., Herremans D., Katsouros V., Maragos P. 2022. A Machine Learning Approach for MIDI to Guitar Tablature Conversion. Sound and Music Computing Conference (SMC).
Liu R., Roy A., Herremans D. 2025. Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction.
Luo Y.J., Agres K., Herremans D. 2019. Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders. ISMIR.
Melechovsky J., Mehrish A., Herremans D., Sisman B. 2023. Learning accent representation with multi-level VAE towards controllable speech synthesis. IEEE Spoken Language Technology (SLT) Workshop.
Cheuk K.W., BT B, Roig G., Herremans D. 2019. Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019).
Ghosh A., Roy A., Herremans D. 2026. KARMA-MV: A Benchmark for Causal Question Answering on Music Videos. arXiv:2605.08175.
Cheuk K.W., Choi K., Kong Q., Li B., Won M., Hung A., Wang J.-C., Herremans D. 2022. Jointist: Joint Learning for Multi-instrument Transcription and Its Applications.
Roy A., Liu R., Lu T., Herremans D. 2025. JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata. Proceedings of IJCNN, Rome, Italy.
Bhandari K., Chang S., Lu T., Enus F.R, Bradshaw L.B, Herremans D., Colton S. 2025. ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement. Proceedings of IJCNN.
Cheuk K.W., Agres K., Herremans D. 2020. The impact of Audio input representations on neural network based music transcription. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
Lee-Leon A., Yuen C., Herremans D. 2019. A Hybrid Fuzzy Logic-Neural Network Approach For Multi-path Separation Of Underwater Acoustic Signals. 89th IEEE Vehicular Technology Conference.
Guo Z, Makris D., Herremans D. 2021. Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
Tripathi A., A. Singh K, Surya R., Gupta A., Veikho S.L., Herremans D., Bisane S. 2025. HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies. arXiv:2508.14946.
Turian J, Shier J, Khan HRaj, Raj B, Schuller BW, Steinmetz CJ, Malloy C, Tzanetakis G, Velarde G, McNally K et al. 2022. HEAR 2021: Holistic Evaluation of Audio Representations. Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track.
A. Putri M, Saide S., D. Riau K, Herremans D. 2026. Generative AI in Education for SDG 4: Insights from Indonesia and Kazakhstan. Proceedings of the Pacific Asia Conference on Information Systems (PACIS).
Makris D., Agres K., Herremans D. 2021. Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
BT B, Hee H.I., Ming C., Lin Y., Priyadarshinee P., Clarke C.J., Herremans D., Chen J.M. 2022. A Gaussian mixture classifier model to differentiate respiratory symptoms using phonated /ɑː/ sounds. The 18th Australasian International Conference on Speech Science and Technology (SST).
Herremans D., Sörensen K. 2013. FuX, an Android app that generates counterpoint. IEEE Symposium on Computational Intelligence for Creativity and Affective Computing (CICAC). pages 48–55.
Tripathi A., Patle V., Jain A., Pundir A., Menon S., A. Singh K, Herremans D. 2025. End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation. Proceedings of IJCNN, Rome, Italy.
Bhandari K., Roy A., Colton S., Herremans D. 2026. Emerging AI Technologies for Music: Towards Controllable, Collaborative, and Creative Systems. Proceedings of Machine Learning Research, PMLR 303:1-5, 2026.
Cheuk K.W., Luo Y.J., Benetos E., Herremans D. 2021. The Effect of Spectrogram Reconstructions on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy. Proceedings of the International Conference on Pattern Recognition (ICPR2020).
Agres K, Bigo L, Herremans D, Conklin D. 2015. The effect of repetitive structure on enjoyment and altered states in uplifting trance music. 2nd International Conference on Music and Consciousness (MUSCON 2), Brighton.
Lee-Leon A., Yuen C., Herremans D. 2019. Doppler Invariant Demodulation for Shallow Water Acoustic Communications Using Deep Belief Networks. 16th IEEE Asia Pacific Wireless Communications Symposium (APWCS).
Guo Z, Kang J., Herremans D. 2023. A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling. Proceedings of the 37th AAAI Conference on Artificial Intelligence.
Wang K., Herremans D. 2024. DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage. Proc. of IEEE Tencon, Singapore.
Puri G., Socklingam N., Herremans D. 2026. Digital Lifelong Learning in the Age of AI: Trends and Insights.
Cheuk K.W., Sawata R, Uesaka T, Murata N, Takahashi N, Takahashi S, Herremans D., Mitsufuji Y. 2023. DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. ICASSP.
Nahar F., Agres K., BT B, Herremans D. 2020. A dataset and classification model for Malay, Hindi, Tamil and Chinese music. 13th Workshop on music and machine learning (MML) as part of ECML/PKDD.
Melechovsky J., Mehrish A., Sisman B., Herremans D. 2024. DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech. Audio Imagination: NeurIPS 2024 Workshop.
Makris D., Guo Z, Kaliakatsos-Papakostas N., Herremans D. 2022. Conditional Drums Generation using Compound Word Representations. EvoMUSART (EVO*) - Lecture Notes in Computer Science.
Clarke C.J., Chowdhury J., BT B, Priyadarshinee P., Lim C.M.Ying, I. Tan FXing, Herremans D., Chen J.M. 2022. Computationally Efficient Physics Approximating Neural Networks for Highly Nonlinear Maps. 2022 International Conference on Research in Adaptive and Convergent Systems.
Lanzendörfer L.A., Lu T., Perraudin N., Herremans D., Wattenhofer R. 2024. Coarse-to-Fine Text-to-Music Latent Diffusion. Audio Imagination: NeurIPS 2024 Workshop.
Lanzendörfer L.A., Lu T., Perraudin N., Herremans D., Wattenhofer R. 2025. Coarse-to-Fine Text-to-Music Latent Diffusion. Proceedings of ICASSP.
T. Phuong HThi, BT B, Herremans D., Roig G. 2021. AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies. Proceedings of the International Conference on Pattern Recognition (ICPR2020).
Husain J.A., Herremans D. 2026. APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music. arXiv:2605.03395.
Herremans D., Roy A. 2026. Aligning Generative Music AI with Human Preferences: Methods and Challenges. Proceedings of AAAI, senior member track.
BT B, Aslim E.J, Ng YShu Lynn, Kuo TLi Chuen, Chen JShihang, Herremans D., Ng LGuat, Chen J.M. 2020. Acoustic prediction of flowrate: varying liquid jet stream onto a free surface. IEEE International Conference on Signal Processing and Communications (SPCOM).
Melechovsky J., Mehrish A., Sisman B., Herremans D. 2024. Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder. Proc. of IEEE Tencon, Singapore.
Melechovsky J., Mehrish A., Sisman B., Herremans D. 2024. Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training. Proc. of IEEE Tencon, Singapore.