Publications

2024
Melechovsky J., Mehrish A., Sisman B., Herremans D.  2024.  Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training. Proc. of IEEE Tencon, Singapore.
Melechovsky J., Mehrish A., Sisman B., Herremans D.  2024.  Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder. Proc. of IEEE Tencon, Singapore.
Lanzendörfer L.A., Lu T., Perraudin N., Herremans D., Wattenhofer R.  2024.  Coarse-to-Fine Text-to-Music Latent Diffusion. Audio Imagination: NeurIPS 2024 Workshop.
Melechovsky J., Mehrish A., Sisman B., Herremans D.  2024.  DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech. Audio Imagination: NeurIPS 2024 Workshop.
Ong J., Herremans D.  2024.  DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts. arXiv:2406.08742.
Wang K., Herremans D.  2024.  DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage. Proc. of IEEE Tencon, Singapore.
Chow D., Herremans D.  2024.  Gamification and skills tree. Trends and Foresight Report on Cyber-Physical Learning.
Melechovsky J., Roy A., Herremans D.  2024.  MidiCaps: A large-scale MIDI dataset with text captions. ISMIR.
Chopra A., Roy A., Herremans D.  2024.  MIRFLEX: Music Information Retrieval Feature Library for Extraction. ISMIR, Late Breaking Demos.
Ong J.  2024.  Modern Portfolio Construction with Advanced Deep Learning Models. SUTD. PhD.
Melechovsky J, Guo Z, Ghosal D, Majumder N, Herremans D, Poria S.  2024.  Mustango: Toward Controllable Text-to-Music Generation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pages 8293–8316.
Lam P., Zhang H., Chen N.F, Sisman B., Herremans D.  2024.  SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech. Proc. of IEEE Tencon, Singapore.
Kang J, Poria S, Herremans D.  2024.  Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. Expert Systems with Applications.
2025
Melechovsky J.  2025.  Analysis and Synthesis of Audio with AI: from Neurological Disease to Accented Speech and Music.
Kang J., Herremans D.  2025.  Are we there yet? A brief survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges. IEEE Transactions on Affective Computing.
Luo J., Yang X., Herremans D.  2025.  BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features. Expert Systems with Applications. 130059.
Lanzendörfer L.A., Lu T., Perraudin N., Herremans D., Wattenhofer R.  2025.  Coarse-to-Fine Text-to-Music Latent Diffusion. Proceedings of ICASSP.
Tripathi A., Patle V., Jain A., Pundir A., Menon S., A. Singh K, Herremans D.  2025.  End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation. Proceedings of IJCNN, Rome, Italy.
Guo R, Herremans D.  2025.  An exploration of controllability in symbolic music infilling. IEEE Access.
Herremans D., Low K.W.  2025.  Forecasting Bitcoin Volatility Spikes from Whale Transactions and Cryptoquant Data Using Synthesizer Transformer Models. IEEE Access. 13:117788-117807.
Bhandari K., Chang S., Lu T., Enus F.R, Bradshaw L.B, Herremans D., Colton S.  2025.  ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement. Proceedings of IJCNN, Rome, Italy.
Roy A., Liu R., Lu T., Herremans D.  2025.  JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata. Proceedings of IJCNN, Rome, Italy.
Liu R., Roy A., Herremans D.  2025.  Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction.
Song M., Pala T.D, Jin W., Zadeh A., Li C., Herremans D., Poria S.  2025.  LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions. arXiv:2508.18321.
Lu T., Geist C-M, Melechovsky J., Roy A., Herremans D.  2025.  MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection. arXiv:2505.20979.
Le D-V-T, Bigo L., Keller M., Herremans D.  2025.  Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey. ACM Computing Surveys.
Lam P., Zhang H., Chen N.F, Sisman B., Herremans D.  2025.  PRESENT: Zero-Shot Text-to-Prosody Control. IEEE Signal Processing Letters.
Wei M., Modrzejewski M., Sivaraman A., Herremans D.  2025.  Prevailing Research Areas for Music AI in the Era of Foundation Models.
Herremans D.  2025.  Royalties in the age of AI: paying artists for AI-generated songs. WIPO Magazine.
Melechovsky J., Mehrish A., Roy A., Herremans D.  2025.  SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering. arXiv:2508.03448.
Chopra A., Roy A., Herremans D.  2025.  SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning. Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10th - 12th, 2025.
Bhandari K., Roy A., Wang K., Puri G., Colton S., Herremans D.  2025.  Text2midi: Generating Symbolic Music from Captions. Proceedings of AAAI, Philadelphia.
Roy A., Puri G., Herremans D.  2025.  Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment. arXiv:2505.12669.
Sockalingam N., Lo K., Teo J., Wei C.C., Chow D., Herremans D., Jun M.L.M., Kurniawan O., Wang Y., Leong P.K.  2025.  Towards the future of education: cyber-physical learning. Discover Education. 4:1–16.
Kang J., Herremans D.  2025.  Towards Unified Music Emotion Recognition across Dimensional and Categorical Models.
