Publications
2026. Aligning Generative Music AI with Human Preferences: Methods and Challenges. Proceedings of AAAI, senior member track.
2026. APEX: Large-scale Multi-task Aesthetic-Informed Popularity Prediction for AI-Generated Music. arXiv:2605.03395.
2026. Development of Interpretable Deep Learning-based Segmentation Algorithm for Automated Assessment of Oral Diadochokinesis in Progressive Neurological Diseases. Journal of Speech, Language, and Hearing Research.
2026. Emerging AI Technologies for Music: Towards Controllable, Collaborative, and Creative Systems. Proceedings of Machine Learning Research, PMLR 303:1-5.
2026. Generative AI in Education for SDG 4: Insights from Indonesia and Kazakhstan. Proceedings of the Pacific Asia Conference on Information Systems (PACIS).
2026. KARMA-MV: A Benchmark for Causal Question Answering on Music Videos. arXiv:2605.08175.
2026. Measuring and Mitigating Rapport Bias of Large Language Models under Multi-Agent Social Interactions. Proceedings of ICLR.
2026. Scaffolded Vulnerability: Chatbot-Mediated Reciprocal Self-Disclosure and Need-Supportive Interaction in Couples. Proceedings of CHI.
2026. SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering. Proceedings of ICML.
2026. Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment. ICASSP.
2026. Text2Score: Generating Sheet Music From Textual Prompts. arXiv:2605.13431.
2025. Analysis and Synthesis of Audio with AI: from Neurological Disease to Accented Speech and Music.
2025. Are we there yet? A brief survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges. IEEE Transactions on Affective Computing.
2025. BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features. Expert Systems with Applications. 130059.
2025. Coarse-to-Fine Text-to-Music Latent Diffusion. Proceedings of ICASSP.
2025. Demystifying deep search: a holistic evaluation with hint-free multi-hop questions and factorised metrics. arXiv:2510.05137.
2025. End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation. Proceedings of IJCNN, Rome, Italy.
2025. An exploration of controllability in symbolic music infilling. IEEE Access.
2025. Forecasting Bitcoin Volatility Spikes from Whale Transactions and Cryptoquant Data Using Synthesizer Transformer Models. IEEE Access. 13:117788-117807.
2025. HHNAS-AM: Hierarchical Hybrid Neural Architecture Search using Adaptive Mutation Policies. arXiv:2508.14946.
2025. ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement. Proceedings of IJCNN.
2025. JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata. Proceedings of IJCNN, Rome, Italy.
2025. Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction.
2025. MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection. arXiv:2505.20979.
2025. Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey. ACM Computing Surveys.
2025. PRESENT: Zero-Shot Text-to-Prosody Control. IEEE Signal Processing Letters.
2025. Royalties in the age of AI: paying artists for AI-generated songs. WIPO Magazine.
2025. Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction.
2025. SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning. Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10th - 12th, 2025.
2025. Text2midi: Generating Symbolic Music from Captions. Proceedings of AAAI, Philadelphia.
2025. Towards the future of education: cyber-physical learning. Discover Education. 4:1–16.
2024. Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training. Proc. of IEEE Tencon, Singapore.
2024. Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder. Proc. of IEEE Tencon, Singapore.
2024. Coarse-to-Fine Text-to-Music Latent Diffusion. Audio Imagination: NeurIPS 2024 Workshop.
2024. DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech. Audio Imagination: NeurIPS 2024 Workshop.
2024. DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts. arXiv:2406.08742.
2024. DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage. Proc. of IEEE Tencon, Singapore.
2024. Gamification and skills tree. Trends and Foresight Report on Cyber-Physical Learning.
2024. MidiCaps — A large-scale MIDI dataset with text captions. ISMIR.
2024. MIRFLEX: Music Information Retrieval Feature Library for Extraction. ISMIR, Late Breaking Demos.
2024. Modern Portfolio Construction with Advanced Deep Learning Models. PhD thesis, SUTD.
2024. Mustango: Toward Controllable Text-to-Music Generation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pages 8293–8316.
2024. SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech. Proc. of IEEE Tencon, Singapore.
2024. Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. Expert Systems with Applications.
2023. Constructing Time-Series Momentum Portfolios with Deep Multi-Task Learning. Expert Systems with Applications. 230:120587.
2023. DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. ICASSP.
2023. A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling. Proceedings of the 37th AAAI Conference on Artificial Intelligence.
2023. Learning accent representation with multi-level VAE towards controllable speech synthesis. IEEE Spoken Language Technology (SLT) Workshop.
2023. MERP: A Music Dataset with Emotion Ratings and Raters’ Profile Information. Sensors - Intelligent Sensors. 23(1).
2023. A Multimodal Model with Twitter Finbert Embeddings for Extreme Price Movement Prediction of Bitcoin. Expert Systems with Applications.
2022. Computationally Efficient Physics Approximating Neural Networks for Highly Nonlinear Maps. 2022 International Conference on Research in Adaptive and Convergent Systems.
2022. Conditional Drums Generation using Compound Word Representations. EvoMUSART (EVO*) - Lecture Notes in Computer Science.
2022. Downscaling using Deep Convolutional Autoencoders, a case study for South East Asia. EGUsphere preprint.
2022. EmoMV: Affective Music-Video Correspondence Learning Datasets for Classification and Retrieval. Information Fusion.
2022. A Gaussian mixture classifier model to differentiate respiratory symptoms using phonated /ɑː/ sounds. The 18th Australasian International Conference on Speech Science and Technology (SST).
2022. HEAR 2021: Holistic Evaluation of Audio Representations. Proceedings of Machine Learning Research (PMLR): NeurIPS 2021 Competition Track.
2022. Jointist: Joint Learning for Multi-instrument Transcription and Its Applications.
2022. A Machine Learning Approach for MIDI to Guitar Tablature Conversion. Sound and Music Computing Conference (SMC).
2022. MusIAC: An extensible generative framework for Music Infilling Application with multi-level Control. EvoMUSART.
2022. Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses. arXiv preprint.
2022. Single Image Video Prediction with Auto-Regressive GANs. Sensors. 22:3533.
2022. Understanding Audio Features via Trainable Basis Functions. arXiv preprint.
2022. A white paper on cyberphysical learning. White paper, Singapore University of Technology and Design.
2021. aiSTROM - A roadmap for developing a successful AI strategy. IEEE Access.
2021. AttendAffectNet – Emotion Prediction of Movie Viewers Using Multimodal Fusion with Self-attention. Sensors. Special issue on Intelligent Sensors: Sensor Based Multi-Modal Emotion Recognition.
2021. AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies. Proceedings of the International Conference on Pattern Recognition (ICPR2020).
2021. Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds. Sensors. 21(16):5555.
2021. The Effect of Spectrogram Reconstructions on Automatic Music Transcription: An Alternative Approach to Improve Transcription Accuracy. Proceedings of the International Conference on Pattern Recognition (ICPR2020).
2021. Evaluating the Effectiveness of an Augmented Reality Game Promoting Environmental Action. Sustainability. 13(24):13912.
2021. Generating Lead Sheets with Affect: A Novel Conditional seq2seq Framework. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
2021. Hierarchical Recurrent Neural Networks for Conditional Melody Generation with Long-term Structure. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
2021. Music, Computing, and Health: A roadmap for the current and future roles of music technology for healthcare and well-being. Music & Science.
2021. Musical stylometry: Characterisation of music. Multivariate Humanities.
2021. ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data. ACM Multimedia.
2021. Revisiting the Onsets and Frames Model with Additive Attention. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
2021. Underwater Acoustic Communication Receiver Using Deep Belief Network. IEEE Transactions on Communications.
2020. Acoustic prediction of flowrate: varying liquid jet stream onto a free surface. IEEE International Conference on Signal Processing and Communications (SPCOM).
2020. Asthmatic versus healthy child classification based on cough and vocalised /a:/ sounds. The Journal of the Acoustical Society of America (JASA). 148:EL253.
2020. Data-driven 3D Scene Understanding. PhD thesis.
2020. A dataset and classification model for Malay, Hindi, Tamil and Chinese music. 13th Workshop on music and machine learning (MML) as part of ECML/PKDD.
2020. Generative Modelling for Controllable Audio Synthesis of Expressive Piano Performance. Workshop on Machine Learning for Music Discovery (ML4MD) as part of ICML.
2020. The impact of Audio input representations on neural network based music transcription. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
2020. Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling. ISMIR.
2020. nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks. IEEE Access.
2020. PerceptionGAN: Real-world image construction from provided text through perceptual understanding. 4th Int. Conf. on Imaging, Vision and Pattern Recognition (IVPR), and 9th Int. Conf. on Informatics, Electronics & Vision (ICIEV).
2020. Regression-based music emotion prediction using triplet neural networks. Proceedings of the International Joint Conference on Neural Networks (IJCNN).
2020. Singing voice conversion with disentangled representations of singer and vocal technique using variational autoencoders. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).
2020. Unsupervised disentanglement of pitch and timbre for isolated musical instrument sounds. Proceedings of the International Society for Music Information Retrieval Conference (ISMIR).
2020. A variational autoencoder for music generation controlled by tonal tension. Joint Conference on AI Music Creativity (CSMC + MuMe).
2019. Development of Machine Learning for asthmatic and healthy voluntary cough - a proof of concept study. Applied Sciences. 9(14).
2019. Doppler Invariant Demodulation for Shallow Water Acoustic Communications Using Deep Belief Networks. 16th IEEE Asia Pacific Wireless Communications Symposium (APWCS).
2019. The emergence of deep learning: new opportunities for music and audio technologies. Neural Computing and Applications.
2019. A Hybrid Fuzzy Logic-Neural Network Approach For Multi-path Separation Of Underwater Acoustic Signals. 89th IEEE Vehicular Technology Conference.
2019. The impact of musical structure on enjoyment and absorptive listening states in trance music. Music and Consciousness 2 - Worlds, Practices, Modalities.
2019. Latent space representation for multi-target speaker detection and identification with a sparse dataset using Triplet neural networks. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019).
2019. Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders. ISMIR.
2019. Machine Learning Research that Matters for Music Creation: A Case Study. Journal of New Music Research. 48(1):36-55.