Publications
2022. Predicting emotion from music videos: exploring the relative contribution of visual and auditory information to affective responses. arXiv preprint.
2022. Single Image Video Prediction with Auto-Regressive GANs. Sensors. 22:3533.
2022. Understanding Audio Features via Trainable Basis Functions. arXiv preprint.
2022. A white paper on cyberphysical learning. White paper, Singapore University of Technology and Design.
2023. Constructing Time-Series Momentum Portfolios with Deep Multi-Task Learning. Expert Systems with Applications. 230:120587.
2023. DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability. ICASSP.
2023. A Domain-Knowledge-Inspired Music Embedding Space and a Novel Attention Mechanism for Symbolic Music Modeling. Proceedings of the 37th AAAI Conference on Artificial Intelligence.
2023. Learning accent representation with multi-level VAE towards controllable speech synthesis. IEEE Spoken Language Technology (SLT) Workshop.
2023. MERP: A Music Dataset with Emotion Ratings and Raters’ Profile Information. Sensors - Intelligent Sensors. 23(1).
2023. A Multimodal Model with Twitter Finbert Embeddings for Extreme Price Movement Prediction of Bitcoin. Expert Systems with Applications.
2024. Accent Conversion in Text-To-Speech Using Multi-Level VAE and Adversarial Training. Proc. of IEEE TENCON, Singapore.
2024. Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder. Proc. of IEEE TENCON, Singapore.
2024. Coarse-to-Fine Text-to-Music Latent Diffusion. Audio Imagination: NeurIPS 2024 Workshop.
2024. DART: Disentanglement of Accent and Speaker Representation in Multispeaker Text-to-Speech. Audio Imagination: NeurIPS 2024 Workshop.
2024. DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts. arXiv:2406.08742.
2024. DisfluencySpeech -- Single-Speaker Conversational Speech Dataset with Paralanguage. Proc. of IEEE TENCON, Singapore.
2024. Gamification and skills tree. Trends and Foresight Report on Cyber-Physical Learning.
2024. MidiCaps: A large-scale MIDI dataset with text captions. ISMIR.
2024. MIRFLEX: Music Information Retrieval Feature Library for Extraction. ISMIR, Late Breaking Demos.
2024. Modern Portfolio Construction with Advanced Deep Learning Models. PhD thesis, SUTD.
2024. Mustango: Toward Controllable Text-to-Music Generation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). pages 8293–8316.
2024. SNIPER Training: Variable Sparsity Rate Training For Text-To-Speech. Proc. of IEEE TENCON, Singapore.
2024. Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. Expert Systems with Applications.
2025. Analysis and Synthesis of Audio with AI: from Neurological Disease to Accented Speech and Music. Thesis.
2025. Are we there yet? A brief survey of Music Emotion Prediction Datasets, Models and Outstanding Challenges. IEEE Transactions on Affective Computing.
2025. BandCondiNet: Parallel Transformers-based Conditional Popular Music Generation with Multi-View Features. Expert Systems with Applications. 130059.
2025. Coarse-to-Fine Text-to-Music Latent Diffusion. Proceedings of ICASSP.
2025. End-to-End Text-to-SQL with Dataset Selection: Leveraging LLMs for Adaptive Query Generation. Proceedings of IJCNN, Rome, Italy.
2025. An exploration of controllability in symbolic music infilling. IEEE Access.
2025. Forecasting Bitcoin Volatility Spikes from Whale Transactions and Cryptoquant Data Using Synthesizer Transformer Models. IEEE Access. 13:117788-117807.
2025. ImprovNet: Generating Controllable Musical Improvisations with Iterative Corruption Refinement. Proceedings of IJCNN, Rome, Italy.
2025. JamendoMaxCaps: A Large Scale Music-caption Dataset with Imputed Metadata. Proceedings of IJCNN, Rome, Italy.
2025. Leveraging LLM Embeddings for Cross Dataset Label Alignment and Zero Shot Music Emotion Prediction.
2025. LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions. arXiv:2508.18321.
2025. MelodySim: Measuring Melody-aware Music Similarity for Plagiarism Detection. arXiv:2505.20979.
2025. Natural Language Processing Methods for Symbolic Music Generation and Information Retrieval: a Survey. ACM Computing Surveys.
2025. PRESENT: Zero-Shot Text-to-Prosody Control. IEEE Signal Processing Letters.
2025. Royalties in the age of AI: paying artists for AI-generated songs. WIPO Magazine.
2025. SonicMaster: Towards Controllable All-in-One Music Restoration and Mastering. arXiv:2508.03448.
2025. SonicVerse: Multi-Task Learning for Music Feature-Informed Captioning. Proceedings of the 6th Conference on AI Music Creativity (AIMC 2025), Brussels, Belgium, September 10–12, 2025.
2025. Text2midi: Generating Symbolic Music from Captions. Proceedings of AAAI, Philadelphia.
2025. Text2midi-InferAlign: Improving Symbolic Music Generation with Inference-Time Alignment. arXiv:2505.12669.
2025. Towards the future of education: cyber-physical learning. Discover Education. 4:1–16.
2026. Aligning Generative Music AI with Human Preferences: Methods and Challenges. Proceedings of AAAI, senior member track.