VAE for music generation with tension control

Congrats to Rui Guo, an intern at AMAAI Lab, SUTD, who published a paper on 'A variational autoencoder for music generation controlled by tonal tension', which will be presented next week at The 2020 Joint Conference on AI Music Creativity.

nnAudio, our on-the-fly GPU spectrogram extraction toolbox published in IEEE Access

Congratulations to Raven for publishing 'nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks' in IEEE Access. nnAudio allows you to calculate spectrograms (linear, log, Mel, CQT) on the fly as a layer in PyTorch, which makes the spectrograms finetunable to your task! nnAudio is easy to install with pip, see instructions at
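The core idea — spectrogram extraction as a trainable network layer — can be sketched in plain PyTorch. This is a minimal illustration of the concept, not nnAudio's actual API (see the nnAudio documentation for the real interface); the layer and parameter names here are made up for the example:

```python
import torch
import torch.nn as nn

class TrainableMelLayer(nn.Module):
    """Spectrogram extraction as a network layer, so the filterbank
    can be finetuned by backpropagation (the idea behind nnAudio)."""
    def __init__(self, n_fft=1024, hop=256, n_mels=64):
        super().__init__()
        self.n_fft, self.hop = n_fft, hop
        self.register_buffer("window", torch.hann_window(n_fft))
        # Random init stands in for a real Mel filterbank; making it an
        # nn.Parameter lets it be trained end-to-end with the model.
        self.filterbank = nn.Parameter(torch.rand(n_mels, n_fft // 2 + 1))

    def forward(self, x):                        # x: (batch, samples)
        spec = torch.stft(x, self.n_fft, self.hop,
                          window=self.window, return_complex=True)
        power = spec.abs() ** 2                  # (batch, freq_bins, frames)
        return self.filterbank @ power           # (batch, n_mels, frames)

layer = TrainableMelLayer()
mel = layer(torch.randn(2, 22050))  # two 1-second clips at 22.05 kHz
```

Because the filterbank is an ordinary parameter, gradients from any downstream task flow into it — that is what "finetunable to your task" means here.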

PyTorch GPU based audio processing toolkit: nnAudio

Looking for a tool to extract spectrograms on the fly, integrated as a layer in PyTorch? Look no further than nnAudio, a toolbox developed by PhD student Raven (Cheuk Kin Wai):

nnAudio is available via pip (pip install nnaudio); full documentation is available on the GitHub page. Also check out our dedicated paper:

Best student paper for multimodal emotion prediction paper

PhD student Thao Phuong's paper on "Multimodal Deep Models for Predicting Affective Responses Evoked by Movies" was awarded best student paper at the 2nd International Workshop on Computer Vision for Physiological Measurement, part of ICCV in Seoul, South Korea. The paper explores how models based on video and audio can predict the emotions evoked by movies:

Talk on deep belief networks for doppler invariant demodulation - IEEE APWCS

PhD student Abigail Leon from the AMAAI lab presented a paper at the 16th IEEE Asia Pacific Wireless Communications Symposium (APWCS) on "Doppler Invariant Demodulation for Shallow Water Acoustic Communications Using Deep Belief Networks".

New paper on multimodal emotion prediction models from video and audio

Just published a new article with my PhD student Thao Ha Thi Phuong and Prof. Gemma Roig on 'Multimodal Deep Models for Predicting Affective Responses Evoked by Movies'. The paper will appear in the proceedings of the 2nd International Workshop on Computer Vision for Physiological Measurement, part of ICCV, and will be presented by Thao in Seoul, South Korea. Anybody interested can download the preprint article here (link coming soon!). The source code of our model is available on GitHub.

Editorial for Springer's Deep Learning for Music and Audio special issue

Prof. Ching-Hua Chuan and I recently edited a Special Issue for Springer's Neural Computing and Applications (IF: 4.213). The idea for the issue came out of the 1st International Workshop on Deep Learning for Music that we organized in Anchorage, US, as part of IJCNN in 2017. We received a nice collection of very interesting articles from scholars all over the world. The issue is set to come out soon (stay tuned).

New paper on Singing Voice Estimation in Neural Computing and Applications (Springer)

Together with Edward Lin, Enyan Koh, Dr. Balamurali BT from SUTD, and Dr. Simon Lui from Tencent Music (formerly SUTD), we published a paper on using an ideal binary mask with a CNN to separate the singing voice from its musical accompaniment:

Lin K.W.E., BT B., Koh E., Lui S., Herremans D. (in press). Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy. Neural Computing and Applications. DOI: 10.1007/s00521-018-3933-z.
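For readers unfamiliar with the training target: an ideal binary mask keeps each time-frequency bin in which the target source dominates the mixture. A toy sketch with random magnitudes (illustrating the mask itself, not the paper's CNN pipeline, which learns to predict this mask from the mixture alone):

```python
import torch

def ideal_binary_mask(vocal_mag, accomp_mag):
    """1 where the vocal magnitude dominates the bin, else 0."""
    return (vocal_mag > accomp_mag).float()

# Toy magnitude spectrograms: (freq_bins, frames)
vocal = torch.rand(513, 100)
accomp = torch.rand(513, 100)
mask = ideal_binary_mask(vocal, accomp)
vocal_estimate = mask * (vocal + accomp)  # apply the mask to the mixture
```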

New publication on modeling music with word2vec in Springer's Neural Computing and Applications

Together with Prof. Ching-Hua Chuan from the University of Miami and Prof. Kat Agres from IHPC, A*STAR, I've just published a new article, 'From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec', in Springer's Neural Computing and Applications (impact factor 4.213). The article describes how the popular word2vec embedding model can be used to model complex polyphonic pieces of music. The preprint is available here.

New Frontiers in Psychology paper on A Novel Graphical Interface for the Analysis of Music Practice Behaviors

The paper I wrote together with Janis Sokolovskis and Elaine Chew from QMUL, 'A Novel Interface for the Graphical Analysis of Music Practice Behaviours', was just published in Frontiers in Psychology - Human-Media Interaction. Read the full article here or download the PDF.