Congratulations to Raven on publishing 'nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolutional Neural Networks' in IEEE Access. nnAudio lets you calculate spectrograms (linear, log, Mel, CQT) on the fly as a layer in PyTorch, which makes the spectrograms fine-tunable to your task! nnAudio is easy to install with pip; see the instructions at https://github.com/KinWaiCheuk/nnAudio
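For readers curious about the idea in the paper's title, here is a minimal pure-Python sketch of a spectrogram computed as a strided 1D convolution with fixed Fourier kernels. It is an illustration only and not nnAudio's API: the function names (fourier_kernels, conv_spectrogram), the rectangular window, and the toy signal are all assumptions made for this example. In a PyTorch layer, kernels like these would be stored as convolution weights, which is what would allow them to be fine-tuned.

```python
import math


def fourier_kernels(n_fft):
    """Build cosine (real) and negative-sine (imaginary) DFT kernels.

    Hypothetical helper for this sketch: one kernel pair per frequency
    bin 0..n_fft//2, each of length n_fft, with no window function applied.
    """
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def conv_spectrogram(signal, n_fft, hop):
    """Magnitude spectrogram via a strided 1D convolution.

    Each output frame is the dot product of one signal slice with every
    kernel, which is what a 1D convolution layer with stride `hop` computes.
    Returns a list indexed as [time][frequency].
    """
    cos_k, sin_k = fourier_kernels(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        window = signal[start:start + n_fft]
        frame = []
        for c, s in zip(cos_k, sin_k):
            re = sum(w * x for w, x in zip(c, window))
            im = sum(w * x for w, x in zip(s, window))
            frame.append(math.hypot(re, im))
        frames.append(frame)
    return frames


# Toy input (an assumption for this sketch): a sine wave that completes
# exactly 4 cycles per n_fft samples, so its energy falls in bin 4.
n_fft, hop = 32, 16
signal = [math.sin(2 * math.pi * 4 * n / n_fft) for n in range(128)]
spec = conv_spectrogram(signal, n_fft, hop)

# Index of the strongest frequency bin in each time frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), len(spec[0]), peak_bins)
```

Because the kernels are plain arrays of numbers, replacing them with learnable weights only changes how they are stored, not how the spectrogram is computed.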
Looking for a tool to extract spectrograms on the fly, integrated as a layer in PyTorch? Look no further than nnAudio, a toolbox developed by PhD student Raven (Cheuk Kin Wai): https://github.com/KinWaiCheuk/nnAudio
nnAudio is available via pip (pip install nnaudio), and full documentation is available on the GitHub page. Also check out our dedicated paper:
PhD student Thao Phuong's paper on "Multimodal Deep Models for Predicting Affective Responses Evoked by Movies" was awarded best student paper at the 2nd International Workshop on Computer Vision for Physiological Measurement, held as part of ICCV in Seoul, South Korea. The paper explores how models based on video and audio can predict the emotions evoked by movies:
Just published a new article with my PhD student Thao Ha Thi Phuong and Prof. Gemma Roig on 'Multimodal Deep Models for Predicting Affective Responses Evoked by Movies'. The paper will be published in the proceedings of the 2nd International Workshop on Computer Vision for Physiological Measurement, held as part of ICCV, and will be presented by Thao in Seoul, South Korea. Anybody interested can download the preprint article here (link coming soon!). The source code of our model is available on GitHub.
Prof. Ching-Hua Chuan and I recently edited a Special Issue for Springer's Neural Computing and Applications (IF: 4.213). The idea for the issue came out of the 1st International Workshop on Deep Learning for Music that we organized in Anchorage, US, as part of IJCNN in 2017. We received a nice collection of very interesting articles from scholars all over the world. The issue is set to come out soon (stay tuned).
Together with Edward Lin, Enyan Koh, and Dr. Balamurali BT from SUTD, and Dr. Simon Lui from Tencent Music (formerly SUTD), we published a paper on training a CNN with an ideal binary mask to separate the singing voice from its musical accompaniment:
Lin K.W.E., BT B., Koh E., Lui S., Herremans D. In Press. Singing Voice Separation Using a Deep Convolutional Neural Network Trained by Ideal Binary Mask and Cross Entropy. Neural Computing and Applications. DOI: 10.1007/s00521-018-3933-z.
Together with Prof. Ching-Hua Chuan from the University of Miami and Prof. Kat Agres from IHPC, A*STAR, I've just published a new article, 'From Context to Concept: Exploring Semantic Relationships in Music with Word2Vec', in Springer's Neural Computing and Applications (impact factor 4.213). The article describes how word2vec, a popular embedding model, can be used to model complex polyphonic pieces of music. The preprint is available here.