The Audio, Music, and AI (AMAAI) Lab at SUTD is organizing bi-weekly webinars on Music Information Retrieval (MIR). The aim is to connect labs working on similar topics and to enable international collaboration. Participating universities include SUTD, QMUL,... The webinars are held on Wednesdays at 4pm Singapore time (9am UK time / 10am Central European time).
Are you interested in presenting some of your work for 20-30 minutes at an upcoming webinar? Send me an email with the subject [AMAAI MIR Webinar].
Do you want to listen in to our webinars? Just sign up for our Google group and you will receive the video invites: https://groups.google.com/#!forum/amaai-mir-webinars/join
Your talk? Email me!
Title: VaPar Synth - A Variational Parametric Model for Audio Synthesis
Abstract: With the advent of data-driven statistical modeling and abundant computing power, researchers are turning increasingly to deep learning for audio synthesis. These methods try to model audio signals directly in the time or frequency domain. In the interest of more flexible control over the generated sound, it can be more useful to work with a parametric representation of the signal that corresponds more directly to musical attributes such as pitch, dynamics, and timbre. We present VaPar Synth, a Variational Parametric Synthesizer that utilizes a conditional variational autoencoder trained on a suitable parametric representation. We demonstrate our proposed model's capabilities via the reconstruction and generation of instrumental tones with flexible control over their pitch.
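The conditioning idea behind such a model can be sketched in a few lines of numpy. Everything below is illustrative (toy linear "networks", made-up dimensions), not the architecture from the talk: the pitch label is simply concatenated to both the encoder input and the decoder input, which is what gives the decoder explicit pitch control.

```python
import numpy as np

# Toy sketch of a conditional VAE's data flow. All dimensions and the linear
# "networks" below are made up for illustration; they are not the talk's model.
rng = np.random.default_rng(0)

param_dim = 60    # e.g. coefficients of a parametric (harmonic) frame
latent_dim = 8
pitch_dim = 12    # one-hot pitch class used as the conditioning label

# Random linear maps standing in for the encoder and decoder networks.
W_mu = rng.standard_normal((latent_dim, param_dim + pitch_dim)) * 0.1
W_logvar = rng.standard_normal((latent_dim, param_dim + pitch_dim)) * 0.1
W_dec = rng.standard_normal((param_dim, latent_dim + pitch_dim)) * 0.1

def encode(x, pitch):
    """Map a parametric frame plus its pitch label to a latent Gaussian."""
    h = np.concatenate([x, pitch])
    return W_mu @ h, W_logvar @ h             # mean, log-variance

def reparameterize(mu, logvar):
    """z = mu + sigma * eps, so sampling stays differentiable in training."""
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)

def decode(z, pitch):
    """Reconstruct the parametric frame from z and the pitch condition."""
    return W_dec @ np.concatenate([z, pitch])

x = rng.standard_normal(param_dim)            # one frame of features
pitch = np.eye(pitch_dim)[4]                  # condition on one pitch class

mu, logvar = encode(x, pitch)
z = reparameterize(mu, logvar)
x_hat = decode(z, pitch)                      # pitch-controlled reconstruction
```

Replacing the matrices with neural networks and training with the usual reconstruction-plus-KL objective turns this data flow into a conditional VAE; changing the `pitch` vector at decode time is what gives control over the generated tone's pitch.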
Brief Bio of the speaker:
I'm Krishna, a dual-degree student (Bachelor's + Master's) at IIT Bombay, majoring in EE and specializing in signal processing. I have had the opportunity to spend time as a visiting researcher at the Music Technology Group and Kyoto University, working on MIR and statistical signal processing. I'll be graduating in August and will begin my Ph.D. in EE at the University of Illinois at Urbana-Champaign.
You can find more details about my research on my homepage: https://www.ee.iitb.ac.in/student/~krishnasubramani/
Abstract: In this talk, we focus on the use of latent variable models (specifically, variational autoencoders (VAEs)) in style-based music modelling. The term "style" can refer to musical factors such as genre, mood, and composer style, which one often wants to control, especially in generation tasks. The talk will mainly cover:
(i) the strengths of using latent variable frameworks for music modelling;
(ii) common techniques of utilizing the latent space for downstream tasks, e.g. symbolic generation, audio synthesis, style transfer, etc.;
(iii) challenges of learning useful latent variable models and possible solutions to tackle them.
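As a concrete instance of point (ii), one common latent-space technique is interpolating between the codes of two pieces to morph between styles. A minimal numpy sketch (the latent codes here are random stand-ins; a real system would obtain them from a trained encoder and decode each interpolated code):

```python
import numpy as np

# Sketch of one latent-space technique from (ii): interpolating between the
# latent codes of two pieces to morph between styles.

def slerp(z_a, z_b, alpha):
    """Spherical interpolation; often preferred over linear interpolation
    for Gaussian latents because it stays near the typical-set shell."""
    cos_omega = np.dot(z_a, z_b) / (np.linalg.norm(z_a) * np.linalg.norm(z_b))
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    if np.isclose(omega, 0.0):                # (near-)parallel codes
        return (1 - alpha) * z_a + alpha * z_b
    return (np.sin((1 - alpha) * omega) * z_a
            + np.sin(alpha * omega) * z_b) / np.sin(omega)

rng = np.random.default_rng(1)
z_style_a, z_style_b = rng.standard_normal(16), rng.standard_normal(16)

# Five codes walking from style A to style B; decoding each one would yield
# a gradual stylistic morph.
path = [slerp(z_style_a, z_style_b, a) for a in np.linspace(0, 1, 5)]
```

The same machinery underlies style transfer: encode a piece, move its code toward a region of the latent space associated with the target style, and decode.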
Bio: Hao Hao Tan is a research assistant at the Singapore University of Technology and Design, supervised by Professor Gemma Roig, Professor Dorien Herremans, and Dr. Kat Agres. He received a Bachelor of Engineering in Computer Science from Nanyang Technological University, Singapore. Hao Hao is currently working on music generation based on video content and perceived emotion.
Hao Hao’s recent paper: https://arxiv.org/abs/2006.09833
Title: Disentangled Representation Learning Using Gaussian-Mixture Variational Auto-encoders: Applications for Synthesis and Conversion of Musical Signals
Abstract: Disentangled representation learning aims to uncover generative factors of variation of data. This could enable analysis of interpretable features and synthesis of novel data. In the context of deep learning, variational auto-encoders (VAEs) are one of the most popular frameworks for learning disentangled representations. VAEs describe a data-generating process that first samples a latent variable from a prior distribution and then samples an observation from a distribution conditioned on the latent variable. Training VAEs thus captures disentangled representations in the latent variable. In this talk, we present a VAE that learns significant factors of variation for either isolated musical instrument sounds or expressive singing voices [1, 2]. In particular, we exploit a Gaussian-mixture prior distribution for the latent variables of interest, thereby capturing the multi-modality of the data. We verify and demonstrate the model's capability of controllable attribute synthesis and conversion.
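The data-generating process described in the abstract, with the Gaussian-mixture prior, can be written compactly as follows (a sketch of the general setup, where c indexes the mixture components and θ denotes the decoder parameters):

```latex
\begin{align}
  c &\sim \mathrm{Cat}(\pi)
    && \text{choose a mixture component} \\
  z \mid c &\sim \mathcal{N}(\mu_c,\, \sigma_c^2 I)
    && \text{sample the latent variable} \\
  x \mid z &\sim p_\theta(x \mid z)
    && \text{sample the observation}
\end{align}
```

Each component can then specialize to one mode of the data, e.g. one instrument timbre or one vocal technique, which is what enables attribute conversion by re-sampling z from a different component.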
[1] Yin-Jyun Luo, Kat Agres, Dorien Herremans, "Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders," ISMIR 2019 (preprint).
[2] Yin-Jyun Luo, Chin-Cheng Hsu, Kat Agres, Dorien Herremans, "Singing Voice Conversion with Disentangled Representations of Singer and Vocal Technique Using Variational Autoencoders," ICASSP 2020 (preprint).
Webinar on nnAudio by Kin Wai (Raven) Cheuk, who has developed an (already popular!) GPU-based tool for on-the-fly spectrogram extraction. Raven will give a talk via Zoom (subscribe to the Google group above or contact me for the Zoom link) on Wednesday 3 June at 4pm SGT.
Title: nnAudio: a GPU audio processing tool
Abstract: Raven will present a recently released neural-network-based audio processing toolbox called nnAudio. The toolbox leverages 1D convolutional neural networks for real-time spectrogram generation (time-domain to frequency-domain conversion). This lets us generate spectrograms on the fly, without having to store any spectrograms on disk when training neural networks for audio-related tasks.
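The core trick can be sketched without any deep-learning framework: an STFT is a bank of fixed, windowed sine/cosine kernels slid over the waveform with the hop size as the stride, which is exactly what a 1D convolution layer computes, so the kernels can live on the GPU inside the network. A numpy sketch of the idea (this is not nnAudio's API; all names and sizes below are illustrative):

```python
import numpy as np

# Spectrogram-as-convolution: build windowed DFT basis vectors as "filter
# kernels", then slide them over the signal with the hop size as the stride.
sr, n_fft, hop = 8000, 256, 64
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440 * t)           # 1 s test tone at 440 Hz

window = np.hanning(n_fft)
k = np.arange(n_fft // 2 + 1)[:, None]       # frequency bins (rows)
n = np.arange(n_fft)[None, :]                # sample index within a frame
cos_kernels = window * np.cos(2 * np.pi * k * n / n_fft)
sin_kernels = window * np.sin(2 * np.pi * k * n / n_fft)

# "Convolution" with stride = hop: each frame is a dot product with every kernel.
frames = np.stack([wave[i:i + n_fft]
                   for i in range(0, len(wave) - n_fft + 1, hop)])
real = frames @ cos_kernels.T
imag = frames @ sin_kernels.T
spectrogram = np.sqrt(real**2 + imag**2)     # magnitude, shape (frames, bins)

# The loudest bin should sit near 440 Hz (within one bin's resolution).
peak_hz = spectrogram.mean(axis=0).argmax() * sr / n_fft
```

Implementing this as a convolution layer means the transform runs on the GPU in the network's forward pass, and the kernels can even be made trainable so the time-frequency front end is learned jointly with the rest of the model.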