Unsupervised disentanglement of pitch and timbre for isolated musical instrument sounds
| Title | Unsupervised disentanglement of pitch and timbre for isolated musical instrument sounds |
| --- | --- |
| Publication Type | Conference Paper |
| Year of Publication | 2020 |
| Authors | Luo Y.J., Cheuk K.W., Nakano T., Goto M., Herremans D. |
| Conference Name | Proceedings of the International Society for Music Information Retrieval Conference (ISMIR) |
Disentangling factors of variation aims to uncover latent variables that underlie the process of data generation. In this paper, we propose a framework that achieves unsupervised pitch and timbre disentanglement for isolated musical instrument sounds without relying on data annotations or pre-trained neural networks. Our framework, based on variational auto-encoders, takes as input a spectral frame, and encodes pitch and timbre as categorical and continuous variables, respectively. The input is then reconstructed by combining those variables. Under an unsupervised training setting, a major challenge is that encoders are tasked to capture factors of interest with distinct latent representations, without access to the corresponding ground-truth labels. We therefore introduce auxiliary tasks and objectives which leverage pitch shifting as a strategy to create surrogate labels, thereby encouraging the disentanglement of pitch and timbre. Through an ablation study we analyze the impact of the proposed objectives. The evaluation shows the efficacy of the proposed framework for learning disentangled representations, and verifies its applicability to unsupervised pitch classification and conditional spectral synthesis.
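To make the architecture concrete, the following is a minimal NumPy sketch of the encode–decode path described above: a spectral frame is encoded into a categorical pitch variable (sampled here with a Gumbel-softmax relaxation, a common choice for categorical latents in VAEs) and a continuous timbre variable (sampled via the reparameterization trick), and the two are combined to reconstruct the input. All dimensions, weights, and function names are illustrative assumptions, not the authors' implementation; the real model uses trained neural encoders and decoders, and adds the pitch-shift-based surrogate objectives on top of this backbone.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PITCH = 12    # number of pitch categories (assumption for illustration)
D_TIMBRE = 8    # timbre latent dimensionality (assumption)
D_SPEC = 64     # spectral-frame size (assumption)

# Stand-in linear "encoders"/"decoder" with random weights; a real model
# would use trained neural networks here.
W_pitch = rng.normal(scale=0.1, size=(D_SPEC, N_PITCH))
W_mu = rng.normal(scale=0.1, size=(D_SPEC, D_TIMBRE))
W_logvar = rng.normal(scale=0.1, size=(D_SPEC, D_TIMBRE))
W_dec = rng.normal(scale=0.1, size=(N_PITCH + D_TIMBRE, D_SPEC))

def gumbel_softmax(logits, tau=1.0):
    """Differentiable relaxed sample from a categorical distribution."""
    g = -np.log(-np.log(rng.uniform(size=logits.shape)))
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def encode(x):
    pitch = gumbel_softmax(x @ W_pitch)            # categorical pitch code
    mu, logvar = x @ W_mu, x @ W_logvar
    # Reparameterization trick for the continuous timbre code.
    timbre = mu + np.exp(0.5 * logvar) * rng.normal(size=D_TIMBRE)
    return pitch, timbre

def decode(pitch, timbre):
    # Reconstruct the spectral frame from the combined latent variables.
    return np.concatenate([pitch, timbre]) @ W_dec

x = rng.normal(size=D_SPEC)                        # stand-in spectral frame
pitch, timbre = encode(x)
x_hat = decode(pitch, timbre)
recon_loss = np.mean((x - x_hat) ** 2)             # reconstruction objective

# Surrogate-label idea from the paper (sketched, not implemented here):
# pitch-shifting x by k semitones should move the categorical pitch code
# accordingly while leaving the timbre code approximately unchanged, which
# supplies a training signal without ground-truth pitch labels.
```

Without training, the reconstruction is of course meaningless; the snippet only shows how the categorical and continuous variables are produced and recombined, which is the structural core the disentanglement objectives act upon.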