The impact of audio input representations on neural network based music transcription

Title: The impact of audio input representations on neural network based music transcription
Publication Type: Conference Paper
Year of Publication: 2020
Authors: Cheuk K.W., Agres K., Herremans D.
Conference Name: Proceedings of the International Joint Conference on Neural Networks (IJCNN)
Date Published: 07/2020
Conference Location: Glasgow
Other Numbers: arXiv:2001.09989
Abstract

This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription. We use our own GPU-based spectrogram extraction tool, nnAudio, to investigate the influence of using a linear-frequency spectrogram, log-frequency spectrogram, Mel spectrogram, and constant-Q transform (CQT). Our results show that an 8.33% increase in transcription accuracy and a 9.39% reduction in error can be obtained by choosing the appropriate input representation (a log-frequency spectrogram with an STFT window length of 4,096 and 2,048 frequency bins) without changing the neural network design (a single fully connected layer). Our experiments also show that the Mel spectrogram is a compact representation for which we can reduce the number of frequency bins to only 512 while still keeping relatively high music transcription accuracy.

URL: https://arxiv.org/abs/2001.09989
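
The input representations compared in the abstract can be extracted with nnAudio roughly as sketched below. This is a minimal illustration, not code from the paper: the class and parameter names (Spectrogram.STFT, freq_bins, freq_scale, MelSpectrogram, CQT) follow the publicly documented nnAudio API and should be checked against the library version in use, and the sample rate and hop length chosen here are illustrative assumptions rather than the paper's settings.

import torch
from nnAudio import Spectrogram

sr = 16000               # illustrative sample rate; the paper's audio settings may differ
x = torch.randn(1, sr)   # one second of dummy audio, shape (batch, samples)

# Log-frequency spectrogram with an STFT window of 4,096 samples and 2,048
# frequency bins -- the configuration the abstract reports as the most accurate.
log_spec_layer = Spectrogram.STFT(n_fft=4096, freq_bins=2048, freq_scale='log',
                                  sr=sr, output_format='Magnitude')
log_spec = log_spec_layer(x)      # (batch, 2048, time_frames)

# Mel spectrogram reduced to 512 bins, the compact representation mentioned above.
mel_layer = Spectrogram.MelSpectrogram(sr=sr, n_fft=4096, n_mels=512)
mel_spec = mel_layer(x)           # (batch, 512, time_frames)

# Constant-Q transform, the fourth representation compared in the paper.
cqt_layer = Spectrogram.CQT(sr=sr, hop_length=512)
cqt_spec = cqt_layer(x)           # (batch, n_bins, time_frames)

Because each representation is computed as a PyTorch layer, it can be placed in front of the transcription network and run on the GPU, which is the use case nnAudio was built for.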