The impact of Audio input representations on neural network based music transcription
|Title||The impact of Audio input representations on neural network based music transcription|
|Publication Type||Conference Paper|
|Year of Publication||2020|
|Authors||Cheuk K.W., Agres K., Herremans D.|
|Conference Name||Proceedings of the International Joint Conference on Neural Networks (IJCNN)|
This paper thoroughly analyses the effect of different input representations on polyphonic multi-instrument music transcription. We use our own GPU-based spectrogram extraction tool, nnAudio, to investigate the influence of using a linear-frequency spectrogram, log-frequency spectrogram, Mel spectrogram, and constant-Q transform (CQT). Our results show that an 8.33% increase in transcription accuracy and a 9.39% reduction in error can be obtained by choosing the appropriate input representation (a log-frequency spectrogram with STFT window length 4,096 and 2,048 frequency bins) without changing the neural network design (a single fully connected layer). Our experiments also show that the Mel spectrogram is a compact representation for which we can reduce the number of frequency bins to only 512 while still retaining relatively high music transcription accuracy.
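The best-performing representation reported above can be sketched as follows: take a linear-frequency STFT with a 4,096-sample window, then remap its magnitude bins onto 2,048 logarithmically spaced frequency centres. This is a minimal NumPy sketch under those assumptions (the `fmin` floor of 50 Hz and the interpolation-based remapping are illustrative choices, not the exact kernel construction used in nnAudio):

```python
import numpy as np

def log_frequency_spectrogram(x, sr=44100, n_fft=4096, n_bins=2048,
                              hop=512, fmin=50.0):
    """Linear STFT magnitude remapped onto log-spaced frequency bins.

    Hedged sketch of a log-frequency spectrogram with the paper's
    parameters (window 4096, 2048 bins); nnAudio's implementation
    may construct its kernels differently.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))       # (frames, n_fft//2 + 1)
    lin_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # linear bin centres
    log_freqs = np.geomspace(fmin, sr / 2, n_bins)  # log-spaced centres
    # Interpolate each frame's magnitudes onto the log-frequency grid.
    spec = np.stack([np.interp(log_freqs, lin_freqs, m) for m in mag])
    return spec.T                                   # (n_bins, n_frames)

# Usage: one second of a 440 Hz sine at 44.1 kHz.
sr = 44100
t = np.arange(sr) / sr
spec = log_frequency_spectrogram(np.sin(2 * np.pi * 440 * t), sr=sr)
print(spec.shape)  # (2048, 79)
```

Compared with the linear STFT, the log-spaced grid allocates proportionally more bins to low frequencies, where musical pitches are densely packed, which is one plausible reason this representation helps transcription.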