ReconVAT presented in ACM Multimedia

Congrats to Kin Wa Cheuk for his published paper in the ACM Multimedia conference (A*) on 'ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data'. If you are interested in training low-data music transcription models with semi-supervised learning, check out the full paper here, or access the preprint.

Watch Raven's talk here:

Abstract: Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres which are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combined with existing U-net models for AMT, ReconVAT shows competitive performance on the common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% for the note-wise and note-with-offset-wise metrics respectively, which translate to improvements of 22.2% and 62.5% over the supervised baseline model. Our proposed framework also demonstrates the potential of continuous learning on new data, which could be used in real-world applications such as online training and transcribing instrumental covers of pop music.