nnAudio - a PyTorch tool for Audio Processing using GPU

nnAudio is a new library that calculates different types of spectrograms on the fly by leveraging PyTorch and GPU processing. It currently supports linear-frequency spectrograms, log-frequency spectrograms, Mel-spectrograms, and the Constant Q Transform (CQT).
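As an illustration of how a spectrogram can be written as a 1D convolution (the approach named in the title of the publication cited below), here is a minimal sketch in plain Python. It is not nnAudio's code or API: the function names `fourier_kernels`, `conv1d`, and `magnitude_spectrogram`, the rectangular window, and the values of `n_fft` and `hop` are all assumptions chosen for the example. In PyTorch, the same kernels would presumably serve as the weights of a 1D convolution layer (for instance via `torch.nn.functional.conv1d`), which is what lets the computation run on a GPU.

```python
import math


def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin (0 .. n_fft // 2)."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Strided, unpadded 1D cross-correlation (the operation deep learning
    frameworks usually call 'convolution')."""
    width = len(kernel)
    return [
        sum(signal[start + i] * kernel[i] for i in range(width))
        for start in range(0, len(signal) - width + 1, stride)
    ]


def magnitude_spectrogram(signal, n_fft=64, hop=32):
    """Hypothetical helper: magnitude STFT with a rectangular window.

    Returns a nested list indexed as spec[bin][frame].
    """
    cos_k, sin_k = fourier_kernels(n_fft)
    spec = []
    for ck, sk in zip(cos_k, sin_k):
        real = conv1d(signal, ck, hop)   # real part of each frame's DFT bin
        imag = conv1d(signal, sk, hop)   # imaginary part (sign does not affect magnitude)
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec


# Usage: a pure tone placed exactly on frequency bin 8.
sample_rate = 1024                          # assumed sample rate in Hz
n_fft = 64
tone_bin = 8
freq = tone_bin * sample_rate / n_fft       # 128 Hz
signal = [math.sin(2 * math.pi * freq * t / sample_rate) for t in range(512)]

spec = magnitude_spectrogram(signal, n_fft=n_fft, hop=32)
# For every frame, find the bin with the largest magnitude.
peak_bins = [max(range(len(spec)), key=lambda k: spec[k][f])
             for f in range(len(spec[0]))]
```

Each frequency bin is simply the output channel of a convolution whose stride is the hop size, so the whole transform maps onto an operation that GPUs already execute efficiently. A practical implementation would also apply a window function and padding, which this sketch leaves out.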

The library is introduced in the publication below. If you make use of it, please cite this publication:

K.W. Cheuk, K. Agres, D. Herremans. 2019. nnAudio: A PyTorch Audio Processing Tool Using 1D Convolution Neural Networks. ISMIR Late-Breaking Demo. Delft, The Netherlands.

nnAudio is available at https://github.com/KinWaiCheuk/nnAudio

The figure below shows the speedup obtained with nnAudio: it is over 100x faster than traditional methods such as the one implemented in librosa. The graph plots the computation time, in seconds, required to process 1,770 audio excerpts with different implementation techniques on a DGX with an Intel(R) Xeon(R) CPU E5-2698 and one Tesla V100 DGXS 32GB GPU.