Understanding Audio Features via Trainable Basis Functions
| Title | Understanding Audio Features via Trainable Basis Functions |
| Publication Type | Conference Paper |
| Year of Publication | 2022 |
| Authors | Kwan Y.H., Cheuk K.W., Herremans D. |
| Conference Name | arXiv preprint |
In this paper, we explore the possibility of maximizing the information represented in spectrograms by making the spectrogram basis functions trainable. We experiment with two tasks: keyword spotting (KWS) and automatic speech recognition (ASR). For most neural network models, the architecture and hyperparameters are typically fine-tuned and optimized through experiments; input features, however, are often treated as fixed. In the case of audio, signals can be expressed in two main ways: as raw waveforms (time domain) or as spectrograms (time-frequency domain). In addition, different spectrogram types are often tailored to different applications. In our experiments, we allow for this tailoring directly as part of the network.
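To illustrate the underlying idea, the sketch below (an illustrative assumption, not the paper's actual implementation) expresses a power spectrogram as a matrix product between framed audio and Fourier basis vectors. In a neural network, the basis matrices built here would be initialized this way and then registered as trainable weights, so that gradient descent can adapt them to the task:

```python
import numpy as np

def fourier_basis(n_fft):
    # Real and imaginary DFT basis vectors. In a trainable front-end,
    # these rows serve as the initial values of learnable parameters.
    k = np.arange(n_fft // 2 + 1)[:, None]   # frequency bins
    n = np.arange(n_fft)[None, :]            # time samples
    cos_b = np.cos(2 * np.pi * k * n / n_fft)
    sin_b = -np.sin(2 * np.pi * k * n / n_fft)
    return cos_b, sin_b

def spectrogram(signal, n_fft=256, hop=128):
    # Slice the signal into overlapping frames, then project each frame
    # onto the basis vectors; the result is a power spectrogram.
    cos_b, sin_b = fourier_basis(n_fft)
    frames = np.stack([signal[i:i + n_fft]
                       for i in range(0, len(signal) - n_fft + 1, hop)])
    real = frames @ cos_b.T
    imag = frames @ sin_b.T
    return real ** 2 + imag ** 2

# A 440 Hz tone sampled at 16 kHz peaks near bin 440 / 16000 * 256 ≈ 7.
sr, n_fft = 16000, 256
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 440 * t), n_fft=n_fft)
print(spec.shape, spec.mean(axis=0).argmax())
```

Because the projection is an ordinary linear layer, replacing the fixed `cos_b`/`sin_b` matrices with learnable parameters is what makes the spectrogram basis trainable end to end.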