On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters

Research Output

We investigate the problem of direct waveform modelling using parametric kernel-based filters in a convolutional neural network (CNN) framework, building on SincNet, a CNN employing the cardinal sine (sinc) function to implement learnable bandpass filters. To this end, the general problem of learning a filterbank consisting of modulated kernel-based baseband filters is studied. Compared to standard CNNs, such models have fewer parameters, learn faster, and require less training data. They are also more amenable to human interpretation, paving the way to embedding some perceptual prior knowledge in the architecture. We have investigated the replacement of the rectangular filters of SincNet with triangular, gammatone and Gaussian filters, resulting in higher model flexibility and a reduction in the phone error rate. We also explore the properties of the filters learned for TIMIT phone recognition from both perceptual and statistical standpoints. We find that the filters in the first layer, which directly operate on the waveform, are in accord with the prior knowledge utilised in designing and engineering standard filters such as mel-scale triangular filters. That is, the networks learn to pay more attention to perceptually significant spectral neighbourhoods where the data centroid is located, and the variance and Shannon entropy are highest.
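To make the idea of a modulated kernel-based filter concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes each bandpass filter is a baseband kernel multiplied by a cosine carrier at the centre frequency. The function name `modulated_kernel_filter`, its parameters, and the default filter length and sample rate are all hypothetical choices made for illustration. The centre frequency and bandwidth are fixed numbers here, whereas in a network of the kind described above they would be the learnable parameters. The gammatone kernel is omitted because it is causal and would need a different time axis.

```python
import numpy as np

def modulated_kernel_filter(kernel, centre_hz, bandwidth_hz,
                            length=251, sample_rate=16000):
    """Return one bandpass impulse response: baseband kernel x cosine carrier.

    All names and defaults are illustrative assumptions, not taken from the paper.
    """
    # Time axis in seconds, centred on zero so the filter is symmetric.
    t = (np.arange(length) - (length - 1) / 2) / sample_rate

    if kernel == "sinc":
        # sinc in time <-> rectangular frequency response (SincNet-style).
        baseband = np.sinc(bandwidth_hz * t)
    elif kernel == "sinc2":
        # squared sinc in time <-> triangular frequency response.
        baseband = np.sinc(bandwidth_hz * t) ** 2
    elif kernel == "gauss":
        # Gaussian in time <-> Gaussian in frequency;
        # bandwidth_hz is used as the standard deviation in Hz.
        baseband = np.exp(-0.5 * (2 * np.pi * bandwidth_hz * t) ** 2)
    else:
        raise ValueError(f"unknown kernel: {kernel!r}")

    # Modulation shifts the baseband response to +/- centre_hz.
    h = baseband * np.cos(2 * np.pi * centre_hz * t)
    return h / np.max(np.abs(h))

# A tiny filterbank: only two numbers (centre, bandwidth) describe each filter.
centres_hz = [300.0, 1000.0, 2500.0]
bank = np.stack([modulated_kernel_filter("gauss", fc, 200.0) for fc in centres_hz])

# Locate the spectral peak of the second filter.
spectrum = np.abs(np.fft.rfft(bank[1], n=4096))
freqs = np.fft.rfftfreq(4096, d=1 / 16000)
peak_hz = freqs[np.argmax(spectrum)]
```

Each filter in this sketch is described by two numbers regardless of its length, which is the source of the parameter saving relative to a standard convolutional layer that learns every tap independently.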

Date:

15 September 2019
Publication Status:

Published
Publisher

ISCA
DOI:

10.21437/interspeech.2019-1257
Funders:

Engineering and Physical Sciences Research Council

http://researchrepository.napier.ac.uk/output/3585886

Citation

Loweimi, E., Bell, P., & Renals, S. (2019). On Learning Interpretable CNNs with Parametric Modulated Kernel-Based Filters. In Proc. Interspeech 2019 (3480-3484). https://doi.org/10.21437/interspeech.2019-1257

Authors

Dr Erfan Loweimi

School of Computing Engineering and the Built Environment

Available Documents

Files are currently unavailable for download; please contact E.Loweimi@napier.ac.uk to request a copy.