Research Output
Trainable Dynamic Subsampling for End-to-End Speech Recognition
  Jointly optimised attention-based encoder-decoder models have yielded impressive speech recognition results. The recurrent neural network (RNN) encoder is a key component in such models — it learns the hidden representations of the inputs. However, it is difficult for RNNs to model the long sequences characteristic of speech recognition. To address this, subsampling between stacked recurrent layers of the encoder is commonly employed. This method reduces the length of the input sequence and leads to gains in accuracy. However, static subsampling may both include redundant information and miss relevant information.

We propose using a dynamic subsampling RNN (dsRNN) encoder. Unlike a statically subsampled RNN encoder, the dsRNN encoder can learn to skip redundant frames. Furthermore, the skip ratio may vary at different stages of training, thus allowing the encoder to learn the most relevant information for each epoch. Although the dsRNN is unidirectional, it yields lower phone error rates (PERs) than a bidirectional RNN on TIMIT. The dsRNN encoder has a 16.8% PER on the TIMIT test set, a considerable improvement over static subsampling methods used with unidirectional and bidirectional RNN encoders (23.5% and 20.4% PER respectively).
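The core idea — an encoder that learns to drop uninformative frames rather than subsampling at a fixed rate — can be illustrated with a minimal sketch. This is not the authors' implementation; the gate parameterisation, the hard threshold, and all names (`dynamic_subsample_encode`, `w_gate`, `threshold`) are illustrative assumptions. A per-frame sigmoid gate, conditioned on the current hidden state and the incoming frame, decides whether the RNN updates and emits a hidden state or skips the frame entirely, so the output sequence is shorter than the input:

```python
import numpy as np

def dynamic_subsample_encode(frames, W_h, W_x, w_gate, threshold=0.5):
    """Sketch of a dynamic-subsampling RNN encoder (hypothetical form).

    For each input frame, a sigmoid gate scores how useful the frame is
    given the current hidden state; frames scoring below `threshold`
    are skipped (the hidden state is carried over unchanged), so the
    encoder emits fewer hidden states than there are input frames.
    """
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x in frames:
        # Gate conditioned on current state and the candidate frame.
        gate = 1.0 / (1.0 + np.exp(-(w_gate @ np.concatenate([h, x]))))
        if gate < threshold:
            continue  # skip: frame judged redundant given current state
        # Standard recurrent update only for the frames that are kept.
        h = np.tanh(W_h @ h + W_x @ x)
        outputs.append(h)
    return np.array(outputs)

# Toy usage: 20 four-dimensional frames, 8-dimensional hidden state.
rng = np.random.default_rng(0)
frames = rng.standard_normal((20, 4))
W_h = 0.1 * rng.standard_normal((8, 8))
W_x = 0.1 * rng.standard_normal((8, 4))
w_gate = rng.standard_normal(12)  # acts on [h; x], length 8 + 4
encoded = dynamic_subsample_encode(frames, W_h, W_x, w_gate)
```

Because the gate depends on the evolving hidden state, the effective skip ratio can change as training progresses, unlike a fixed subsampling factor between stacked layers. In a trainable version the hard threshold would need a differentiable relaxation or a straight-through estimator, which this sketch omits.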

  • Date:

    15 September 2019

  • Publication Status:

    Published

  • Publisher:

    ISCA

  • DOI:

    10.21437/interspeech.2019-2778

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Zhang, S., Loweimi, E., Xu, Y., Bell, P., & Renals, S. (2019). Trainable Dynamic Subsampling for End-to-End Speech Recognition. In Proc. Interspeech 2019 (pp. 1413-1417). https://doi.org/10.21437/interspeech.2019-2778
