Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling
  In this paper we investigate the usefulness of the sign spectrum, and of its combination with the raw magnitude spectrum, in acoustic modelling for automatic speech recognition (ASR). The sign spectrum is a sequence of ±1s that captures one bit of the phase spectrum. It encodes information overlooked by the magnitude spectrum, enabling unique signal characterisation and reconstruction. In particular, we demonstrate that it carries information related to the temporal structure of the signal as well as to the source component of speech. Furthermore, we investigate combining it with the raw magnitude spectrum via multi-head CNNs at different fusion levels for ASR. Although, in terms of information content, the two streams together are equivalent to the raw waveform, the combined system performs noticeably better than both the raw waveform and classic features such as MFCC and filterbank. This has been observed and verified on the TIMIT, NTIMIT, Aurora-4 and WSJ tasks, with up to 14.5% relative WER reduction.
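The statement that the magnitude and sign streams together carry the same information as the waveform can be illustrated with a small NumPy sketch. It assumes a real-valued, orthonormal DCT-II of each frame. This is an assumption: the abstract does not say which transform the paper uses. With a real-valued transform, every coefficient is fully described by its magnitude and its sign. The function names (`dct2_matrix`, `sign_magnitude_split`, `reconstruct`) are hypothetical, chosen for illustration only.

```python
import numpy as np


def dct2_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis (rows are basis vectors), so C @ C.T == I.

    Assumption: the DCT-II stands in for whatever real-valued transform
    the paper actually uses.
    """
    k = np.arange(n)[:, None]  # frequency index
    t = np.arange(n)[None, :]  # time index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)    # rescale the DC row for orthonormality
    return c


def sign_magnitude_split(frame: np.ndarray):
    """Split a frame's real-valued spectrum into magnitude and sign streams."""
    spectrum = dct2_matrix(len(frame)) @ frame
    magnitude = np.abs(spectrum)
    # One bit per bin: +1 or -1 (zero coefficients get +1; their magnitude is 0 anyway).
    sign = np.where(spectrum >= 0, 1.0, -1.0)
    return magnitude, sign


def reconstruct(magnitude: np.ndarray, sign: np.ndarray) -> np.ndarray:
    """Invert the orthonormal DCT-II: x = C.T @ (sign * magnitude)."""
    return dct2_matrix(len(magnitude)).T @ (sign * magnitude)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(400)  # hypothetical frame: 25 ms at 16 kHz
    mag, sgn = sign_magnitude_split(frame)
    # Magnitude and sign together recover the frame exactly.
    print(np.allclose(reconstruct(mag, sgn), frame))
    # Magnitude alone (all signs forced to +1) does not.
    print(np.allclose(reconstruct(mag, np.ones_like(sgn)), frame))
```

Under this assumption the split is lossless. Keeping only the magnitude and forcing every sign to +1 does not recover the frame, and that missing information is what the sign stream supplies. In a multi-head model, `mag` and `sgn` would feed separate input heads. This sketch does not model the CNN heads or the fusion levels themselves.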

  • Date:

    25 October 2020

  • Publication Status:

    Published

  • Publisher:

    ISCA

  • DOI:

    10.21437/interspeech.2020-18

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Loweimi, E., Bell, P., & Renals, S. (2020). Raw Sign and Magnitude Spectra for Multi-Head Acoustic Modelling. In Proc. Interspeech 2020 (pp. 1644-1648). https://doi.org/10.21437/interspeech.2020-18

Authors

  • Loweimi, E.
  • Bell, P.
  • Renals, S.
