Research Output
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition
  Emotion recognition from speech plays a significant role in adding emotional intelligence to machines and making human-machine interaction more natural. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the emotion information encoded in this signal while being as insensitive as possible to other types of information carried by speech. In this paper, we propose a novel temporal modelling framework for robust emotion classification using a bidirectional long short-term memory (BLSTM) network, a CNN and Capsule networks. The BLSTM deals with the temporal dynamics of the speech signal by effectively representing forward/backward contextual information, while the CNN along with the dynamic routing of the Capsule net learn temporal clusters, which together provide a state-of-the-art technique for classifying the extracted patterns. The proposed approach was compared with a wide range of architectures on the FAU-Aibo and RAVDESS corpora, and remarkable gains over state-of-the-art systems were obtained. For FAU-Aibo and RAVDESS, accuracies of 77.6% and 56.2% were achieved, respectively, which are 3% and 14% (absolute) higher than the best-reported results for the respective tasks.
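The abstract refers to the dynamic routing of the Capsule net without describing it. As a rough illustration only, the following NumPy sketch shows the generic routing-by-agreement procedure from the capsule network literature. It is not the authors' implementation and omits the BLSTM and CNN stages entirely. The function names `squash` and `dynamic_routing`, the array `u_hat`, the three routing iterations, and the shapes in the usage lines are all assumptions chosen for the example.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-8):
    """Scale each vector so its norm lies in [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, num_iters=3):
    """Generic routing-by-agreement (illustrative, not the paper's code).

    u_hat: shape (num_in, num_out, dim_out), the prediction that each input
           capsule makes for each output capsule.
    Returns the output capsules, shape (num_out, dim_out), and the coupling
    coefficients, shape (num_in, num_out).
    """
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))  # routing logits, start uniform
    for _ in range(num_iters):
        # Softmax over output capsules: each input distributes its vote.
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)
        # Weighted sum of predictions for each output capsule, then squash.
        s = np.einsum("io,iod->od", c, u_hat)
        v = squash(s)
        # Raise the logits where a prediction agrees with the output.
        b = b + np.einsum("iod,od->io", u_hat, v)
    return v, c


# Hypothetical sizes: 20 input capsules, 4 output capsules of dimension 8.
rng = np.random.default_rng(0)
u_hat = rng.normal(size=(20, 4, 8))
v, c = dynamic_routing(u_hat)
print(v.shape, c.shape)
```

In this sketch, input capsules whose predictions agree end up with large coupling coefficients towards the same output capsule, which is one way to read the "clusters" mentioned in the abstract.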

  • Date:

    15 September 2019

  • Publication Status:

    Published

  • Publisher:

    ISCA

  • DOI:

    10.21437/interspeech.2019-3068

  • Funders:

    Engineering and Physical Sciences Research Council

Citation

Jalal, M. A., Loweimi, E., Moore, R. K., & Hain, T. (2019). Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition. In Proc. Interspeech 2019 (pp. 1701-1705). https://doi.org/10.21437/interspeech.2019-3068
