Towards multilingual audio-visual speech enhancement in real noisy environments
Speech enhancement aims to improve the overall quality and intelligibility of speech degraded by noise in real-world environments. In recent years, researchers have proposed audio-visual speech enhancement models that go beyond traditional audio-only processing to provide better noise suppression and speech restoration in low signal-to-noise ratio (SNR) environments where multiple competing background noise sources are present. However, existing audio-visual speech enhancement methods are language dependent, as they exploit the correlations between visemes and the uttered speech. In addition, it has been shown that speaker pose variation significantly degrades the performance of these models.
This project aims to address these two critical challenges in current audio-visual speech enhancement models: language dependence and sensitivity to speaker pose variation. The following research objectives will contribute to this development.

1. To design a novel multilingual audio-visual (AV) speech enhancement framework exploiting advanced machine learning techniques to address
2. To develop a novel multiview AV speech enhancement framework exploiting image translation and pose-invariant features.
3. To integrate the two frameworks and critically evaluate the robustness and generalisation of the combined framework in a range of real-world environments (e.g. cafeteria and restaurant) and use cases (e.g. car).

  • Start Date:

    17 February 2023

  • End Date:

    16 February 2025

  • Activity Type:

    Externally Funded Research

  • Funder:

    Royal Society

  • Value:


Project Team