This project aims to create a framework for Natural Language Generation that combines common-sense knowledge and multi-modal information, enabling effective real-time communication between humans and artificial agents.

Funder: EPSRC

One of the most compelling problems in Artificial Intelligence is to create computational agents capable of interacting in real-world environments using natural language. Communication through language is the most vital and natural way of interaction. Humans communicate effectively with each other using natural language, drawing on common-sense knowledge and making inferences about other people's backgrounds based on previous interactions with them. At the same time, they can successfully describe their surroundings, even when encountering unknown entities and objects. For decades, researchers have tried to recreate the way humans communicate through natural language, and although there have been major breakthroughs in recent years (such as Apple's Siri or Amazon's Alexa), Natural Language Generation systems still lack the ability to reason, exploit common-sense knowledge, and utilise multi-modal information from a variety of sources such as knowledge bases, images, and videos.

This project aims to develop a framework for common-sense- and visually-enhanced Natural Language Generation that enables natural, real-time communication, and thereby effective collaboration, between humans and artificial agents such as robots. Human-Robot Interaction poses additional challenges to Natural Language Generation because of the uncertainty that arises from dynamic environments and the non-deterministic nature of interaction. The project will investigate methods for linking various modalities, taking into account their dynamic nature. To achieve natural, efficient and intuitive communication capabilities, agents will also need to acquire human-like abilities in synthesising knowledge and expression. The conditions under which external knowledge bases (such as Wikipedia) can be used to enhance natural language generation still have to be explored, as does the question of whether existing knowledge bases are useful for language generation. Novel ways of integrating multi-modal data for language generation will lead to more robust and efficient interactions and will have an impact on natural language generation, social robotics, computer vision, and related fields. This might, in turn, spawn entirely novel applications, such as explaining exact procedures for e-health treatments and enhancing tutoring systems for educational purposes.

The project outcomes include impactful publications; for details, see: