Human Language Technology

Collaborative AI

Spoken dialogue is the most natural means of human communication. It has also become part of the human-machine interface in smartphones, personal assistants, intelligent agents, robot companions, and in-car telematics, among others, offering invaluable services. In practice, a dialogue system can be goal-driven, such as an ATM that lets people complete a transaction; it can be a chat system, such as the ELIZA chatbot, built for entertainment without a specific goal; or it can be something in between that provides humans with information. Most of today’s dialogue systems are built on an existing knowledge base and perform pattern classification in one of these three operating modes. They work in two separate phases: learning and run-time execution. At run time, machines execute what they have learnt during training.
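As a rough illustration of this two-phase operation, the Python sketch below (using scikit-learn) trains a toy intent classifier offline and then only applies what it has learnt at run time; the utterances, intent labels and canned responses are illustrative placeholders rather than anything from this project.

# Minimal sketch of the learn-then-execute pattern: a pattern classifier is
# trained offline on labelled utterances and only applied at run time.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# --- Learning phase (offline) ---
train_utterances = [
    "withdraw fifty dollars",           # goal-driven: complete a transaction
    "transfer money to my savings",
    "tell me a joke",                   # chat: no specific goal
    "how are you today",
    "what time does the lab open",      # in-between: provide information
    "where is the repair manual",
]
train_intents = ["transaction", "transaction", "chitchat", "chitchat", "info", "info"]

classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_utterances, train_intents)

# --- Run-time execution phase ---
canned_responses = {
    "transaction": "Sure, let us complete that transaction.",
    "chitchat": "Happy to chat!",
    "info": "Let me look that up in the knowledge base.",
}

def respond(utterance):
    """Classify the utterance into one of the three operating modes and reply."""
    intent = classifier.predict([utterance])[0]
    return canned_responses[intent]

print(respond("please move ten dollars to my checking account"))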

In this project, we develop novel methods for natural language dialogue with humans that allow a robotic system to proactively elicit task details from a human co-worker at run time and to coordinate with the co-worker on sub-task allocation during planning. In particular, the system may use dialogue interaction to understand the co-worker’s goals, intentions and other aspects of the collaborative task that cannot be explicitly perceived through other means (e.g. vision). The project also studies methods that allow the system to explain its responses using general and domain knowledge, as well as contextual information. Specifically, by combining low-level learning and high-level reasoning, we aim to enable machines to provide context-aware, user-centric explanations in response to human inquiries and to converse with humans more naturally, forming a peer-like relationship. In this way, machines will perform services (e.g. inspection, repair) more accurately, while humans can be more confident in the machines and make informed decisions.
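To make the idea of proactive elicitation concrete, here is a minimal, hedged sketch in plain Python: the robot tracks which task details, modelled as slots, are still unknown, asks the co-worker about the first missing one, and can give a simple context-aware explanation for a chosen value. The slot names, questions and explanation template are hypothetical and only for illustration.

# Minimal sketch of proactive information elicitation through dialogue:
# the robot asks about unknown task details instead of waiting passively.

REQUIRED_SLOTS = {
    "target_part": "Which part should I inspect?",
    "tool": "Which tool do you want me to use?",
    "deadline": "By when does this sub-task need to be done?",
}

def next_question(task_state):
    """Return a clarifying question for the first unfilled slot, or None when
    the task description is complete and sub-task allocation can proceed."""
    for slot, question in REQUIRED_SLOTS.items():
        if task_state.get(slot) is None:
            return question
    return None

def explain(task_state, slot):
    """A simple context-aware explanation of why a value was adopted."""
    return f"I am using {task_state[slot]!r} for {slot} because you told me so earlier."

# Example interaction: two details are known, one must be elicited.
state = {"target_part": "valve assembly", "tool": None, "deadline": "end of shift"}
print(next_question(state))        # -> "Which tool do you want me to use?"
state["tool"] = "torque wrench"    # the co-worker's answer, as parsed by the dialogue front end
print(next_question(state))        # -> None: ready to plan and allocate sub-tasks
print(explain(state, "tool"))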

The project will deliver a dialogue system that initially learns from sample, generic conversations and then learns to generate domain-specific conversations from a limited number of in-domain samples. The system also consolidates contextual information, such as the working environment and user preferences, integrates it with commonsense knowledge, and performs reasoning during conversations to produce relevant, concise responses together with the necessary explanations for certain types of inquiries.
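As a hedged sketch of what consolidating context with commonsense knowledge might look like, the toy code below merges environment facts, user preferences and commonsense statements, keeps only those that overlap with the words of the inquiry, and passes them to a placeholder generator. Here generate_response stands in for a dialogue model first trained on generic conversations and then adapted with limited domain-specific samples; all facts and names are hypothetical.

# Toy consolidation of contextual and commonsense knowledge before generation.

def consolidate_context(environment, preferences, commonsense, inquiry):
    """Merge the facts and keep only those that share a word with the inquiry
    (naive keyword overlap, purely for illustration)."""
    facts = [f"{k} is {v}" for k, v in {**environment, **preferences}.items()]
    words = inquiry.lower().replace("?", "").split()
    relevant = [f for f in facts + commonsense
                if any(w in f.lower() for w in words)]
    return " ; ".join(relevant)

def generate_response(inquiry, context):
    # Placeholder: a trained dialogue model would condition on both inputs here.
    return f"(concise answer to {inquiry!r}, grounded in: {context})"

environment = {"workstation": "assembly line 3", "lighting": "dim"}
preferences = {"preferred language": "English", "report detail": "concise"}
commonsense = ["dim lighting reduces visual inspection accuracy"]

inquiry = "Why did the visual inspection fail under dim lighting?"
context = consolidate_context(environment, preferences, commonsense, inquiry)
print(generate_response(inquiry, context))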

collabAI

Project Duration: 26 November 2018 – 25 November 2023

Funding Source: RIE2020 Advanced Manufacturing and Engineering Programmatic Grant A18A2b0046

Acknowledgement: This research work is supported by Programmatic Grant No. A18A2b0046 from the Singapore Government’s Research, Innovation and Enterprise 2020 plan (Advanced Manufacturing and Engineering domain). Project Title: Human Robot Collaborative AI for AME.

PUBLICATIONS

Conference Articles

  • Grandee Lee and Haizhou Li, “Modeling Code-Switch Languages Using Bilingual Parallel Corpus”, in Proc. Association for Computational Linguistics (ACL), Long Paper, July 2020.
  • Grandee Lee, Xianghu Yue, Haizhou Li, “Linguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3730-3734. [link]
  • Xianghu Yue, Grandee Lee, Emre Yılmaz, Fang Deng and Haizhou Li, “End-to-End Code-Switching ASR for Low-Resourced Language Pairs”, in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019, Sentosa Island, Singapore, December 2019. [link]
  • Berrak Sisman and Haizhou Li, “Generative Adversarial Networks for Singing Voice Conversion with and without Parallel Data”, in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020. [link]
  • Kun Zhou, Berrak Sisman and Haizhou Li, “Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data”, in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020. [link]
  • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao and Haizhou Li, “Tacotron-based TTS with Joint Time-Frequency Domain Loss”, in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020. [link]
  • Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das and Haizhou Li, “Personalized Singing Voice Generation Using WaveRNN”, in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020. [link]
  • Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “On the Importance of Audio-source Separation for Singer Identification in Polyphonic Music”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 2020-2024. [link]
  • Bidisha Sharma, Rohan Kumar Das and Haizhou Li, “Multi-level Adaptive Speech Activity Detector for Speech in Naturalistic Environments”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 2015-2019. [link]
  • Yi Zhou, Xiaohai Tian, Emre Yılmaz, Rohan Kumar Das and Haizhou Li, “A Modularized Neural Network with Language-Specific Output Layers for Cross-Lingual Voice Conversion”, in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) 2019, Sentosa Island, Singapore, December 2019. [link]
  • Emre Yılmaz, Samuel Cohen, Xianghu Yue, David van Leeuwen and Haizhou Li, “Multi-Graph Decoding for Code-Switching ASR”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3750-3754. [link]
  • Qinyi Wang, Emre Yılmaz, Adem Derinel and Haizhou Li, “Code-Switching Detection Using ASR-Generated Language Posteriors”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3740-3744. [link]