Human Language Technology

AI Speech Lab

AI Singapore (AISG) has set up an AI Speech Lab to develop a speech recognition system that can interpret and process the unique vocabulary used by Singaporeans – including Singlish and dialects – in daily conversations. This automatic speech transcription system could be deployed at various government agencies and companies to assist frontline officers in acquiring relevant and actionable information while they focus on interacting with customers or service users to address their queries and concerns.

Established as part of our 100 Experiments (100E) Programme, this new speech lab marks AISG’s first major collaboration with multiple government agencies to design an AI system that could be deployed government-wide and, in the future, nationwide. Discussions with companies to deploy the system are also underway. Located at the Innovation 4.0 building within the National University of Singapore’s Kent Ridge campus, the lab is operational from 1 July 2018.

“The AI Speech Lab came about as we had, over the last few months, received multiple 100E requests from agencies and companies for a colloquial Singaporean English (Singlish) speech-to-text engine. This is a challenge that is unique to Singapore and the region which is currently not addressed by existing speech engines offered commercially or by major cloud-based AI providers,” said Professor Ho Teck Hua, Executive Chairman of AI Singapore.

“The Government is keen to harness artificial intelligence to serve our citizens better. GovTech is collaborating with AI Singapore to develop solutions that can improve planning and service delivery. We are working with the AI Speech Lab on a speech-to-text engine for multi-language speech, for example, to transcribe 995 calls on-the-fly for faster response,” said Mr Tan Kok Yam, Deputy Secretary (Smart Nation and Digital Government).

The new research lab is led by Professor Li Haizhou, a world-renowned expert in Speech, Text and Natural Language Processing from the National University of Singapore, and Associate Professor Chng Eng Siong from the Nanyang Technological University.

They recently developed the world’s first code-switch (mixed-lingual) speech recognition engine using deep learning technology. This technological breakthrough represents a paradigm shift in speech recognition and understanding. The AI Speech Lab will adopt this advanced speech recognition technology to benefit Singapore.

The novel code-switch speech recognition engine can recognise speech that mixes English and Chinese words in the same sentence, as if they belonged to the same language. Furthermore, to adapt to the local context, dialect phrases such as ‘jiak ba bueh’ (“have you eaten” in Hokkien) and ‘hoh boh’ (“how are you” in Hokkien) are also included in the engine’s lexicon.
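One way to picture how multi-word dialect phrases can sit in a lexicon alongside English and Chinese entries is a greedy longest-match segmenter over the transcript. The sketch below is purely illustrative – the entries and the function are hypothetical examples, not the lab’s actual lexicon format or decoder.

```python
# Illustrative only: a toy lexicon in which multi-word Hokkien phrases
# are ordinary vocabulary entries next to English and Chinese words.
LEXICON = {"hello", "eat", "吃", "饭", "jiak ba bueh", "hoh boh"}

def segment(text, lexicon):
    """Greedy longest-match segmentation of a whitespace-split transcript.

    Multi-word dialect entries (e.g. 'jiak ba bueh') are kept as single
    units; tokens not in the lexicon pass through unchanged.
    """
    words = text.split()
    out, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in lexicon:
                out.append(candidate)
                i = j
                break
        else:
            out.append(words[i])  # out-of-vocabulary token: keep as-is
            i += 1
    return out
```

For example, `segment("hello jiak ba bueh", LEXICON)` keeps the Hokkien phrase together as one lexicon unit rather than three unrelated tokens.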

“An automatic speech recognition system that is able to recognise a mix of languages in one conversation is currently not commercially available. This is because training a computer system to recognise different languages is a very complex and challenging task. Our recent technological breakthrough is the outcome of several years of research efforts in Singapore. This technology performs better than commercial engines as it can accurately recognise conversations comprising words from different languages and solves a unique Singapore problem,” Prof Li explained.

AI Singapore is investing S$1.25 million to set up this new lab. Agencies and companies are expected to match this investment, bringing the total funding to S$2.5 million over the next three years. The lab, which occupies a floor area of 125 square metres, will be staffed by five AI engineers for a start.

Maiden collaboration with SCDF to manage emergency calls
The AI Speech Lab has secured its first collaborator – the Singapore Civil Defence Force (SCDF).

“The Singapore Civil Defence Force’s 995 Operations Centre receives close to 200,000 calls for assistance every year. When a call is received, our dispatchers need to ask some questions to determine the nature and severity of the case, to facilitate the deployment of appropriate emergency medical resources. In an emergency, every minute counts. The new speech recognition system, if successful, will help reduce the time needed to log the information. This will improve how SCDF’s emergency medical resources are dispatched and enhance the overall health outcomes of those in need,” remarked Assistant Commissioner Daniel Seet, SCDF Director of Operations.

“This collaboration between the AI Speech Lab and SCDF is an important first step. The knowledge and experience acquired through this project will enable AI Singapore to expand the deployment of this novel speech recognition engine to address other needs in the public service. Whilst we are first rolling out this system to the public service, we are confident that the solution will also benefit companies, as the system can be customised according to their business needs,” said Prof Ho.

Project Duration: 1 July 2018 – 30 June 2021

Funding Source: AI Singapore 100 Experiments Programme


Acknowledgement: This research is supported by the National Research Foundation Singapore under its AI Singapore Programme (Award Number: AISG-100E-2018-006).


Journal Articles

  • Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li, “SpEx: Multi-Scale Time Domain Speaker Extraction Network”, IEEE/ACM Transactions on Audio, Speech, and Language Processing (2020). [link]
  • Chitralekha Gupta, Haizhou Li and Ye Wang, “Automatic Leaderboard: Evaluation of Singing Quality Without a Standard Reference,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28(1), December 2020, pp. 13-26. [link]

Conference Articles

  • Xiaoxue Gao, Xiaohai Tian, Yi Zhou, Rohan Kumar Das and Haizhou Li, “Personalized Singing Voice Generation Using WaveRNN” in Proc. Speaker Odyssey 2020, Tokyo, Japan, November 2020. [link]
  • Xianghu Yue, Grandee Lee, Emre Yılmaz, Fang Deng and Haizhou Li, “End-to-End Code-Switching ASR for Low-Resourced Language Pairs”, in Proc. IEEE Automatic Speech Recognition Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019. [link]
  • Qinyi Wang, Emre Yılmaz, Adem Derinel and Haizhou Li, “Code-Switching Detection Using ASR-Generated Language Posteriors”, in Proc. INTERSPEECH, Graz, Austria, September 2019, pp. 3740-3744. [link]
  • Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li, “Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, United Kingdom, May 2019, pp. 6990-6994. [link]
  • Grandee Lee and Haizhou Li, “Word and Class Common Space Embedding for Code-switch Language Modeling”, in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Brighton, United Kingdom, May 2019, pp. 6086-6090. [link]
  • Chenglin Xu, Wei Rao, Eng Siong Chng and Haizhou Li, “Time-Domain Speaker Extraction Network”, in Proc. IEEE Automatic Speech Recognition Understanding (ASRU) Workshop 2019, Sentosa Island, Singapore, December 2019. [link]