Automatic speech recognition of Arabic multi-genre broadcast media

Maryam Najafian, Wei Ning Hsu, Ahmed Ali, James Glass

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

18 Citations (Scopus)

Abstract

This paper describes an Arabic Automatic Speech Recognition system developed on 15 hours of Multi-Genre Broadcast (MGB-3) data from YouTube, plus 1,200 hours of Multi-Dialect and Multi-Genre MGB-2 data recorded from the Aljazeera Arabic TV channel. We report our investigations of a range of signal pre-processing, data augmentation, topic-specific language model adaptation, and accent-specific re-training techniques, and of deep-learning-based acoustic modeling topologies, such as feed-forward Deep Neural Networks (DNNs), Time-delay Neural Networks (TDNNs), Long Short-term Memory (LSTM) networks, Bidirectional LSTMs (BLSTMs), and a Bidirectional version of the Prioritized Grid LSTM (BPGLSTM) model. We propose combining three purely sequence-trained recognition systems based on lattice-free maximum mutual information, with 4-gram language model re-scoring and system combination using the minimum Bayes risk decoding criterion. The best word error rate we obtained on the MGB-3 Arabic development set using a 4-gram re-scoring strategy is 42.25% for a chain BLSTM system, compared to a 65.44% baseline for a DNN system.
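
To make the reported figures concrete, the following is a minimal sketch of the standard word error rate (WER) definition, together with the relative reduction implied by the two numbers in the abstract. The abstract does not say which scoring tool or text normalization the authors used, so this is only an illustration; the function names `word_error_rate` and `relative_reduction` are hypothetical and not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Illustrative only: real scoring pipelines usually normalize text
    first, and the paper's exact scoring setup is not given here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error removed by the improved system."""
    return (baseline - improved) / baseline


# WER figures quoted in the abstract (in percent).
dnn_baseline_wer = 65.44
chain_blstm_wer = 42.25
print(f"{relative_reduction(dnn_baseline_wer, chain_blstm_wer):.1%}")
```

With the abstract's figures, the chain BLSTM system removes roughly 35% of the DNN baseline's word errors in relative terms (about 23 points absolute).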

Original language: English
Title of host publication: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 353-359
Number of pages: 7
ISBN (Electronic): 9781509047888
Publication status: Published - 2 Jul 2017
Event: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Okinawa, Japan
Duration: 16 Dec 2017 to 20 Dec 2017

Publication series

Name: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings
Volume: 2018-January

Conference

Conference: 2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017
Country/Territory: Japan
City: Okinawa
Period: 16/12/17 to 20/12/17

Keywords

  • Acoustic mis-match
  • RNNs
  • Speech recognition
  • multi-dialect
  • multi-genre
