Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

5 Citations (Scopus)

Abstract

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system poses many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system with the available transcribed CS speech, along with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR that augments the monolingual data with artificially generated CS text. We base our approach on random lexical replacements and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity, and 7.7% in ASR WER, on two ecologically valid CS test sets. The human evaluation of the text generated using EC suggests that more than 80% is of adequate quality.
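The random lexical replacement idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_lexical_replacement`, the `ratio` and `seed` parameters, and the one-to-one `(source_index, target_index)` alignment format are all assumptions made for illustration. The Equivalence Constraint variant, which additionally checks grammatical validity, is not covered here.

```python
import random


def random_lexical_replacement(src_tokens, tgt_tokens, alignment, ratio=0.3, seed=None):
    """Swap a random subset of aligned source tokens for their translations.

    Hypothetical helper, not taken from the paper.
    alignment: list of (src_index, tgt_index) pairs, e.g. from a word aligner.
    ratio: approximate fraction of source tokens to replace (assumed parameter).
    """
    rng = random.Random(seed)

    # Keep only the first aligned target index for each source position.
    src_to_tgt = {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, t)

    candidates = sorted(src_to_tgt)
    if not candidates:
        return list(src_tokens)

    # Replace at least one token, but never more than the aligned positions.
    k = min(len(candidates), max(1, round(ratio * len(src_tokens))))
    chosen = set(rng.sample(candidates, k))

    return [
        tgt_tokens[src_to_tgt[i]] if i in chosen else tok
        for i, tok in enumerate(src_tokens)
    ]


# Toy Arabic-English pair with a monotone word alignment (illustrative data).
src = ["أنا", "أحب", "القهوة"]
tgt = ["I", "love", "coffee"]
align = [(0, 0), (1, 1), (2, 2)]
mixed = random_lexical_replacement(src, tgt, align, ratio=0.3, seed=0)
print(" ".join(mixed))
```

Because positions are chosen uniformly at random, the output can be ungrammatical, which is the gap a constraint such as EC is meant to close.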

Original language: English
Title of host publication: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 777-784
Number of pages: 8
ISBN (Electronic): 9798350396904
Publication status: Published - 2023
Event: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Doha, Qatar
Duration: 9 Jan 2023 to 12 Jan 2023

Publication series

Name: 2022 IEEE Spoken Language Technology Workshop, SLT 2022 - Proceedings

Conference

Conference: 2022 IEEE Spoken Language Technology Workshop, SLT 2022
Country/Territory: Qatar
City: Doha
Period: 9/01/23 to 12/01/23

Keywords

  • Code-switching
  • data augmentation
  • multilingual
  • speech recognition
