TY - GEN
T1 - Effects of dialectal code-switching on speech modules
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
AU - Chowdhury, Shammur A.
AU - Samih, Younes
AU - Eldesouki, Mohamed
AU - Ali, Ahmed
N1 - Publisher Copyright: © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Although spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token level, taking into account both linguistic and acoustic cues for dialectal Arabic. For detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules on the DCS corpus. Our results highlight the importance of lexical information for discriminating the DCS labels. We observe that the performance of different models is highly dependent on the degree of code-mixing at the token level as well as its complexity at the utterance level.
AB - Intra-utterance code-switching (CS) is defined as the alternation between two or more languages within the same utterance. Although spoken dialectal code-switching (DCS) is more challenging than CS, it remains largely unexplored. In this study, we describe a method to build the first spoken DCS corpus. The corpus is annotated at the token level, taking into account both linguistic and acoustic cues for dialectal Arabic. For detailed analysis, we study Arabic automatic speech recognition (ASR), Arabic dialect identification (ADI), and natural language processing (NLP) modules on the DCS corpus. Our results highlight the importance of lexical information for discriminating the DCS labels. We observe that the performance of different models is highly dependent on the degree of code-mixing at the token level as well as its complexity at the utterance level.
KW - Code mixing index
KW - Code-switching
KW - Corpus
KW - Dialect identification
UR - http://www.scopus.com/inward/record.url?scp=85098103187&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2020-2271
DO - 10.21437/Interspeech.2020-2271
M3 - Conference contribution
AN - SCOPUS:85098103187
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 2382
EP - 2386
BT - Interspeech 2020
PB - International Speech Communication Association
Y2 - 25 October 2020 through 29 October 2020
ER -