TY - GEN
T1 - Towards one model to rule all
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
AU - Chowdhury, Shammur Absar
AU - Hussein, Amir
AU - Abdelali, Ahmed
AU - Ali, Ahmed
N1 - Publisher Copyright:
Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
N2 - With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR) that handles language and dialectal variation in spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR system using a self-attention-based Conformer architecture. We trained the system on Arabic (Ar), English (En), and French (Fr). We evaluate the system's performance on: (i) monolingual (Ar, En, and Fr); (ii) multi-dialectal (Modern Standard Arabic, along with dialectal variants such as Egyptian and Moroccan); and (iii) code-switching: cross-lingual (Ar-En/Fr) and dialectal (MSA-Egyptian) test cases, and compare it with current state-of-the-art systems. Furthermore, we investigate the influence of different embedding/character representations, including character vs. word-piece units and shared vs. distinct input symbols per language. Our findings demonstrate the strength of such a model, which outperforms state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
AB - With the advent of globalization, there is an increasing demand for multilingual automatic speech recognition (ASR) that handles language and dialectal variation in spoken content. Recent studies show its efficacy over monolingual systems. In this study, we design a large multilingual end-to-end ASR system using a self-attention-based Conformer architecture. We trained the system on Arabic (Ar), English (En), and French (Fr). We evaluate the system's performance on: (i) monolingual (Ar, En, and Fr); (ii) multi-dialectal (Modern Standard Arabic, along with dialectal variants such as Egyptian and Moroccan); and (iii) code-switching: cross-lingual (Ar-En/Fr) and dialectal (MSA-Egyptian) test cases, and compare it with current state-of-the-art systems. Furthermore, we investigate the influence of different embedding/character representations, including character vs. word-piece units and shared vs. distinct input symbols per language. Our findings demonstrate the strength of such a model, which outperforms state-of-the-art monolingual dialectal Arabic and code-switching Arabic ASR.
KW - Code-switching
KW - Conformer
KW - E2E
KW - Multi-dialectal
KW - Multilingual
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85119280745&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2021-1809
DO - 10.21437/Interspeech.2021-1809
M3 - Conference contribution
AN - SCOPUS:85119280745
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 391
EP - 395
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
PB - International Speech Communication Association
Y2 - 30 August 2021 through 3 September 2021
ER -