TY - GEN
T1 - Bangla grapheme to phoneme conversion using conditional random fields
AU - Chowdhury, Shammur Absar
AU - Alam, Firoj
AU - Khan, Naira
AU - Noori, Sheak R.H.
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/7/2
Y1 - 2017/7/2
N2 - Integrated with handheld devices, toys, kiosks, and call centers, Text to Speech (TTS) and Speech Recognition (SR) have become widely used applications in everyday life. One of the core components of these applications is Grapheme to Phoneme (G2P) conversion. The task is to map the written form to the spoken form, i.e. to map one sequence to another. In Natural Language Processing (NLP), it is typically referred to as a sequence-to-sequence labeling task. The task, however, is language dependent and has primarily been implemented for English and similar resource-rich languages. In comparison, very little has been done for digitally under-resourced languages such as Bangla (ethnonym: Bangla; exonym: Bengali). The current state-of-the-art Bangla Grapheme to Phoneme conversion is limited to rule-based and lexicon-based approaches, whose development requires significant contributions from linguistic experts. In this paper, we propose a data-driven machine learning approach for Bangla G2P conversion. We evaluate the existing rule-based approaches and design a machine learning model using Conditional Random Fields (CRFs). To train the machine learning models, we have used only character-level contextual features, because extracting hand-crafted features requires specialized knowledge. We have evaluated the systems using two publicly available datasets. We have obtained promising results, with phoneme error rates of 1.51% and 14.88% for the CRBLP and Google pronunciation lexicons, respectively.
KW - Bangla
KW - Conditional Random Fields
KW - Grapheme to Phoneme (G2P)
KW - Pronunciation Generation
UR - http://www.scopus.com/inward/record.url?scp=85050392770&partnerID=8YFLogxK
U2 - 10.1109/ICCITECHN.2017.8281780
DO - 10.1109/ICCITECHN.2017.8281780
M3 - Conference contribution
AN - SCOPUS:85050392770
T3 - 20th International Conference of Computer and Information Technology, ICCIT 2017
SP - 1
EP - 6
BT - 20th International Conference of Computer and Information Technology, ICCIT 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 20th International Conference of Computer and Information Technology, ICCIT 2017
Y2 - 22 December 2017 through 24 December 2017
ER -