Borderless Azerbaijani processing: Linguistic resources and a transformer-based approach for Azerbaijani transliteration

Reihaneh Zohrabi, Mostafa Masumi, Omid Ghahroodi, Parham AbedAzad, Hamid Beigy, Mohammad H. Rohban, Ehsaneddin Asgari

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Recent advancements in neural language models have revolutionized natural language understanding. However, many languages still risk being left behind without the benefits of such advancements, potentially leading to their extinction. One such language is Azerbaijani in Iran, which suffers from limited digital resources and a lack of alignment between spoken and written forms. In contrast, Azerbaijani in the Republic of Azerbaijan has seen more resources and is not considered as low-resource as its Iranian counterpart. In this context, our research focuses on computational progress for the Iranian Azerbaijani language. We propose a transliteration model that leverages an Azerbaijani parallel dataset, effectively bridging the gap between the Latin and Persian scripts. By enabling seamless communication between these two scripts, our model facilitates cultural exchange and serves as a valuable tool for transfer learning. Our approach outperforms traditional rule-based methods, as evidenced by significant improvements in performance metrics: we observe an increase of at least 15% in BLEU scores and a reduction of at least one third in edit distance. Furthermore, our model’s online demo is accessible at https://azeri.parsi.ai/.
Original language: English
Title of host publication: Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics
Publication status: Published - 2023
Externally published: Yes
