Comparing Two Techniques for Learning Transliteration Models Using a Parallel Corpus

Hassan Sajjad, Nadir Durrani, Helmut Schmid, Alexander Fraser

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

7 Citations (Scopus)

Abstract

We compare an unsupervised transliteration mining method and a rule-based method for automatically extracting lists of transliteration word pairs from a Hindi/Urdu parallel corpus. On the automatically aligned orthographic transliteration units of the extracted lists, we build joint source-channel models, yielding two transliteration systems. We compare our systems with three transliteration systems available on the web and show that ours perform better. An extensive analysis of the results of both methods provides evidence that the unsupervised transliteration mining method is superior for applications requiring high-recall transliteration lists, while the rule-based method is useful for obtaining high-precision lists.
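The joint source-channel approach mentioned in the abstract models a transliteration pair as a single sequence of aligned (source unit, target unit) "transliteration units" and assigns it a joint n-gram probability. A minimal sketch, using hypothetical romanized toy data (not the paper's Hindi/Urdu units) and a simple add-one-smoothed bigram model:

```python
from collections import defaultdict
from math import log

# Toy aligned transliteration pairs (hypothetical, for illustration only):
# each word is a sequence of aligned (source unit, target unit) pairs.
ALIGNED_CORPUS = [
    [("k", "k"), ("aa", "a"), ("m", "m")],
    [("k", "k"), ("a", "a"), ("l", "l")],
    [("m", "m"), ("aa", "a"), ("l", "l")],
]

BOS = ("<s>", "<s>")  # start-of-word marker

def train_joint_bigram(corpus):
    """Collect bigram/unigram counts over aligned unit pairs."""
    bigram = defaultdict(int)
    unigram = defaultdict(int)
    vocab = set()
    for word in corpus:
        prev = BOS
        for pair in word:
            bigram[(prev, pair)] += 1
            unigram[prev] += 1
            vocab.add(pair)
            prev = pair
    return bigram, unigram, vocab

def score(word, bigram, unigram, vocab):
    """Add-one smoothed joint log-probability of an aligned word pair."""
    logp = 0.0
    prev = BOS
    v = len(vocab) + 1  # smoothing denominator term
    for pair in word:
        logp += log((bigram[(prev, pair)] + 1) / (unigram[prev] + v))
        prev = pair
    return logp

bigram, unigram, vocab = train_joint_bigram(ALIGNED_CORPUS)
seen = score([("k", "k"), ("aa", "a"), ("l", "l")], bigram, unigram, vocab)
unseen = score([("l", "l"), ("k", "k"), ("m", "m")], bigram, unigram, vocab)
print(seen > unseen)  # → True: frequent pair sequences score higher
```

In the paper's systems the unit alignment itself is also learned automatically; this sketch assumes the alignment is given and only illustrates the joint scoring idea.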

Original language: English
Title of host publication: IJCNLP 2011 - Proceedings of the 5th International Joint Conference on Natural Language Processing
Editors: Haifeng Wang, David Yarowsky
Publisher: Association for Computational Linguistics (ACL)
Pages: 129-137
Number of pages: 9
ISBN (Electronic): 9789744665645
Publication status: Published - 2011
Externally published: Yes
Event: 5th International Joint Conference on Natural Language Processing, IJCNLP 2011 - Chiang Mai, Thailand
Duration: 8 Nov 2011 - 13 Nov 2011

Publication series

Name: IJCNLP 2011 - Proceedings of the 5th International Joint Conference on Natural Language Processing

Conference

Conference: 5th International Joint Conference on Natural Language Processing, IJCNLP 2011
Country/Territory: Thailand
City: Chiang Mai
Period: 8/11/11 - 13/11/11
