TY - GEN
T1 - Improving Egyptian-to-English SMT by mapping Egyptian into MSA
AU - Durrani, Nadir
AU - Al-Onaizan, Yaser
AU - Ittycheriah, Abraham
PY - 2014
Y1 - 2014
N2 - One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for MSA-English is abundantly available, it exists only sparsely for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English in comparison with translating from MSA to English. One of the reasons for this drop is the high OOV rate, whereas another is the dialectal differences between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for the unknown Egyptian words through different methods, such as spelling correction and suggesting synonyms based on context. Second, we apply a convolution model using English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction shows an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
AB - One of the aims of the DARPA BOLT project is to translate Egyptian blog data into English. While parallel data for MSA-English is abundantly available, it exists only sparsely for Egyptian-English and Egyptian-MSA. A notable drop in translation quality is observed when translating Egyptian to English in comparison with translating from MSA to English. One of the reasons for this drop is the high OOV rate, whereas another is the dialectal differences between training and test data. This work focuses on improving Egyptian-to-English translation by bridging the gap between Egyptian and MSA. First, we try to reduce the OOV rate by proposing MSA candidates for the unknown Egyptian words through different methods, such as spelling correction and suggesting synonyms based on context. Second, we apply a convolution model using English as a pivot to map Egyptian words into MSA. We then evaluate our edits by running a decoder built on MSA-to-English data. Our spelling-based correction shows an improvement of 1.7 BLEU points over the baseline system, which translates unedited Egyptian into English.
UR - http://www.scopus.com/inward/record.url?scp=84958552279&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-54903-8_23
DO - 10.1007/978-3-642-54903-8_23
M3 - Conference contribution
AN - SCOPUS:84958552279
SN - 9783642549021
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 271
EP - 282
BT - Computational Linguistics and Intelligent Text Processing - 15th International Conference, CICLing 2014, Proceedings
PB - Springer Verlag
T2 - 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014
Y2 - 6 April 2014 through 12 April 2014
ER -