Model with Minimal Translation Units, but Decode with Phrases

Nadir Durrani*, Alexander Fraser, Helmut Schmid

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

N-gram-based models co-exist with their phrase-based counterparts as an alternative SMT framework. Both techniques have pros and cons. The N-gram-based framework provides a better model: it captures both source and target contexts and avoids spurious phrasal segmentation. Phrase-based systems, in turn, can memorize and produce larger translation units, which gives them an edge during decoding in the form of better search performance and superior selection of translation units. In this paper we combine N-gram-based modeling with phrase-based decoding, and obtain the benefits of both approaches. Our experiments show that this combination improves not only the search accuracy of the N-gram model but also the BLEU scores. Our system outperforms state-of-the-art phrase-based systems (Moses and Phrasal) and N-gram-based systems by a significant margin on German-to-English, French-to-English and Spanish-to-English translation tasks.

Original language: English
Title of host publication: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies, NAACL-HLT 2013
Editors: David Elson, Anna Kazantseva, Stan Szpakowicz
Publisher: Association for Computational Linguistics (ACL)
Pages: 1-11
Number of pages: 11
ISBN (Electronic): 9781937284473
Publication status: Published - 2013
Externally published: Yes
Event: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013 - Atlanta, United States
Duration: 14 Jun 2013 → …

Publication series

Name: Proceedings of the 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013

Conference

Conference: 2nd Workshop on Computational Linguistics for Literature, CLfL 2013 at the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2013
Country/Territory: United States
City: Atlanta
Period: 14/06/13 → …
