Arabic POS Tagging: Don’t Abandon Feature Engineering Just Yet

Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

31 Citations (Scopus)

Abstract

This paper compares Support Vector Machine based ranking (SVMRank) with Bidirectional Long Short-Term Memory (bi-LSTM) neural-network sequence labeling for building a state-of-the-art Arabic part-of-speech (POS) tagging system. SVMRank yields state-of-the-art results, but requires a fair amount of feature engineering. A bi-LSTM, particularly when combined with word embeddings, may achieve competitive POS-tagging results by automatically deducing latent linguistic features. However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVMRank-based tagger yields further improvements. We also show that the gains realized using embeddings may not be additive with the gains achieved from features. We are open-sourcing both the SVMRank-based and the bi-LSTM-based systems for the research community.
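The core idea of augmenting a bi-LSTM with hand-engineered features can be illustrated by how each token's input vector is built: the word embedding is concatenated with a vector of handcrafted features before being fed to the sequence labeler. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The prefix and suffix lists, the length feature, the function names `handcrafted_features` and `augmented_inputs`, and the zero-vector fallback for unknown tokens are all hypothetical, and the bi-LSTM itself is omitted.

```python
from typing import Dict, List, Sequence

# Hypothetical feature templates for illustration only; these are NOT
# the feature set used by the SVMRank-based tagger in the paper.
PREFIXES = ("ال", "و", "ف", "ب", "ل")
SUFFIXES = ("ة", "ات", "ون", "ين", "ها")


def handcrafted_features(token: str) -> List[float]:
    """Return indicator-style features for a single Arabic token."""
    feats = [1.0 if token.startswith(p) else 0.0 for p in PREFIXES]
    feats += [1.0 if token.endswith(s) else 0.0 for s in SUFFIXES]
    feats.append(1.0 if token.isdigit() else 0.0)  # numeric token
    feats.append(min(len(token), 10) / 10.0)  # length, capped at 10 and scaled to [0, 1]
    return feats


def augmented_inputs(
    tokens: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[List[float]]:
    """Concatenate each token's embedding with its handcrafted features.

    Unknown tokens get a zero embedding of size `dim` (an assumption).
    Each returned row would be one time step of input to a bi-LSTM.
    """
    rows = []
    for tok in tokens:
        emb = embeddings.get(tok, [0.0] * dim)
        if len(emb) != dim:
            raise ValueError(f"embedding for {tok!r} has size {len(emb)}, expected {dim}")
        rows.append(list(emb) + handcrafted_features(tok))
    return rows


if __name__ == "__main__":
    toy_embeddings = {"الكتاب": [0.1, 0.2], "جديد": [0.3, -0.1]}  # toy values
    for row in augmented_inputs(["الكتاب", "جديد"], toy_embeddings, dim=2):
        print(len(row), row)
```

With a 2-dimensional toy embedding, each row has 2 + 12 = 14 values. Dropping the embedding part or the feature part gives the embeddings-only and features-only inputs, which is the kind of ablation the abstract describes when it notes that the two sources of gain may not be additive.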

Original language: English
Title of host publication: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 130-137
Number of pages: 8
ISBN (Electronic): 9781945626449
Publication status: Published - 2017
Event: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017 - Valencia, Spain
Duration: 3 Apr 2017 → …

Publication series

Name: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Conference

Conference: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017
Country/Territory: Spain
City: Valencia
Period: 3/04/17 → …
