Using stem-templates to improve Arabic POS and gender/number tagging

Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

30 Citations (Scopus)

Abstract

This paper presents an end-to-end automatic processing system for Arabic. The system performs correction of common spelling errors pertaining to different forms of alef, ta marbouta and ha, and alef maqsoura and ya; context-sensitive word segmentation into underlying clitics; POS tagging; and gender and number tagging of nouns and adjectives. We introduce the use of stem templates as a feature to improve POS tagging by 0.5% and to help ascertain the gender and number of nouns and adjectives. For gender and number tagging, we report accuracies on previously unseen words that are significantly higher than those of a state-of-the-art system.
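
The spelling errors listed in the abstract involve letters that are often written interchangeably. As a rough illustration only, and not the paper's correction method, the Python sketch below collapses each confusable group to a single form. Two distinct words can then become identical strings, which suggests why restoring the intended letter depends on context. The names `CONFUSABLE`, `normalize`, and `same_up_to_confusables` are hypothetical and do not come from the paper.

```python
# Hypothetical illustration: none of these names come from the paper.
# Each key is a letter form the abstract lists as commonly confused;
# each value is the single form that this sketch collapses it to.
CONFUSABLE = {
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0629": "\u0647",  # ta marbouta           -> ha
    "\u0649": "\u064A",  # alef maqsoura         -> ya
}


def normalize(word: str) -> str:
    """Collapse every confusable letter in `word` to its group's single form."""
    return "".join(CONFUSABLE.get(ch, ch) for ch in word)


def same_up_to_confusables(a: str, b: str) -> bool:
    """Return True if two words differ only in the confusable letter forms."""
    return normalize(a) == normalize(b)


if __name__ == "__main__":
    ala = "\u0639\u0644\u0649"  # ends in alef maqsoura
    ali = "\u0639\u0644\u064A"  # ends in ya
    # The two spellings are distinct strings but collapse to the same form.
    print(ala != ali, same_up_to_confusables(ala, ali))
```

A correction system has to go in the opposite direction, choosing the right member of each group for a given word, so this mapping only marks out where the ambiguity lies.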

Original language: English
Title of host publication: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014
Editors: Nicoletta Calzolari, Khalid Choukri, Sara Goggi, Thierry Declerck, Joseph Mariani, Bente Maegaard, Asuncion Moreno, Jan Odijk, Helene Mazo, Stelios Piperidis, Hrafn Loftsson
Publisher: European Language Resources Association (ELRA)
Pages: 2926-2931
Number of pages: 6
ISBN (Electronic): 9782951740884
Publication status: Published - 2014
Event: 9th International Conference on Language Resources and Evaluation, LREC 2014 - Reykjavik, Iceland
Duration: 26 May 2014 to 31 May 2014

Publication series

Name: Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014

Conference

Conference: 9th International Conference on Language Resources and Evaluation, LREC 2014
Country/Territory: Iceland
City: Reykjavik
Period: 26/05/14 to 31/05/14

Keywords

  • Arabic
  • Denormalization
  • Part of Speech Tagging
