A Neural Architecture for Dialectal Arabic Segmentation

Younes Samih, Mohammed Attia, Mohamed Eldesouki, Hamdy Mubarak, Ahmed Abdelali, Laura Kallmeyer, Kareem Darwish

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

25 Citations (Scopus)

Abstract

The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmentation of words into their constituent tokens is an important processing step for natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks without any normalization or reliance on lexical features or linguistic resources. We deal with segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that heavily depend on additional resources.
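
The character-level sequence labeling formulation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the B/M/E/S label set, the "+" segment separator, the function names, and the transliterated example word are all assumptions made for illustration. The sketch only shows how a segmented word maps to one label per character and back; the neural tagger that would predict those labels is not shown.

```python
from typing import List, Tuple

# Assumed label set (hypothetical, not taken from the paper):
#   B = first character of a segment, M = middle, E = last,
#   S = a segment consisting of a single character.


def segments_to_labels(segmented: str, sep: str = "+") -> List[Tuple[str, str]]:
    """Turn a segmented word such as 'w+ktb+hA' into (character, label) pairs."""
    pairs: List[Tuple[str, str]] = []
    for segment in segmented.split(sep):
        if not segment:
            continue  # skip empty pieces from doubled or trailing separators
        if len(segment) == 1:
            pairs.append((segment, "S"))
            continue
        last = len(segment) - 1
        for i, ch in enumerate(segment):
            if i == 0:
                label = "B"
            elif i == last:
                label = "E"
            else:
                label = "M"
            pairs.append((ch, label))
    return pairs


def labels_to_segments(pairs: List[Tuple[str, str]], sep: str = "+") -> str:
    """Rebuild the segmented word from per-character labels."""
    out: List[str] = []
    for i, (ch, label) in enumerate(pairs):
        # A new segment starts at every B or S label except the very first one.
        if i > 0 and label in ("B", "S"):
            out.append(sep)
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    example = "w+ktb+hA"  # illustrative transliterated word, not from the paper's data
    labelled = segments_to_labels(example)
    print(labelled)
    print(labels_to_segments(labelled))
```

Under this framing, a tagger sees only the unsegmented character sequence and predicts one label per character, and the segmentation is then read off the predicted labels as in `labels_to_segments`.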

Original language: English
Title of host publication: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Publisher: Association for Computational Linguistics (ACL)
Pages: 46-54
Number of pages: 9
ISBN (Electronic): 9781945626449
Publication status: Published - 2017
Event: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017 - Valencia, Spain
Duration: 3 Apr 2017 → …

Publication series

Name: WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Conference

Conference: 3rd Arabic Natural Language Processing Workshop, WANLP 2017 held at EACL 2017
Country/Territory: Spain
City: Valencia
Period: 3/04/17 → …
