Learning from relatives: Unified dialectal Arabic segmentation

Younes Samih, Mohamed Eldesouki, Mohammed Attia, Kareem Darwish, Ahmed Abdelali, Hamdy Mubarak, Laura Kallmeyer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

19 Citations (Scopus)

Abstract

Arabic dialects share more than a common koiné: pan-dialectal linguistic phenomena allow computational models for different dialects to learn from each other. In this paper we build a unified segmentation model by combining the training data for different dialects and training a single model. The model yields higher accuracies than dialect-specific models, eliminating the need for dialect identification before segmentation. We also measure the degree of relatedness between four major Arabic dialects by testing how a segmentation model trained on one dialect performs on the other dialects. We find that linguistic relatedness is contingent on geographical proximity. In our experiments we use SVM-based ranking and bi-LSTM-CRF sequence labeling.

Original language: English
Title of host publication: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings
Publisher: Association for Computational Linguistics (ACL)
Pages: 432-441
Number of pages: 10
ISBN (Electronic): 9781945626548
Publication status: Published - 2017
Event: 21st Conference on Computational Natural Language Learning, CoNLL 2017 - Vancouver, Canada
Duration: 3 Aug 2017 to 4 Aug 2017

Publication series

Name: CoNLL 2017 - 21st Conference on Computational Natural Language Learning, Proceedings

Conference

Conference: 21st Conference on Computational Natural Language Learning, CoNLL 2017
Country/Territory: Canada
City: Vancouver
Period: 3/08/17 to 4/08/17
