UL2C: Mapping User Locations to Countries on Arabic Twitter

Hamdy Mubarak, Sabit Hassan

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

4 Citations (Scopus)

Abstract

Mapping user locations to countries can be useful for many applications such as dialect identification, author profiling, recommendation systems, etc. Twitter allows users to declare their locations as free text, and these user-declared locations are often noisy and hard to decipher automatically. In this paper, we present the largest manually labeled dataset for mapping user locations on Arabic Twitter to their corresponding countries. We build effective machine learning models that can automate this mapping with significantly better efficiency compared to libraries such as geopy. We also show that our dataset is more effective than data extracted from the GeoNames geographical database in this task, as the latter covers only locations written in formal ways.

Original language: English
Title of host publication: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Editors: Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Publisher: Association for Computational Linguistics (ACL)
Pages: 145-153
Number of pages: 9
ISBN (Electronic): 9781954085091
Publication status: Published - 2021
Event: 6th Arabic Natural Language Processing Workshop, WANLP 2021 - Virtual, Kyiv, Ukraine
Duration: 19 Apr 2021 → …

Publication series

Name: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Conference

Conference: 6th Arabic Natural Language Processing Workshop, WANLP 2021
Country/Territory: Ukraine
City: Virtual, Kyiv
Period: 19/04/21 → …
