IDRISI-D: Arabic and English Datasets and Benchmarks for Location Mention Disambiguation over Disaster Microblogs

Reem Suwaileh, Tamer Elsayed, Muhammad Imran

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

3 Citations (Scopus)

Abstract

Extracting and disambiguating geolocation information from social media data enables effective disaster management, as it helps response authorities, for example, to locate incidents when planning rescue activities and to locate affected people for evacuation. Nevertheless, the dearth of resources and tools hinders the development and evaluation of Location Mention Disambiguation (LMD) models in the disaster management domain. Consequently, the LMD task is greatly understudied, especially for low-resource languages such as Arabic. To fill this gap, we introduce IDRISI-D, the largest public English LMD dataset to date and the first public Arabic LMD dataset. Additionally, we introduce a modified hierarchical evaluation framework that offers a lenient and nuanced evaluation of LMD systems. We further benchmark the IDRISI-D datasets using representative baselines and show the competitiveness of BERT-based models.

Original language: English
Title of host publication: ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings
Editors: Hassan Sawaf, Samhaa El-Beltagy, Wajdi Zaghouani, Walid Magdy, Nadi Tomeh, Ibrahim Abu Farha, Nizar Habash, Salam Khalifa, Amr Keleg, Hatem Haddad, Imed Zitouni, Ahmed Abdelali, Khalil Mrini, Rawan Almatham
Publisher: Association for Computational Linguistics (ACL)
Pages: 158-169
Number of pages: 12
ISBN (Electronic): 9781959429272
Publication status: Published - 2023
Event: 1st Arabic Natural Language Processing Conference, ArabicNLP 2023 - Hybrid, Singapore, Singapore
Duration: 7 Dec 2023 → …

Publication series

Name: ArabicNLP 2023 - 1st Arabic Natural Language Processing Conference, Proceedings

Conference

Conference: 1st Arabic Natural Language Processing Conference, ArabicNLP 2023
Country/Territory: Singapore
City: Hybrid, Singapore
Period: 7/12/23 → …
