ARADICE: Benchmarks for Dialectal and Cultural Capabilities in LLMs

Basel Mousi, Nadir Durrani, Fatema Ahmad, Md Arid Hasan, Maram Hasanain, Tameem Kabbani*, Fahim Dalvi, Shammur Absar Chowdhury, Firoj Alam

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Arabic, with its rich diversity of dialects, remains significantly underrepresented in Large Language Models, particularly in dialectal variations. We address this gap by introducing seven synthetic datasets in dialects alongside Modern Standard Arabic (MSA), created using Machine Translation (MT) combined with human post-editing. We present AraDiCE, a benchmark for Arabic Dialect and Cultural Evaluation. We evaluate LLMs on dialect comprehension and generation, focusing specifically on low-resource Arabic dialects. Additionally, we introduce the first-ever fine-grained benchmark designed to evaluate cultural awareness across the Gulf, Egypt, and Levant regions, providing a novel dimension to LLM evaluation. Our findings demonstrate that while Arabic-specific models like Jais and AceGPT outperform multilingual models on dialectal tasks, significant challenges persist in dialect identification, generation, and translation. This work contributes ≈45K post-edited samples, a cultural benchmark, and highlights the importance of tailored training to improve LLM performance in capturing the nuances of diverse Arabic dialects and cultural contexts. We have released the dialectal translation models and benchmarks developed in this study.

Original languageEnglish
Title of host publicationMain Conference
EditorsOwen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert
PublisherAssociation for Computational Linguistics (ACL)
Pages4186-4218
Number of pages33
ISBN (Electronic)9798891761964
Publication statusPublished - 2025
Event31st International Conference on Computational Linguistics, COLING 2025 - Abu Dhabi, United Arab Emirates
Duration: 19 Jan 202524 Jan 2025

Publication series

NameProceedings - International Conference on Computational Linguistics, COLING
VolumePart F206484-1
ISSN (Print)2951-2093

Conference

Conference31st International Conference on Computational Linguistics, COLING 2025
Country/TerritoryUnited Arab Emirates
CityAbu Dhabi
Period19/01/2524/01/25

Fingerprint

Dive into the research topics of 'ARADICE: Benchmarks for Dialectal and Cultural Capabilities in LLMs'. Together they form a unique fingerprint.

Cite this