Arabic Offensive Language on Twitter: Analysis and Experiments

Hamdy Mubarak, Ammar Rashed, Kareem Darwish, Younes Samih, Ahmed Abdelali

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

31 Citations (Scopus)

Abstract

Detecting offensive language on Twitter has many applications ranging from detecting/predicting bullying to measuring polarization. In this paper, we focus on building a large Arabic offensive tweet dataset. We introduce a method for building a dataset that is not biased by topic, dialect, or target. We produce the largest Arabic dataset to date with special tags for vulgarity and hate speech. We thoroughly analyze the dataset to determine which topics, dialects, and gender are most associated with offensive tweets and how Arabic speakers use offensive language. Lastly, we conduct many experiments to produce strong results (F1 = 83.2) on the dataset using SOTA techniques.

Original language: English
Title of host publication: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop
Editors: Nizar Habash, Houda Bouamor, Hazem Hajj, Walid Magdy, Wajdi Zaghouani, Fethi Bougares, Nadi Tomeh, Ibrahim Abu Farha, Samia Touileb
Publisher: Association for Computational Linguistics (ACL)
Pages: 126-135
Number of pages: 10
ISBN (Electronic): 9781954085091
Publication status: Published - 2021
Event: 6th Arabic Natural Language Processing Workshop, WANLP 2021 - Virtual, Kyiv, Ukraine
Duration: 19 Apr 2021 → …

Publication series

Name: WANLP 2021 - 6th Arabic Natural Language Processing Workshop, Proceedings of the Workshop

Conference

Conference: 6th Arabic Natural Language Processing Workshop, WANLP 2021
Country/Territory: Ukraine
City: Virtual, Kyiv
Period: 19/04/21 → …
