Abusive language detection on Arabic social media

Hamdy Mubarak, Kareem Darwish, Walid Magdy

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

243 Citations (Scopus)

Abstract

In this paper, we present our work on detecting abusive language on Arabic social media. We extract a list of obscene words and hashtags using common patterns found in offensive and rude communications. We also classify Twitter users according to whether or not they use any of these words in their tweets. We expand the list of obscene words using this classification, and we report results on a newly created dataset of classified Arabic tweets (obscene, offensive, and clean). We make this dataset freely available for research, in addition to the list of obscene words and hashtags. We are also publicly releasing a large corpus of classified user comments that were deleted from a popular Arabic news site due to violations of the site's rules and guidelines.
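The list-expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how candidate words are scored, so the smoothed log ratio of relative frequencies used here is an assumption, as are the function names `split_users` and `expand_seed_list`, the whitespace tokenization, and the placeholder tokens. It is not presented as the authors' implementation.

```python
import math
from collections import Counter


def split_users(user_tweets, seed_words):
    """Partition users by whether any of their tweets contains a seed word."""
    seed_words = set(seed_words)
    flagged, clean = {}, {}
    for user, tweets in user_tweets.items():
        # Naive whitespace tokenization (assumption; real Arabic text
        # would need proper normalization and tokenization).
        tokens = [tok for tweet in tweets for tok in tweet.split()]
        target = flagged if seed_words.intersection(tokens) else clean
        target[user] = tokens
    return flagged, clean


def expand_seed_list(user_tweets, seed_words, top_k=5, smoothing=1.0):
    """Rank non-seed words by how much more often flagged users use them.

    The score is an add-one-smoothed log ratio of relative frequencies,
    chosen for illustration only (hypothetical scoring function).
    """
    seed_words = set(seed_words)
    flagged, clean = split_users(user_tweets, seed_words)
    flagged_counts = Counter(t for toks in flagged.values() for t in toks)
    clean_counts = Counter(t for toks in clean.values() for t in toks)
    flagged_total = sum(flagged_counts.values())
    clean_total = sum(clean_counts.values())
    vocab = set(flagged_counts) | set(clean_counts)

    scores = {}
    for word in vocab - seed_words:
        p_flagged = (flagged_counts[word] + smoothing) / (
            flagged_total + smoothing * len(vocab)
        )
        p_clean = (clean_counts[word] + smoothing) / (
            clean_total + smoothing * len(vocab)
        )
        scores[word] = math.log(p_flagged / p_clean)

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]
```

Calling `expand_seed_list` with a mapping from user IDs to lists of tweet strings and a set of seed words returns the highest-scoring candidate words, which would still need manual review before being added to the list.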

Original language: English
Title of host publication: 1st Workshop on Abusive Language Online, ALW 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Proceedings of the Workshop
Editors: Zeerak Waseem, Wendy Hui Kyong Chung, Dirk Hovy, Joel Tetreault
Publisher: Association for Computational Linguistics (ACL)
Pages: 52-56
Number of pages: 5
ISBN (Electronic): 9781945626661
Publication status: Published - 2017
Event: 1st Workshop on Abusive Language Online, ALW 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Proceedings of the Workshop - Vancouver, Canada
Duration: 4 Aug 2017 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 1st Workshop on Abusive Language Online, ALW 2017 at the 55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Proceedings of the Workshop
Country/Territory: Canada
City: Vancouver
Period: 4/08/17 → …
