Arabic Offensive Language Classification on Twitter

Hamdy Mubarak, Kareem Darwish*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

29 Citations (Scopus)

Abstract

Social media users often employ offensive language in their communication. Detecting offensive language on Twitter has many applications ranging from detecting/predicting conflict to measuring polarization. In this paper, we focus on building an effective detector of offensive tweets. We show that we can rapidly build a training set using a seed list of offensive words. Given the automatically created dataset, we trained a character n-gram-based deep learning classifier that can effectively classify tweets with an F1 score of 90%. We also show that we can expand our offensive word list by contrasting offensive and non-offensive tweets.
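The pipeline outlined in the abstract (seed-list labeling, character n-gram features, and word-list expansion by contrasting the two classes) can be illustrated with a minimal sketch. This is not the authors' implementation: the seed words, the toy English tweets, the function names, and the smoothed log-odds score are all assumptions chosen for illustration, and the deep learning classifier itself is omitted.

```python
import math
from collections import Counter

# Hypothetical seed list and toy tweets (English placeholders, not the paper's data).
SEED_WORDS = {"idiot", "scum"}

TWEETS = [
    "you are an idiot and a moron",
    "what total scum such moron behaviour",
    "lovely weather in doha today",
    "great talk at the conference today",
]


def weak_label(tweet, seeds):
    """Label a tweet offensive (1) if it contains any seed word, else 0."""
    return int(any(tok in seeds for tok in tweet.split()))


def char_ngrams(text, n=3):
    """Return the character n-grams of text (features a classifier could consume)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def expansion_candidates(tweets, seeds, top_k=3):
    """Rank non-seed words by smoothed log-odds of occurring in offensive tweets.

    The add-one smoothing and the log-odds score are illustrative choices,
    not the contrast measure used in the paper.
    """
    off, non = Counter(), Counter()
    for t in tweets:
        target = off if weak_label(t, seeds) else non
        target.update(set(t.split()))  # count each word once per tweet
    n_off = sum(weak_label(t, seeds) for t in tweets)
    n_non = len(tweets) - n_off
    scores = {}
    for w in set(off) | set(non):
        if w in seeds:
            continue
        p_off = (off[w] + 1) / (n_off + 2)
        p_non = (non[w] + 1) / (n_non + 2)
        scores[w] = math.log(p_off / p_non)
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

Calling `expansion_candidates(TWEETS, SEED_WORDS, top_k=1)` on this toy data surfaces `moron`, a word that co-occurs with the seed words but was not in the seed list, which is the kind of candidate the contrast step is meant to find.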

Original language: English
Title of host publication: Social Informatics - 11th International Conference, SocInfo 2019, Proceedings
Editors: Ingmar Weber, Kareem M. Darwish, Claudia Wagner, Fabian Flöck, Emilio Zagheni, Samin Aref, Laura Nelson
Publisher: Springer
Pages: 269-276
Number of pages: 8
ISBN (Print): 9783030349707
Publication status: Published - 2019
Event: 11th International Conference on Social Informatics, SocInfo 2019 - Doha, Qatar
Duration: 18 Nov 2019 - 21 Nov 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11864 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Conference on Social Informatics, SocInfo 2019
Country/Territory: Qatar
City: Doha
Period: 18/11/19 - 21/11/19

Keywords

  • Obscenities
  • Offensive language
  • Text classification
