Detecting automatically-generated Arabic tweets

Hind Almerekhi*, Tamer Elsayed

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

17 Citations (Scopus)

Abstract

Recently, Twitter, one of the most widely known social media platforms, has been infiltrated by automation programs commonly known as “bots”. Bots can easily be abused to spread spam and to hinder information extraction applications by posting large numbers of automatically generated tweets that occupy a good portion of the continuous stream of tweets. This problem heavily affects users in the Arab region because of recent political events, as automated tweets can disrupt communication and waste the time needed to filter them out. To mitigate this problem, this work addresses the classification of Arabic tweets as automated or manual. We propose four categories of features: formality, structural, tweet-specific, and temporal. Our experimental evaluation over about 3.5k randomly sampled Arabic tweets shows that classification based on individual categories of features outperforms the baseline unigram-based classifier in terms of classification accuracy. Additionally, combining tweet-specific and unigram features improves classification accuracy to 92%, a significant improvement over the baseline classifier and a very strong reference baseline for future studies.

Original language: English
Title of host publication: Information Retrieval Technology - 11th Asia Information Retrieval Societies Conference, AIRS 2015, Proceedings
Editors: Falk Scholer, Guido Zuccon, Shlomo Geva, Aixin Sun, Hideo Joho, Peng Zhang
Publisher: Springer Verlag
Pages: 123-134
Number of pages: 12
ISBN (Print): 9783319289397
Publication status: Published - 2015
Event: 11th Asia Information Retrieval Societies Conference, AIRS 2015 - Brisbane, Australia
Duration: 2 Dec 2015 → 4 Dec 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9460
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th Asia Information Retrieval Societies Conference, AIRS 2015
Country/Territory: Australia
City: Brisbane
Period: 2/12/15 → 4/12/15

Keywords

  • Arabic microblogs
  • Automated tweets
  • Bots
  • Crowdsourcing
  • Tweet classification
