Authorship attribution in Greek tweets using author's multilevel N-gram profiles

George K. Mikros, Kostas A. Perifanos

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

40 Citations (Scopus)

Abstract

The aim of this study is to explore authorship attribution methods in Greek tweets. We have developed the first Modern Greek Twitter corpus (GTC), consisting of 12,973 tweets crawled from 10 popular Greek users. We used this corpus to study the effectiveness of a specific document representation called Author's Multilevel N-gram Profile (AMNP) and the impact of different methods of training data construction for the task of authorship attribution. To address these research questions, we used GTC to create 4 different datasets containing tweets merged into texts of different sizes (100, 75, 50 and 25 words). Results were evaluated using authorship attribution accuracy both in 10-fold cross-validation and on an external test set compiled from actual tweets. The AMNP representation achieved significantly better accuracy than single feature groups across all text sizes.
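The two preprocessing ideas mentioned in the abstract, merging tweets into fixed-size texts and building a profile from several n-gram levels, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say which n-gram levels AMNP combines, so the choice of character and word n-grams with n = 2 and 3 is an assumption, as is dropping a leftover chunk shorter than the target size. The function names `merge_tweets`, `ngrams` and `multilevel_profile` are hypothetical.

```python
from collections import Counter


def merge_tweets(tweets, words_per_text):
    """Concatenate tweets and cut the word stream into texts of a fixed size."""
    words = [w for t in tweets for w in t.split()]
    # Assumption: a trailing remainder shorter than the target size is dropped.
    return [
        " ".join(words[i:i + words_per_text])
        for i in range(0, len(words) - words_per_text + 1, words_per_text)
    ]


def ngrams(seq, n):
    """Return all contiguous n-grams of a string or a list as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def multilevel_profile(text, sizes=(2, 3)):
    """Count character and word n-grams of several sizes in one profile.

    Assumption: the 'levels' are character and word n-grams with n in `sizes`.
    """
    profile = Counter()
    tokens = text.split()
    for n in sizes:
        for gram in ngrams(text, n):
            profile[("char", n, "".join(gram))] += 1
        for gram in ngrams(tokens, n):
            profile[("word", n, " ".join(gram))] += 1
    return profile
```

For the text sizes named in the abstract, `merge_tweets(tweets, 100)`, `merge_tweets(tweets, 75)` and so on would give one dataset per size, and each merged text could then be turned into a feature vector with `multilevel_profile` before training a classifier.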

Original language: English
Title of host publication: Analyzing Microtext - Papers from the AAAI Spring Symposium, Technical Report
Pages: 17-23
Number of pages: 7
Publication status: Published - 2013
Externally published: Yes
Event: 2013 AAAI Spring Symposium - Palo Alto, CA, United States
Duration: 25 Mar 2013 to 27 Mar 2013

Publication series

Name: AAAI Spring Symposium - Technical Report
Volume: SS-13-01

Conference

Conference: 2013 AAAI Spring Symposium
Country/Territory: United States
City: Palo Alto, CA
Period: 25/03/13 to 27/03/13
