Authorship attribution in Greek tweets using multilevel author’s n-gram profiles

Georgios Mikros, Kostas Perifanos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The aim of this study is to explore authorship attribution methods in Greek tweets. We have developed the first Modern Greek Twitter corpus (GTC), consisting of 12,973 tweets crawled from 10 popular Greek users. We used this corpus to study the effectiveness of a specific document representation called the Author's Multilevel N-gram Profile (AMNP) and the impact of different methods of training-data construction on authorship attribution. To address these research questions, we used GTC to create 4 datasets in which tweets were merged into texts of different sizes (100, 75, 50 and 25 words). Results were evaluated by authorship attribution accuracy, both in 10-fold cross-validation and on an external test set compiled from actual tweets. The AMNP representation achieved significantly better accuracies than single feature groups across all text sizes.
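The two ideas in the abstract, merging short tweets into fixed-size training texts and representing each text by n-gram frequencies at several levels at once, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the choice of character 2- and 3-grams plus word 2- and 3-grams as the "levels" is an assumption, and dropping a trailing chunk shorter than the target size is also an assumption.

```python
from collections import Counter


def merge_tweets(tweets, chunk_words):
    """Hypothetical helper: concatenate one author's tweets into a single
    word stream and cut it into consecutive texts of exactly chunk_words
    words. ASSUMPTION: a trailing remainder shorter than chunk_words is dropped."""
    words = [w for tweet in tweets for w in tweet.split()]
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words) - chunk_words + 1, chunk_words)
    ]


def char_ngrams(text, n):
    """All overlapping character n-grams of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """All overlapping word n-grams (whitespace tokenisation)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def multilevel_profile(text, char_ns=(2, 3), word_ns=(2, 3)):
    """Hypothetical multilevel profile: relative frequencies of character
    and word n-grams. Each (level, n) group is normalised separately and the
    groups are merged into one sparse feature dict keyed by (level, n, gram).
    ASSUMPTION: the default n values only illustrate the multilevel idea."""
    profile = {}
    levels = (("c", char_ns, char_ngrams), ("w", word_ns, word_ngrams))
    for level, ns, extract in levels:
        for n in ns:
            grams = extract(text, n)
            total = len(grams)
            for gram, count in Counter(grams).items():
                profile[(level, n, gram)] = count / total
    return profile


if __name__ == "__main__":
    author_tweets = ["a b c", "d e", "f g h i"]
    texts = merge_tweets(author_tweets, 4)  # 9 words -> two 4-word texts
    print(texts)
    print(multilevel_profile(texts[0]))
```

Keying every feature by its level and n keeps, for example, the character bigram "ab" distinct from any word-level feature, so a single classifier can be trained on the combined vector; a "single feature group" baseline corresponds to calling `multilevel_profile` with only one level enabled.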
Original language: English
Title of host publication: Papers from the 2013 AAAI Spring Symposium
Publication status: Published - 2013
Externally published: Yes
