Abstract
The aim of this study is to explore authorship attribution methods for Greek tweets. We developed the first Modern Greek Twitter corpus (GTC), consisting of 12,973 tweets crawled from 10 popular Greek users. We used this corpus to study the effectiveness of a specific document representation, the Author’s Multilevel N-gram Profile (AMNP), and the impact of different methods of training data construction on the task of authorship attribution. To address these research questions, we used GTC to create 4 datasets containing tweets merged into texts of different sizes (100, 75, 50 and 25 words). Results were evaluated using authorship attribution accuracy, both in 10-fold cross-validation and on an external test set compiled from actual tweets. The AMNP representation achieved significantly better accuracies than single feature groups across all text sizes.
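
The data construction and representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `merge_tweets` and `multilevel_profile` are hypothetical, and the sketch assumes that a multilevel profile combines character and word n-gram counts of sizes 2 and 3, a configuration the abstract does not specify. Whitespace tokenization and the dropping of leftover words are likewise simplifying assumptions.

```python
from collections import Counter


def merge_tweets(tweets, chunk_size):
    """Concatenate one author's tweets and cut the word stream into
    chunks of exactly chunk_size words (e.g. 100, 75, 50 or 25)."""
    words = [w for tweet in tweets for w in tweet.split()]
    # Assumption: a trailing remainder shorter than chunk_size is dropped.
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words) - chunk_size + 1, chunk_size)
    ]


def ngrams(seq, n):
    """Return all contiguous n-grams of a string or a list as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def multilevel_profile(text, sizes=(2, 3)):
    """Count character and word n-grams of several sizes in one profile.

    Assumption: the feature levels and sizes are illustrative only.
    """
    profile = Counter()
    tokens = text.split()
    for n in sizes:
        for gram in ngrams(text, n):      # character level
            profile[("char", n, "".join(gram))] += 1
        for gram in ngrams(tokens, n):    # word level
            profile[("word", n, " ".join(gram))] += 1
    return profile


if __name__ == "__main__":
    chunks = merge_tweets(["a b c", "d e", "f g h"], chunk_size=3)
    print(chunks)
    print(multilevel_profile("ab ab").most_common(3))
```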
Original language | English |
---|---|
Title of host publication | Papers from the 2013 AAAI Spring Symposium |
Publication status | Published - 2013 |
Externally published | Yes |