Abstract
The aim of this paper is to analyze tweets written in Modern Greek and to develop a robust methodology for identifying the gender of their authors. To this end, we compare three feature groups (most frequent function words, gender keywords, and Author Multilevel N-gram Profiles, AMNP) using two machine learning algorithms (Random Forests and Support Vector Machines, SVMs) across various text sizes. The best result (0.883 accuracy) was obtained using SVMs trained on the AMNP feature group with 100-word tweet chunks. This methodology can yield reliable and accurate gender identification with tweet chunks as small as 50 words each.
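The chunking and n-gram profiling steps mentioned above might be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the function names `chunk_words` and `ngram_profile`, the choice of n-gram sizes 2 and 3, and the decision to drop a trailing incomplete chunk are all assumptions made for the example. Classifier training is omitted.

```python
from collections import Counter


def chunk_words(tweets, chunk_size=100):
    """Concatenate one author's tweets and split them into fixed-size word chunks.

    Hypothetical helper: the abstract only states that tweets are grouped
    into chunks of a given number of words (for example 50 or 100).
    """
    words = " ".join(tweets).split()
    # Keep only full chunks so every sample has the same text size (assumption).
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words) - chunk_size + 1, chunk_size)
    ]


def ngram_profile(chunk, sizes=(2, 3)):
    """Count character and word n-grams of each size in one chunk.

    Hypothetical helper: combining both levels in one profile is meant to
    mirror the "multilevel" idea; the exact sizes are an assumption.
    """
    profile = Counter()
    tokens = chunk.split()
    for n in sizes:
        # Character-level n-grams, spaces included.
        for i in range(len(chunk) - n + 1):
            profile[("char", chunk[i:i + n])] += 1
        # Word-level n-grams.
        for i in range(len(tokens) - n + 1):
            profile[("word", " ".join(tokens[i:i + n]))] += 1
    return profile
```

Each chunk's profile could then be turned into a feature vector (for instance, by keeping the most frequent n-grams across the training set) before being passed to a classifier such as an SVM.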
Original language | English |
---|---|
Title of host publication | Recent Contributions to Quantitative Linguistics |
Publication status | Published - 2015 |
Externally published | Yes |