Gender identification in Modern Greek tweets

Georgios Mikros, Kostas Perifanos

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

The aim of this paper is to analyze tweets written in Modern Greek and develop a robust methodology for identifying the gender of their author. To this end, we compare three different feature groups (most frequent function words, gender keywords, and Author Multilevel N-gram Profiles, AMNP) using two different machine learning algorithms (Random Forests and Support Vector Machines, SVMs) across various text sizes. The best result (0.883 accuracy) was obtained using SVMs trained with the AMNP feature group on 100-word tweet chunks. This methodology can lead to reliable and accurate gender identification results using tweet chunk sizes as small as 50 words each.
Original language: English
Title of host publication: Recent Contributions to Quantitative Linguistics
Publication status: Published - 2015
Externally published: Yes
