Abstract
This paper explores the usefulness of features from different linguistic levels for email authorship identification. Using several email datasets provided by the PAN'11 lab, we tested several feature groups in both the authorship attribution and authorship verification subtasks. The selected feature groups, combined with Regularized Logistic Regression and One-Class SVM machine learning methods, performed well above average in the authorship attribution subtasks and below average in the authorship verification subtasks.
Original language | English |
---|---|
Journal | CEUR Workshop Proceedings |
Volume | 1177 |
Publication status | Published - 2011 |
Externally published | Yes |
Event | 2011 Cross Language Evaluation Forum Conference, CLEF 2011 - Amsterdam, Netherlands (Duration: 19 Sept 2011 → 22 Sept 2011) |
Keywords
- Authorship attribution
- Authorship verification
- LIBLINEAR
- LIBSVM
- One-class SVM
- Regularized logistic regression
- Stylometry
- Support vector machines