Authorship identification in large email collections: Experiments using features that belong to different linguistic levels - Notebook for PAN at CLEF 2011

George K. Mikros, Kostas Perifanos

Research output: Contribution to journal › Conference article › peer-review

1 Citation (Scopus)

Abstract

The aim of this paper is to explore the usefulness of features from different linguistic levels for email authorship identification. Using various email datasets provided by the PAN'11 lab, we tested several feature groups in both the authorship attribution and authorship verification subtasks. Combined with Regularized Logistic Regression and One-Class SVM machine learning methods, the selected feature groups performed well above average in the authorship attribution subtasks and below average in the authorship verification subtasks.

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 1177
Publication status: Published - 2011
Externally published: Yes
Event: 2011 Cross Language Evaluation Forum Conference, CLEF 2011 - Amsterdam, Netherlands
Duration: 19 Sept 2011 - 22 Sept 2011

Keywords

  • Authorship attribution
  • Authorship verification
  • LIBLINEAR
  • LIBSVM
  • One-class SVM
  • Regularized logistic regression
  • Stylometry
  • Support vector machines
