TY - JOUR
T1 - AI-Writing Detection Using an Ensemble of Transformers and Stylometric Features
AU - Mikros, George
AU - Koursaris, Athanasios
AU - Bilianos, Dimitrios
AU - Markopoulos, George
N1 - Publisher Copyright: © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
PY - 2023
Y1 - 2023
N2 - This study aims to develop an effective and precise methodology for detecting AI-generated text by combining transformer learning with stylometric features. The research used two datasets provided by the AuTexTification (Automated Text Identification) shared task, part of IberLEF 2023, the 5th Workshop on Iberian Languages Evaluation Forum, held at the SEPLN 2023 Conference. Our team participated in both English-language subtasks: binary classification of texts as either human-written or AI-generated, and multiclass classification to identify which of six AI writing models produced a given text. Our main approach was to experiment with multiple transformer models while also applying an extensive stylometric feature engineering workflow. Each method (transformers and stylometric features) was first applied separately, and then we explored various ways to combine them. The best-performing method was a majority-voting ensemble that combined the two transformer models that were most accurate on our training data with a concatenation of many different stylometric feature groups. The macro-F1 scores on the test sets for subtasks 1 and 2 were 60.78 and 55.87, respectively, placing our group above the median of the competing teams. This study underscores the potential of combining transformer learning and stylometric features to enhance the accuracy of AI-generated text detection.
KW - AI-writing detection
KW - ensemble learning
KW - stylometry
KW - transformers
UR - http://www.scopus.com/inward/record.url?scp=85175313512&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85175313512
SN - 1613-0073
VL - 3496
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
T2 - 2023 Iberian Languages Evaluation Forum, IberLEF 2023
Y2 - 26 September 2023
ER -