TY - GEN
T1 - GenAI Content Detection Task 2
T2 - 1st Workshop on GenAI Content Detection, GenAIDetect 2025
AU - Chowdhury, Shammur Absar
AU - Almerekhi, Hind
AU - Kutlu, Mucahid
AU - Keleş, Kaan Efe
AU - Ahmad, Fatema
AU - Mohiuddin, Tasnim
AU - Mikros, George
AU - Alam, Firoj
N1 - Publisher Copyright: © 2025 International Conference on Computational Linguistics.
PY - 2025/1/19
Y1 - 2025/1/19
N2 - This paper presents a comprehensive overview of the first edition of the Academic Essay Authenticity Challenge, organized as part of the GenAI Content Detection shared tasks collocated with COLING 2025. This challenge focuses on detecting machine-generated vs human-authored essays for academic purposes. The task is defined as follows: “Given an essay, identify whether it is generated by a machine or authored by a human.” The challenge involves two languages: English and Arabic. During the evaluation phase, 25 teams submitted systems for English and 21 teams for Arabic, reflecting substantial interest in the task. Finally, five teams submitted system description papers. The majority of submissions utilized fine-tuned transformer-based models, with one team employing Large Language Models (LLMs) such as Llama 2 and Llama 3. This paper outlines the task formulation, details the dataset construction process, and explains the evaluation framework. Additionally, we present a summary of the approaches adopted by participating teams. Nearly all submitted systems outperformed the n-gram-based baseline, with the top-performing systems achieving F1 scores exceeding 0.98 for both languages, indicating significant progress in the detection of machine-generated text.
UR - http://www.scopus.com/inward/record.url?scp=105000186337&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:105000186337
T3 - Proceedings - International Conference on Computational Linguistics, COLING
SP - 323
EP - 333
BT - GenAIDetect 2025 - Proceedings of the 1st Workshop on GenAI Content Detection, Proceedings of the Workshop - 31st International Conference on Computational Linguistics, COLING 2025
A2 - Alam, Firoj
A2 - Nakov, Preslav
A2 - Habash, Nizar
A2 - Gurevych, Iryna
A2 - Chowdhury, Shammur
A2 - Shelmanov, Artem
A2 - Wang, Yuxia
A2 - Artemova, Ekaterina
A2 - Kutlu, Mucahid
A2 - Mikros, George
PB - Association for Computational Linguistics (ACL)
Y2 - 19 January 2025
ER -