LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali*, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review


Abstract

The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available, customizing them for specific tasks and datasets is often complex for many users. In this study, we introduce the LLMeBench framework, which can be seamlessly customized to evaluate LLMs for any NLP task, regardless of language. The framework features generic dataset loaders, several model providers, and pre-implements most standard evaluation metrics. It supports in-context learning in zero- and few-shot settings. A specific dataset and task can be evaluated for a given LLM in less than 20 lines of code, while the framework remains fully extensible for custom datasets, models, or tasks. The framework has been tested on 31 unique NLP tasks using 53 publicly available datasets within 90 experimental setups, involving approximately 296K data points. We have open-sourced LLMeBench for the community, and a video demonstrating the framework is available online.
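The abstract's claim that a task can be evaluated in under 20 lines rests on the framework wiring together four pieces it names explicitly: a dataset loader, a model provider, a prompt (zero- or few-shot), and an evaluation metric. The sketch below illustrates that pattern in plain Python; every name in it (load_dataset, zero_shot_prompt, mock_model, accuracy, evaluate) is a hypothetical stand-in for illustration, not the actual LLMeBench API.

```python
# Minimal sketch of the benchmark pattern described in the abstract:
# dataset loader + model provider + zero-shot prompt + metric.
# All names are illustrative stand-ins, not the LLMeBench API.
from typing import Callable


def load_dataset() -> list[dict]:
    # Stand-in for a generic dataset loader; yields input/label pairs.
    return [
        {"input": "The movie was wonderful.", "label": "positive"},
        {"input": "I would not recommend it.", "label": "negative"},
    ]


def zero_shot_prompt(sample: dict) -> str:
    # Zero-shot prompting: task instruction plus the raw input,
    # with no in-context examples.
    return f"Classify the sentiment as positive or negative: {sample['input']}"


def mock_model(prompt: str) -> str:
    # Stand-in for a model provider call (e.g., a hosted LLM API).
    return "positive" if "wonderful" in prompt else "negative"


def accuracy(preds: list[str], labels: list[str]) -> float:
    # One of the standard metrics such a framework would pre-implement.
    return sum(p == g for p, g in zip(preds, labels)) / len(labels)


def evaluate(model: Callable[[str], str]) -> float:
    # The <20-line wiring: load data, prompt the model, score the outputs.
    data = load_dataset()
    preds = [model(zero_shot_prompt(s)) for s in data]
    return accuracy(preds, [s["label"] for s in data])


if __name__ == "__main__":
    print(f"Accuracy: {evaluate(mock_model):.2f}")  # prints: Accuracy: 1.00
```

Because the loader, provider, prompt, and metric are independent callables, swapping any one of them (a new dataset, a different LLM backend, a few-shot prompt) leaves the rest of the pipeline untouched, which is the flexibility the framework advertises.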

Original language: English
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations
Editors: Nikolaos Aletras, Orphee De Clercq
Publisher: Association for Computational Linguistics (ACL)
Pages: 214-222
Number of pages: 9
ISBN (Electronic): 9798891760912
Publication status: Published - 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - St. Julian's, Malta
Duration: 17 Mar 2024 - 22 Mar 2024

Publication series

Name: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations

Conference

Conference: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024
Country/Territory: Malta
City: St. Julian's
Period: 17/03/24 - 22/03/24

