Abstract
Motivation: Gene annotation is the problem of mapping proteins to their functions represented as Gene Ontology (GO) terms, typically inferred based on the primary sequences. Gene annotation is a multi-label multi-class classification problem, which has generated growing interest for its uses in the characterization of millions of proteins with unknown functions. However, there is no standard GO dataset used for benchmarking the newly developed new machine learning models within the bioinformatics community. Thus, the significance of improvements for these models remains unclear. Results: The Gene Benchmarking database is the first effort to provide an easy-to-use and configurable hub for the learning and evaluation of gene annotation models. It provides easy access to pre-specified datasets and takes the non-trivial steps of preprocessing and filtering all data according to custom presets using a web interface. The GO bench web application can also be used to evaluate and display any trained model on leaderboards for annotation tasks.
Original language | English |
---|---|
Article number | btad081 |
Journal | Bioinformatics |
Volume | 39 |
Issue number | 2 |
Early online date | Feb 2023 |
DOIs | |
Publication status | Published - 1 Feb 2023 |
Externally published | Yes |