New Article in Nature Methods: Guiding questions to avoid data leakage in biological machine learning applications

Artificial intelligence (AI) has become indispensable in biological research and is driving major advances. In some cases, however, real-world applications fail to confirm the reported predictive performance. One of the main reasons for this is data leakage, i.e., the unintended transfer of information between training and test data.

In this Nature Methods Perspective, we present seven questions that should be asked to prevent data leakage when constructing machine learning models in biological domains. By applying these questions to real examples in biology, we aim to make researchers aware of the complex latent interdependencies and possibilities of data leakage in biological applications. We strongly encourage researchers to engage in an interdisciplinary dialogue and to consult experts from both machine learning and biology to ensure robust, reliable, and reproducible ML research in biology.

TMLR Paper awarded with “Featured” certification

Our latest paper, “Self-Improvement for Neural Combinatorial Optimization: Sample Without Replacement, but Improvement”, has been published in Transactions on Machine Learning Research (TMLR). This is impressive work by Jonathan Pirnay and was awarded a “Featured” certification.

Ashima joins the Team as Research Assistant

Ashima joins the team as a research assistant. She will work on novel machine learning methods for synthetic protein design and the in silico evaluation of generated artificial sequences.