CropML | Bioinformatics | TUM Campus Straubing

Project Description

The agricultural industry is under great pressure to deliver new crop varieties quickly for a changing climate while using fewer resources. The goal is to increase yield and become more sustainable. To accelerate breeding programs, plant breeders use genomic selection methods to predict the expected value of a trait, such as yield, from the genetic profiles of plants before the plants have been tested in the field.

The trait expression of plants is influenced by two main factors: their genetics, i.e. inherited characteristics, and the environment in which they grow. The aim of the joint project “CropML” is to develop machine learning (ML) models that take both into account, i.e. environmental influences in addition to genetics. To this end, data describing the environment will be integrated, e.g. measured values of weather, soil conditions or agronomic factors such as fertilizer use.

During the project, suitable data sources for environmental descriptions will be identified and pre-processed to be compatible with genetic data for ML models. New ML methods will be developed that can integrate the very heterogeneous data from genetic profiles and environmental factors and model the influence of both sources on the trait to be predicted, especially their interaction. The methods developed will be largely automated to provide breeders with rapid information for time-critical decisions.

This will allow more precise selection of promising varieties. It will also help identify suitable varieties for new regions and changing climates. By using the developed methods, breeders will gain an economic and ecological advantage by breeding better and more robust varieties with fewer resources.

Project Information

Project title

New machine learning techniques for more accurate plant breeding by integrating heterogeneous external factors (CropML)

Project Partners

Computomics GmbH (https://computomics.com)
Project Coordinator: Dr. Sebastian J. Schultheiss, Managing Director
Weihenstephan-Triesdorf University of Applied Sciences & TUM Campus Straubing for Biotechnology and Sustainability
Project Coordinator: Prof. Dr. Dominik Grimm
Project Advisor: Maura John

Funding

The project is supported by funds of the Federal Ministry of Education and Research (BMBF) (01IS21038A).

Publications

Population-aware permutation-based significance thresholds for genome-wide association studies
M John, A Korte, M Todesco, DG Grimm
Bioinformatics Advances, 2024
(https://doi.org/10.1093/bioadv/vbae168) [Code]

The Benefits of Permutation-Based Genome-Wide Association Studies
M John, A Korte, DG Grimm
Journal of Experimental Botany, 2024
(https://doi.org/10.1093/jxb/erae280)

easyPheno: An easy-to-use and easy-to-extend Python framework for phenotype prediction using Bayesian optimization.
F Haselbeck*, M John*, DG Grimm (* equal contribution)
Bioinformatics Advances, Vol. 3, 2023
(https://doi.org/10.1093/bioadv/vbad035) [Code] [Documentation]

A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species.
M John*, F Haselbeck*, R Dass, C Malisi, P Ricca, C Dreischer, SJ Schultheiss, DG Grimm (* equal contribution)
Frontiers in Plant Science, Vol. 13, 2022
(https://doi.org/10.3389/fpls.2022.932512) [Code]

Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions.
M John, M Ankenbrand, C Artmann, J Freudenthal, A Korte*, DG Grimm* (* equal contribution)
Bioinformatics (European Conference on Computational Biology (ECCB) 2022), 2022
(https://doi.org/10.1093/bioinformatics/btac455) [Code]

Software

easyPheno is a Python framework that enables the rigorous training, comparison and analysis of phenotype predictions for a variety of models. easyPheno includes multiple state-of-the-art prediction models. Besides common genomic selection approaches, such as best linear unbiased prediction (BLUP) and models from the Bayesian alphabet, our framework includes several machine learning methods. These range from classical models, such as regularized linear regression, through ensemble learners, e.g. XGBoost, to deep learning-based architectures, such as convolutional neural networks (CNNs). To enable automatic hyperparameter optimization, we leverage state-of-the-art and efficient Bayesian optimization techniques. In addition, our framework is designed to allow an easy and straightforward integration of further prediction models.
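As a rough illustration of what tuning a regularized linear model for phenotype prediction involves, the following sketch fits ridge regression on simulated marker data and selects the regularization strength on a validation split. It does not use the easyPheno API, and a plain grid search stands in for Bayesian optimization; all names (`ridge_fit`, `select_alpha`) and the simulated data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; returns (weights, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def mse(X, y, w, b):
    """Mean squared error of the linear predictor X @ w + b."""
    return float(np.mean((X @ w + b - y) ** 2))

def select_alpha(X_train, y_train, X_val, y_val, alphas):
    """Grid search: return the alpha with the lowest validation error."""
    scores = {a: mse(X_val, y_val, *ridge_fit(X_train, y_train, a))
              for a in alphas}
    best = min(scores, key=scores.get)
    return best, scores

# Simulated data (hypothetical): more markers than plants, as is
# common in genomic prediction, with ten markers affecting the trait.
rng = np.random.default_rng(1)
n, p = 150, 400
X = rng.integers(0, 3, size=(n, p)).astype(float)
true_effects = np.zeros(p)
true_effects[:10] = rng.normal(size=10)
y = X @ true_effects + rng.normal(scale=1.0, size=n)

X_train, y_train = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

alphas = [0.01, 1.0, 100.0, 10000.0]
best_alpha, scores = select_alpha(X_train, y_train, X_val, y_val, alphas)
print("validation MSE per alpha:", scores)
print("selected alpha:", best_alpha)
```

A Bayesian optimizer would replace the fixed grid with a search that proposes new hyperparameter values based on the scores observed so far, which matters more for models with many hyperparameters than for this single-parameter example.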

easyPheno is publicly available at https://github.com/grimmlab/easyPheno and can be easily installed as a Python package via https://pypi.org/project/easypheno/ or using Docker.

Comprehensive documentation, including various tutorials complemented with videos, can be found at https://easypheno.readthedocs.io/.

More information can also be found in the easyPheno publication listed under Publications. Please cite our publication when using easyPheno.

permGWAS is an open-source software tool written in Python to efficiently perform genome-wide association studies (GWAS) with permutation-based thresholds. permGWAS supports using multiple CPUs and GPUs.
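The general idea of a permutation-based significance threshold can be sketched as follows. This is a simplified illustration and not the permGWAS implementation: it uses a plain per-marker squared correlation as the test statistic and ignores population structure and covariates. The function names and the simulated data are assumptions.

```python
import numpy as np

def marker_stats(X, y):
    """Squared Pearson correlation of each marker column with the phenotype.

    Assumes every marker varies across samples (no constant columns).
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = (Xc.T @ yc) ** 2
    den = (Xc ** 2).sum(axis=0) * (yc ** 2).sum()
    return num / den

def permutation_threshold(X, y, n_perm=200, alpha=0.05, seed=0):
    """Max-statistic permutation threshold.

    The phenotype is shuffled n_perm times, which breaks any true
    marker-trait association. For each shuffle the largest statistic
    over all markers is recorded, and the (1 - alpha) quantile of
    these maxima is returned as the threshold.
    """
    rng = np.random.default_rng(seed)
    max_stats = np.array([
        marker_stats(X, rng.permutation(y)).max() for _ in range(n_perm)
    ])
    return float(np.quantile(max_stats, 1.0 - alpha))

# Simulated data (hypothetical): 500 markers, one of which is causal.
rng = np.random.default_rng(42)
n, m = 200, 500
X = rng.integers(0, 3, size=(n, m)).astype(float)
y = 1.0 * X[:, 7] + rng.normal(size=n)

stats = marker_stats(X, y)
threshold = permutation_threshold(X, y)
significant = np.flatnonzero(stats > threshold)
print("threshold:", round(threshold, 4))
print("markers above threshold:", significant.tolist())
```

Because the threshold is derived from the maxima over all markers under shuffled phenotypes, it adapts to the number of markers and to the phenotype's distribution, rather than relying on a fixed correction such as Bonferroni.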

permGWAS is publicly available at https://github.com/grimmlab/permGWAS

More information can also be found in the following publication. Please cite our publication when using permGWAS.

Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions.
M John, M Ankenbrand, C Artmann, J Freudenthal, A Korte*, DG Grimm* (* equal contribution)
Bioinformatics (European Conference on Computational Biology (ECCB) 2022), 2022
(https://doi.org/10.1093/bioinformatics/btac455)

CropML Funded by the Federal Ministry of Education and Research (BMBF)
