CropML (2021-2024)
funded by the Federal Ministry of Education and Research (BMBF)
We are conducting research in several funded projects. You can find more information about CropML below. If you are interested in more details and discussions about our projects, do not hesitate to contact us.
For more information, please contact Dominik Grimm.
Project Description
Currently, the agricultural industry is under great pressure to deliver new crop varieties quickly for a changing climate and to use fewer resources. The goal is to increase yield and become more sustainable. To accelerate breeding programs, plant breeders are using genomic selection methods to predict the expected value of a trait, such as yield, from the genetic profiles of plants before the plants have been tested in the field.
The trait expression of plants is influenced by two main factors: their genetic, i.e. inherited, traits and the environment in which they grow. The aim of the joint project “CropML” is to develop machine learning (ML) models that take both into account, i.e. environmental influences in addition to genetics. To this end, data describing the environment will be integrated, e.g. measured values of weather, soil conditions or agronomic factors such as fertilizer use.
During the project, suitable data sources for environmental descriptions will be identified and pre-processed to be compatible with genetic data for ML models. New ML methods will be developed that can integrate the very heterogeneous data from genetic profiles and environmental factors and model the influence of both sources on the trait to be predicted, especially their interaction. The methods developed will be largely automated to provide breeders with rapid information for time-critical decisions.
This will allow more precise selection of promising varieties. It will also help identify suitable varieties for new regions and changing climates. By using the developed methods, breeders will gain an economic and ecological advantage by breeding better and more robust varieties with fewer resources.
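As a rough illustration of what integrating genetic and environmental data can mean in practice, the following minimal sketch builds one feature row per plant and environment by concatenating marker values, environment covariates and their pairwise products as explicit interaction terms. It is not the method developed in CropML; the function name, the 0/1/2 marker encoding and the toy covariates are assumptions made only for this example.

```python
def build_features(markers, env):
    """Concatenate marker values, environment covariates and their products.

    markers: genotype calls per marker, e.g. 0/1/2 allele counts (assumed encoding)
    env:     numeric environment covariates, e.g. rainfall (hypothetical values)
    """
    # One product per (marker, covariate) pair acts as an explicit
    # genotype-by-environment interaction feature.
    interactions = [m * e for m in markers for e in env]
    return list(markers) + list(env) + interactions


# Hypothetical toy data: the same plant line grown at two sites.
markers = [0, 2, 1]
dry_site = [0.2, 80.0]  # e.g. scaled rainfall, fertilizer amount
wet_site = [0.9, 80.0]

row_dry = build_features(markers, dry_site)
row_wet = build_features(markers, wet_site)
```

The two rows share the same genetic features but differ in the environment and interaction features, so a model trained on such rows can, in principle, predict different trait values for the same variety in different environments.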
Project Information:
Project title: New machine learning techniques for more accurate plant breeding by integrating heterogeneous external factors (CropML)
Project Partners
- Computomics GmbH (https://computomics.com)
Project Coordinator: Dr. Sebastian J. Schultheiss, Managing Director
- Weihenstephan-Triesdorf University of Applied Sciences & TUM Campus Straubing for Biotechnology and Sustainability
Project Coordinator: Prof. Dr. Dominik Grimm
Project Advisor: Maura John
Funding: The project is supported by funds of the Federal Ministry of Education and Research (BMBF) (01IS21038A).
Publications
easyPheno: An easy-to-use and easy-to-extend Python framework for phenotype prediction using Bayesian optimization.
F Haselbeck*, M John*, DG Grimm (* equal contribution)
Bioinformatics Advances, Vol. 3, 2023
(https://doi.org/10.1093/bioadv/vbad035) [Code] [Documentation]
A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species.
M John*, F Haselbeck*, R Dass, C Malisi, P Ricca, C Dreischer, SJ Schultheiss, DG Grimm (* equal contribution)
Frontiers in Plant Science, Vol. 13, 2022
(https://doi.org/10.3389/fpls.2022.932512) [Code]
Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions.
M John, M Ankenbrand, C Artmann, J Freudenthal, A Korte*, DG Grimm* (* equal contribution)
Bioinformatics (European Conference on Computational Biology (ECCB) 2022), 2022
(https://doi.org/10.1093/bioinformatics/btac455) [Code]
Software
easyPheno: state-of-the-art and easy-to-use phenotype prediction
easyPheno is a Python framework that enables the rigorous training, comparison and analysis of phenotype predictions for a variety of different models. easyPheno includes multiple state-of-the-art prediction models. Besides common genomic selection approaches, such as best linear unbiased prediction (BLUP) and models from the Bayesian alphabet, our framework includes several machine learning methods. These range from classical models, such as regularized linear regression, through ensemble learners, e.g. XGBoost, to deep learning-based architectures, such as Convolutional Neural Networks (CNN). To enable automatic hyperparameter optimization, we leverage state-of-the-art and efficient Bayesian optimization techniques. In addition, our framework is designed to allow an easy and straightforward integration of further prediction models.
easyPheno is publicly available at https://github.com/grimmlab/easyPheno and can be easily installed as a Python package via https://pypi.org/project/easypheno/ or using Docker.
More information can also be found in the following publication. Please cite our publication when using easyPheno.
easyPheno: An easy-to-use and easy-to-extend Python framework for phenotype prediction using Bayesian optimization.
F Haselbeck*, M John*, DG Grimm (* equal contribution)
Bioinformatics Advances, Vol. 3, 2023
(https://doi.org/10.1093/bioadv/vbad035)
permGWAS: efficient permutation-based GWAS
permGWAS is an open-source software tool written in Python to efficiently perform genome-wide association studies (GWAS) with permutation-based thresholds. permGWAS supports using multiple CPUs and GPUs.
permGWAS is publicly available at https://github.com/grimmlab/permGWAS
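To give an idea of what a permutation-based threshold is, the following minimal sketch shuffles the phenotype repeatedly, records the largest test statistic over all markers for each shuffle, and takes an upper quantile of these maxima as the significance threshold. It does not reproduce permGWAS or its API: the absolute Pearson correlation is used here only as a simple stand-in statistic, and all function names, parameters and data are hypothetical.

```python
import math
import random
import statistics


def abs_correlation(x, y):
    """Absolute Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=200, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the per-permutation maximum statistic.

    genotypes: list of markers, each a list with one value per sample
    phenotype: list with one trait value per sample
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    max_stats = []
    for _ in range(n_perm):
        shuffled = list(phenotype)
        rng.shuffle(shuffled)  # breaks any true marker-trait association
        max_stats.append(max(abs_correlation(m, shuffled) for m in genotypes))
    max_stats.sort()
    index = max(0, min(n_perm - 1, math.ceil((1 - alpha) * n_perm) - 1))
    return max_stats[index]


# Hypothetical toy data: three markers, eight samples.
genotypes = [
    [0, 1, 2, 1, 0, 2, 1, 0],
    [2, 2, 0, 1, 1, 0, 0, 1],
    [0, 0, 1, 1, 2, 2, 1, 0],
]
phenotype = [0.1, 1.2, 2.3, 1.1, 0.2, 2.1, 0.9, 0.0]

threshold = permutation_threshold(genotypes, phenotype, n_perm=100)
# Markers whose observed statistic exceeds the threshold would be reported.
hits = [i for i, m in enumerate(genotypes) if abs_correlation(m, phenotype) > threshold]
```

Because the threshold is derived from the data themselves, it does not rely on the phenotype following a normal distribution, which is the motivation for permutation-based approaches with skewed phenotypes.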
More information can also be found in the following publication. Please cite our publication when using permGWAS.
Efficient Permutation-based Genome-wide Association Studies for Normal and Skewed Phenotypic Distributions.
M John, M Ankenbrand, C Artmann, J Freudenthal, A Korte*, DG Grimm* (* equal contribution)
Bioinformatics (European Conference on Computational Biology (ECCB) 2022), 2022
(https://doi.org/10.1093/bioinformatics/btac455)