Author: Philomin Juliana

Deep kernel and deep learning for genomic-based prediction

Jose Crossa, Paulino Pérez-Rodríguez, Juan Burgueño, Ravi Singh, Philomin Juliana, Osval Antonio Montesinos-Lopez, Jaime Cuevas (2019)

Deep learning (DL) is a promising method for genomic prediction, which allows individuals to be selected early without measuring their phenotypes. In this paper we compare the genome-based prediction performance of the DL method, the deep kernel (arc-cosine kernel, AK) method, the Gaussian kernel (GK) method and the conventional kernel method (Genomic Best Linear Unbiased Predictor, GBLUP, GB). We used two real wheat data sets to benchmark these methods. We found that the GK and deep kernel AK methods outperformed the DL and the conventional GB methods. Although the gain in prediction performance of AK and GK was not very large, they have the advantage that no tuning parameters are required. Furthermore, although AK and GK had similar genome-based prediction performance, the deep kernel AK is easier to implement than the GK. For this reason, our results suggest that AK is an alternative to DL models with the advantage that no tuning process is required.
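To make the kernels named in this abstract more concrete, the following is a minimal Python sketch of a GBLUP-style linear kernel and an order-1 arc-cosine kernel computed from a marker matrix. It is not the authors' implementation: the function names, the simulated marker matrix and the choice of NumPy are assumptions made for illustration only. The arc-cosine formula used is the standard order-1 form, k(x, y) = (1/π) ‖x‖ ‖y‖ (sin θ + (π − θ) cos θ), where θ is the angle between x and y.

```python
import numpy as np


def linear_kernel(X):
    """GBLUP-style linear kernel X X' / p for a centered marker matrix X (lines x markers)."""
    return X @ X.T / X.shape[1]


def arc_cosine_layer(K):
    """Apply the order-1 arc-cosine map to a base kernel matrix K.

    Passing K = X X' gives the one-layer arc-cosine kernel; passing the
    result back in emulates one additional hidden layer.
    """
    norms = np.sqrt(np.diag(K))
    outer = np.outer(norms, norms)
    # Clip to guard against round-off pushing the cosine slightly outside [-1, 1].
    theta = np.arccos(np.clip(K / outer, -1.0, 1.0))
    return outer * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi


# Hypothetical toy data: 5 lines, 200 markers coded 0/1/2, then column-centered.
rng = np.random.default_rng(1)
X = rng.choice([0.0, 1.0, 2.0], size=(5, 200))
X -= X.mean(axis=0)

K_gb = linear_kernel(X)            # conventional GBLUP kernel
K_ak = arc_cosine_layer(X @ X.T)   # arc-cosine kernel, one layer
K_ak2 = arc_cosine_layer(K_ak)     # arc-cosine kernel, two layers
```

In this sketch the number of layers is simply the number of times `arc_cosine_layer` is applied; how that number is chosen in practice is outside what the abstract states.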

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Genome-based prediction of multiple wheat quality traits in multiple years

Maria Itria Ibba, Jose Crossa, Osval Antonio Montesinos-Lopez, Philomin Juliana, Carlos Guzman, Susanne Dreisigacker, Jesse Poland (2020)

The use of genomic prediction could greatly help to increase the efficiency of selecting for wheat quality traits by reducing the cost and time required for this analysis. This study contains data used to evaluate the prediction performances of 13 wheat quality traits under two multi-trait models [Bayesian multi-trait multi-environment (BMTME) and multi-trait ridge regression (MTR)]. Separate files are provided for each year of data. An additional supplemental data file provides R code for running the analyses as well as a table describing the average Pearson's correlation (APC) and mean arctangent absolute percentage error (MAAPE) for the testing sets for each dataset and trait.
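The two accuracy measures named above can be sketched in a few lines. The dataset itself ships R code; the Python below is only an illustrative sketch, and the function names and the toy numbers are assumptions. MAAPE is taken here in its usual form, the mean of arctan(|(y − ŷ) / y|), and APC as the mean of the per-testing-set Pearson correlations.

```python
import numpy as np


def pearson(y, y_hat):
    """Pearson's correlation between observed and predicted values."""
    return float(np.corrcoef(y, y_hat)[0, 1])


def maape(y, y_hat):
    """Mean arctangent absolute percentage error.

    Assumes the observed values y contain no zeros.
    """
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.mean(np.arctan(np.abs((y - y_hat) / y))))


def average_pearson(folds):
    """Average Pearson's correlation over a list of (observed, predicted) testing sets."""
    return float(np.mean([pearson(y, y_hat) for y, y_hat in folds]))


# Hypothetical observed and predicted values for two testing sets.
folds = [
    (np.array([10.0, 12.0, 11.0, 14.0]), np.array([10.5, 11.5, 11.2, 13.0])),
    (np.array([9.0, 13.0, 12.0, 15.0]), np.array([9.5, 12.0, 12.5, 14.0])),
]
apc = average_pearson(folds)
maape_per_fold = [maape(y, y_hat) for y, y_hat in folds]
```

Because the arctangent is bounded, each term of MAAPE lies between 0 and π/2, so a single very poor prediction cannot dominate the average the way it can with an unbounded percentage error.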

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Joint use of genome, pedigree and their interaction with environment for predicting the performance of wheat lines in new environments

Osval Antonio Montesinos-Lopez, Philomin Juliana, Ravi Singh, Jesse Poland, Paulino Pérez-Rodríguez, Jose Crossa, Diego Jarquin (2019)

In this study, we evaluated genome-based prediction using 35,403 wheat lines from the Global Wheat Breeding Program of the International Maize and Wheat Improvement Center (CIMMYT). We implemented eight statistical models that included genome-wide molecular marker and pedigree information in two different validation schemes. All models included main effects, and others also considered interactions between the different types of covariates via Hadamard products of similarity structures. When only main effects were fitted, the pedigree models always gave better results than the genome-based models for predicting new lines in observed environments. However, for all traits, the highest predictive abilities were obtained when interactions between pedigree, markers and environments were included. When new lines were predicted in unobserved environments, the marker main-effects model was the best in almost all trait/year combinations. These results provide strong evidence that the different sources of genetic information (molecular markers and pedigree) are not equally useful at different stages of the breeding pipelines, and can be employed differentially to improve the design of future breeding programs.
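The phrase "Hadamard products of similarity structures" can be illustrated with a small sketch. It shows one common way to build an interaction covariance from a line-level similarity matrix and an environment-level similarity matrix; it is not taken from the study's code. The toy dimensions, the identity matrix used for environments and all variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_lines, n_envs, n_markers = 4, 2, 50

# Hypothetical marker matrix (lines x markers), column-centered.
X = rng.choice([0.0, 1.0, 2.0], size=(n_lines, n_markers))
X -= X.mean(axis=0)
G = X @ X.T / n_markers   # genomic similarity among lines
E = np.eye(n_envs)        # assumption: environments treated as unrelated

# One phenotypic record per line-by-environment combination.
line_idx = np.repeat(np.arange(n_lines), n_envs)
env_idx = np.tile(np.arange(n_envs), n_lines)
Z_L = np.eye(n_lines)[line_idx]   # incidence matrix: records -> lines
Z_E = np.eye(n_envs)[env_idx]     # incidence matrix: records -> environments

K_G = Z_L @ G @ Z_L.T   # marker main-effect covariance at the record level
K_E = Z_E @ E @ Z_E.T   # environment main-effect covariance at the record level
K_GE = K_G * K_E        # Hadamard (element-wise) product: interaction covariance
```

With an identity matrix for environments, the interaction term keeps the genomic similarity between two records only when they come from the same environment and sets it to zero otherwise. A pedigree-based similarity matrix could be substituted for `G` in the same way.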

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Haplotype-based genome-wide association study unveils stable genomic regions for grain yield in CIMMYT spring bread wheat

Deepmala Sehgal, Suchismita Mondal, Leonardo Abdiel Crespo Herrera, Govindan Velu, Philomin Juliana, Julio Huerta-Espino, Sandesh Kumar Shrestha, Jesse Poland, Ravi Singh, Susanne Dreisigacker (2020)

The genetic architecture of grain yield (GY) has been extensively investigated in wheat using the genome-wide association study (GWAS) approach. However, most studies have used small panel sizes in combination with large genotypic data, typical examples of the so-called ‘large p small n’ or ‘short-fat data’ problem. Further, the use of bi-allelic SNPs accentuated ‘missing heritability’ issues, and therefore reported markers had limited impact in wheat breeding. We performed haplotype-based GWAS using 519 haplotype blocks on seven large cohorts of advanced CIMMYT spring bread wheat lines comprising 6,333 genotypes overall. In addition, to address the missing heritability, we investigated epistatic interactions among the genome-wide haplotypes, an important aspect that has not yet been fully explored in wheat GWAS. Our results unveiled the intricate genetic architecture of GY, which is controlled by both main and epistatic effects. The importance of these results for practical applications in the CIMMYT breeding program is discussed.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY

Replication Data for: Elucidating the genetics of grain yield and stress-resilience in bread wheat using a large-scale genome-wide association mapping study with 55,568 lines

Philomin Juliana, Ravi Singh, Jesse Poland, Sandesh Kumar Shrestha, Julio Huerta-Espino, Govindan Velu, Suchismita Mondal, Leonardo Abdiel Crespo Herrera, Uttam Kumar, Arun Joshi, Thomas Payne, Pradeep Kumar Bhati, Vipin Tomar (2021)

A large-scale genome-wide association study was carried out to dissect the genetic architecture of wheat grain yield potential and stress-resilience. Based on the findings, grain yield-associated marker profiles were generated for a large panel of 73,142 wheat lines and the grain-yield favorable allele frequencies were also determined. The marker profile data are presented in this dataset.

Dataset

AGRICULTURAL SCIENCES AND BIOTECHNOLOGY