Guilherme J. M. Rosa scite author profile

Background: In the study of associations between genomic data and complex phenotypes there may be relationships that are not amenable to parametric statistical modeling. Such associations have been investigated mainly using single-marker and Bayesian linear regression models that differ in their distributions, but that assume additive inheritance while ignoring interactions and non-linearity. When interactions have been included in the model, their effects have entered linearly. There is a growing interest in non-parametric methods for predicting quantitative traits based on reproducing kernel Hilbert spaces regressions on markers and radial basis functions. Artificial neural networks (ANN) provide an alternative, because these act as universal approximators of complex functions and can capture non-linear relationships between predictors and responses, with the interplay among variables learned adaptively. ANNs are interesting candidates for analysis of traits affected by cryptic forms of gene action. Results: We investigated various Bayesian ANN architectures for predicting phenotypes in two data sets consisting of milk production in Jersey cows and yield of inbred lines of wheat. For the Jerseys, predictor variables were derived from pedigree and molecular marker (35,798 single nucleotide polymorphisms, SNPs) information on 297 individual cows. The wheat data represented 599 lines, each genotyped with 1,279 markers. The ability of predicting fat, milk and protein yield was low when using pedigrees, but it was better when SNPs were employed, irrespective of the ANN trained. Predictive ability was even better in wheat because the trait was a mean, as opposed to an individual phenotype in cows. Non-linear neural networks outperformed a linear model in predictive ability in both data sets, but more clearly in wheat. Conclusion: Results suggest that neural networks may be useful for predicting complex traits using high-dimensional genomic information, a situation where the number of unknowns exceeds sample size. ANNs can capture nonlinearities adaptively. This may be useful when prediction of phenotypes is crucial.

Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits

González-Recio

2014

Genome-wide association analysis in dogs implicates 99 loci as risk variants for anterior cruciate ligament rupture

et al. 2017

Anterior cruciate ligament (ACL) rupture is a common condition that can be devastating and life changing, particularly in young adults. A non-contact mechanism is typical. A second ACL rupture, through rupture of the contralateral ACL or of a graft repair, is also common. Risk of rupture is increased in females. ACL rupture is also common in dogs. Disease prevalence exceeds 5% in several dog breeds, ~100-fold higher than in human beings. We provide insight into the genetic etiology of ACL rupture by genome-wide association study (GWAS) in a high-risk breed using 98 case and 139 control Labrador Retrievers. We identified 129 single nucleotide polymorphisms (SNPs) within 99 risk loci. Associated loci (P<5E-04) explained approximately half of phenotypic variance in the ACL rupture trait. Two of these loci were located in uncharacterized or non-coding regions of the genome. A chromosome 24 locus containing nine genes with diverse functions met genome-wide significance (P = 3.63E-06). GWAS pathways were enriched for c-type lectins, a gene set that includes aggrecan, a gene set encoding antimicrobial proteins, and a gene set encoding membrane transport proteins with a variety of physiological functions. Genotypic risk estimated for each dog based on the risk contributed by each GWAS locus showed clear separation of ACL rupture cases and controls. Power analysis of the GWAS data set estimated that ~172 loci explain the genetic contribution to ACL rupture in the Labrador Retriever. Heritability was estimated at 0.48. We conclude that ACL rupture is a moderately heritable, highly polygenic complex trait. Our results implicate c-type lectin pathways in ACL homeostasis.

Mining data from milk infrared spectroscopy to improve feed intake predictions in lactating dairy cows

Dórea

Rosa

Weld

et al. 2018

Journal of Dairy Science

Feed intake is one of the most important components of feed efficiency in dairy systems. However, it is a difficult trait to measure in commercial operations for individual cows. Milk spectrum from mid-infrared spectroscopy has been previously used to predict milk traits, and could be an alternative to predict dry matter intake (DMI). The objectives of this study were (1) to evaluate if milk spectra can improve DMI predictions based only on cow variables; (2) to compare artificial neural network (ANN) and partial least squares (PLS) predictions; and (3) to evaluate if wavelength (WL) selection through Bayesian network (BN) improves prediction quality. Milk samples (n = 1,279) from 308 mid-lactation dairy cows [127 ± 27 d in milk (DIM)] were collected between 2014 and 2016. For each milk spectra time point, DMI (kg/d), body weight (BW, kg), milk yield (MY, kg/d), fat (%), protein (%), lactose (%), and actual DIM were recorded. The DMI was predicted with ANN and PLS using different combinations of explanatory variables. Such combinations, called covariate sets, were as follows: set 1 (MY, BW, DIM, and 361 WL); set 2 [MY, BW, DIM, and 33 WL (WL selected by BN)]; set 3 (MY, BW, DIM, and fat, protein, and lactose concentrations); set 4 (MY, BW, DIM, 33 WL, fat, protein, and lactose); set 5 (MY, BW, DIM, 33 WL, and visit duration in the feed bunk); set 6 (MY, DIM, and 33 WL); set 7 (MY, BW, and DIM); set-WL (included 361 WL); and set-BN (included just 33 selected WL). All models (i.e., each combination of covariate set and fitting approach, ANN or PLS) were validated with an external data set. The use of ANN improved the performance of models 2, 5, 6, and BN. The use of BN combined with ANN yielded the highest accuracy and precision. The addition of individual WL compared with milk components (set 2 vs. set 3) did not improve prediction quality when using PLS. However, when ANN was employed, the model prediction with the inclusion of 33 WL was improved over the model containing only milk components (set 2 vs. set 3; concordance correlation coefficient = 0.80 vs. 0.72; coefficient of determination = 0.67 vs. 0.53; root mean square error of prediction = 2.36 vs. 2.81 kg/d). The use of ANN and the inclusion of a behavior parameter, set 5, resulted in the best predictions compared with all other models (coefficient of determination = 0.70, concordance correlation coefficient = 0.83, root mean square error of prediction = 2.15 kg/d). The addition of milk spectra information to models containing cow variables improved the accuracy and precision of DMI predictions in lactating dairy cows when ANN was used. The use of BN to select more informative WL improved the model prediction when combined with cow variables, with further improvement when combined with ANN.
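The three validation metrics quoted in this abstract can be computed from paired observed and predicted values. The sketch below is only an illustration, not the authors' code. It assumes Lin's concordance correlation coefficient with population (divide-by-n) variances and takes the coefficient of determination as the squared Pearson correlation, since the abstract does not state which definitions were used. The function name `prediction_metrics` is hypothetical.

```python
import math


def prediction_metrics(observed, predicted):
    """Return CCC, R^2 and RMSEP for paired observed/predicted values.

    Assumptions (not stated in the abstract): Lin's CCC with
    divide-by-n moments, and R^2 as the squared Pearson correlation.
    """
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p)
              for o, p in zip(observed, predicted)) / n

    # CCC penalizes both scatter and systematic shift between the two series.
    ccc = 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)
    # Squared Pearson correlation: insensitive to a constant bias.
    r2 = cov ** 2 / (var_o * var_p)
    # Root mean square error of prediction, in the units of the trait.
    rmsep = math.sqrt(
        sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"ccc": ccc, "r2": r2, "rmsep": rmsep}


# Made-up values: predictions are biased upward by exactly 1 unit.
metrics = prediction_metrics([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

With a constant bias like this, the squared correlation stays at 1 while the concordance correlation coefficient drops, which is one reason to report both alongside the error in kg/d.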

Modeling relationships between calving traits: a comparison between standard and recursive mixed models

Maturana

Campos

et al. 2010

Genet Sel Evol

Background: The use of structural equation models for the analysis of recursive and simultaneous relationships between phenotypes has become more popular recently. The aim of this paper is to illustrate how these models can be applied in animal breeding to achieve parameterizations of different levels of complexity and, more specifically, to model phenotypic recursion between three calving traits: gestation length (GL), calving difficulty (CD) and stillbirth (SB). All recursive models considered here postulate heterogeneous recursive relationships between GL and liabilities to CD and SB, and between liability to CD and liability to SB, depending on categories of GL phenotype. Methods: Four models were compared in terms of goodness of fit and predictive ability: 1) standard mixed model (SMM), a model with unstructured (co)variance matrices; 2) recursive mixed model 1 (RMM1), assuming that residual correlations are due to the recursive relationships between phenotypes; 3) RMM2, assuming that correlations between residuals and contemporary groups are due to recursive relationships between phenotypes; and 4) RMM3, postulating that the correlations between genetic effects, contemporary groups and residuals are due to recursive relationships between phenotypes. Results: For all the RMM considered, the estimates of the structural coefficients were similar. Results revealed a nonlinear relationship between GL and the liabilities both to CD and to SB, and a linear relationship between the liabilities to CD and SB. Differences in terms of goodness of fit and predictive ability of the models considered were negligible, suggesting that RMM3 is plausible. Conclusions: The applications examined in this study suggest the plausibility of a nonlinear recursive effect from GL onto CD and SB. Also, the fact that the most restrictive model RMM3, which assumes that the only cause of correlation is phenotypic recursion, performs as well as the others indicates that the phenotypic recursion may be an important cause of the observed patterns of genetic and environmental correlations.

Transcriptome of Local Innate and Adaptive Immunity during Early Phase of Infectious Bronchitis Viral Infection

et al. 2006

To understand the mechanistic basis of local innate and adaptive immunity against infectious bronchitis virus (IBV) at the molecular level, we examined the gene transcription profile of tracheal epithelial layers 3 d after infection of chickens with an attenuated IBV-Massachusetts strain. Results suggested that the transcription levels of 365 genes were either upregulated or downregulated (2-fold and higher) after IBV infection. Among the upregulated 250 genes, 25 were directly immune-related genes. These upregulated immune response genes included TLR2, TLR3, interferon-induced antiviral genes (Mx), and genes responsible for cytotoxic T cell killing such as Fas antigen and granzyme-A. Overall, a diversity of innate immunity and helper T cell type 1 (Th1)-biased adaptive immunity are activated in the host's early defense against IBV invasion, and they are responsible for the rapid clearance of virus from the local infection.

A Vision for Development and Utilization of High-Throughput Phenotyping and Big Data Analytics in Livestock

et al. 2019

An ensemble-based approach to imputation of moderate-density genotypes for genomic selection with application to Angus cattle

Sun

Weigel

et al. 2012

Genet. Res.

Summary: Imputation of moderate-density genotypes from low-density panels is of increasing interest in genomic selection, because it can dramatically reduce genotyping costs. Several imputation software packages have been developed, but they vary in imputation accuracy, and imputed genotypes may be inconsistent among methods. An AdaBoost-like approach is proposed to combine imputation results from several independent software packages, i.e. Beagle (v3.3), IMPUTE (v2.0), fastPHASE (v1.4), AlphaImpute, findhap (v2) and Fimpute (v2), with each package serving as a basic classifier in an ensemble-based system. The ensemble-based method computes weights sequentially for all classifiers, and combines results from component methods via weighted majority 'voting' to determine unknown genotypes. The data included 3078 registered Angus cattle, each genotyped with the Illumina BovineSNP50 BeadChip. SNP genotypes on three chromosomes (BTA1, BTA16 and BTA28) were used to compare imputation accuracy among methods, and the application involved the imputation of 50K genotypes covering 29 chromosomes based on a set of 5K genotypes. Beagle and Fimpute had the greatest accuracy among the six imputation packages, which ranged from 0·8677 to 0·9858. The proposed ensemble method was better than any of these packages, but the sequence of independent classifiers in the voting scheme affected imputation accuracy. The ensemble systems yielding the best imputation accuracies were those that had Beagle as first classifier, followed by one or two methods that utilized pedigree information. A salient feature of the proposed ensemble method is that it can solve imputation inconsistencies among different imputation methods, hence leading to a more reliable system for imputing genotypes relative to independent methods.
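As a rough illustration of the weighted majority voting step described in this abstract, the sketch below combines genotype calls from several imputation methods for one animal at one SNP. It is not the paper's procedure: the weight formula is the textbook AdaBoost form, taken here as an assumption, the sequential weight computation is not reproduced, and the method names and error rates are hypothetical.

```python
import math
from collections import defaultdict


def classifier_weight(error_rate):
    """AdaBoost-style weight (assumed form): lower error, larger weight."""
    # Clamp to avoid division by zero or log of zero at the extremes.
    eps = min(max(error_rate, 1e-10), 1 - 1e-10)
    return 0.5 * math.log((1 - eps) / eps)


def weighted_vote(calls, weights):
    """Pick the genotype with the largest summed weight.

    calls:   dict mapping method name -> genotype call (0, 1 or 2)
    weights: dict mapping method name -> voting weight
    """
    totals = defaultdict(float)
    for name, genotype in calls.items():
        totals[genotype] += weights[name]
    return max(totals, key=totals.get)


# Hypothetical error rates for three placeholder methods.
error_rates = {"method_a": 0.02, "method_b": 0.03, "method_c": 0.10}
weights = {name: classifier_weight(e) for name, e in error_rates.items()}

# method_a disagrees with the other two; their combined weight wins.
consensus = weighted_vote(
    {"method_a": 1, "method_b": 2, "method_c": 2}, weights)
```

Because each vote is weighted, a single very accurate method can still override two weaker ones, which is how a scheme of this kind can resolve inconsistent calls rather than simply counting heads.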
