Summary. In cross-breeding experiments it can be of interest to see whether certain genes have synergistic effects, that is, whether particular combinations are especially beneficial or detrimental to the individual. This type of effect involving multiple genes is called epistasis. Epistatic interactions can affect growth or fertility traits, or even cause complete lethality. However, detecting epistasis in genome-wide studies is challenging because multiple-testing approaches are underpowered. We develop a method for reconstructing an underlying network of genomic signatures of high-dimensional epistatic selection from multilocus genotype data. The network captures the conditionally dependent short- and long-range linkage disequilibrium structure and thus reveals 'aberrant' marker-marker associations that are due to epistatic selection rather than gametic linkage. The network estimation relies on penalized Gaussian copula graphical models, which can account for a large number of markers p and a small number of individuals n. We demonstrate the efficiency of the proposed method on simulated data sets as well as on genotyping data in Arabidopsis thaliana and maize.
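The copula step mentioned in the summary can be illustrated with a minimal sketch. It assumes a simple rank-based normal-score transform as a stand-in for the latent Gaussian treatment of discrete genotypes; this is not the paper's estimation procedure, and the marker vectors and function names below are hypothetical.

```python
from statistics import NormalDist, mean


def normal_scores(values):
    """Map observations to normal scores via mid-ranks (tied values share a rank)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        midrank = (start + end) / 2 + 1  # average 1-based rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = midrank
        start = end + 1
    std_normal = NormalDist()
    # Dividing by n + 1 keeps every probability strictly inside (0, 1).
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical genotype codes (0, 1, 2) at two markers for eight individuals.
marker_a = [0, 0, 1, 1, 1, 2, 2, 2]
marker_b = [0, 1, 0, 1, 2, 1, 2, 2]

z_a = normal_scores(marker_a)
z_b = normal_scores(marker_b)
score_corr = pearson(z_a, z_b)  # correlation on the Gaussian scale
print(round(score_corr, 3))
```

In a full analysis, a correlation matrix of this kind over all p markers would be passed to a penalized (sparse) precision-matrix estimator; that step is omitted here.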
Background. In nutritional epidemiology, dealing with confounding and complex internutrient relations are major challenges. An often-used approach is dietary pattern analyses, such as principal component analysis, to deal with internutrient correlations, and to more closely resemble the true way nutrients are consumed. However, despite these improvements, these approaches still require subjective decisions in the preselection of food groups. Moreover, they do not make efficient use of multivariate dietary data, because they detect only marginal associations. We propose the use of copula graphical models (CGMs) to model and make statistical inferences regarding complex associations among variables in multivariate data, where associations between all variables can be learned simultaneously.
Objective. We aimed to reconstruct nutritional intake and physical functioning networks in Dutch older adults by applying a CGM.
Methods. We addressed this issue by uncovering the pairwise associations between variables while correcting for the effect of remaining variables. More specifically, we used a CGM to infer the precision matrix, which contains all the conditional independence relations between nodes in the graph. The nonzero elements of the precision matrix indicate the presence of a direct association. We applied this method to reconstruct nutrient–physical functioning networks from the combined data of 4 studies (Nu-Age, ProMuscle, ProMO, and V-Fit, total n = 662, mean ± SD age = 75 ± 7 y). The method was implemented in the R package nutriNetwork, which is freely available at https://cran.r-project.org/web/packages/nutriNetwork.
Results. Greater intakes of vegetable protein and vitamin B-6 were partially correlated with higher scores on the total Short Physical Performance Battery (SPPB) and the chair rise test. Greater intakes of vitamin B-12 and folate were partially correlated with higher scores on the chair rise test and the total SPPB, respectively.
Conclusions. We determined that vegetable protein, vitamin B-6, folate, and vitamin B-12 intakes are partially correlated with improved functional outcome measurements in Dutch older adults.
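The link between the precision matrix, partial correlations, and network edges described in the Methods can be sketched as follows. This is an illustrative example under stated assumptions, not the nutriNetwork implementation: the covariance matrix is a made-up three-variable chain, the node labels A, B, C are placeholders, and the helper functions are hypothetical. It uses the standard identity that the partial correlation between variables i and j equals -theta_ij / sqrt(theta_ii * theta_jj), where theta is the precision matrix.

```python
from math import sqrt


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlation(precision, i, j):
    """Partial correlation of variables i and j given all remaining variables."""
    return -precision[i][j] / sqrt(precision[i][i] * precision[j][j])


def edges(precision, names, tol=1e-8):
    """List the pairs whose off-diagonal precision entry is nonzero (up to tol)."""
    found = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(precision[i][j]) > tol:
                found.append((names[i], names[j], partial_correlation(precision, i, j)))
    return found


# Made-up covariance of a chain A - B - C: A and C are marginally correlated
# (0.25), but only through B.
names = ["A", "B", "C"]
covariance = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]

precision = invert(covariance)
network = edges(precision, names)
for node_1, node_2, pcor in network:
    print(node_1, node_2, round(pcor, 3))
```

In this toy chain the A-C entry of the precision matrix vanishes, so the marginal correlation of 0.25 does not produce an edge. This is the sense in which a graphical model reports direct rather than marginal associations.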
Supplementary data are available at Bioinformatics online.
Graphical models are powerful tools for modeling and making statistical inferences regarding complex associations among variables in multivariate data. In this paper we introduce the R package netgwas, which is designed based on undirected graphical models to accomplish three important and interrelated goals in genetics: constructing linkage maps, reconstructing linkage disequilibrium (LD) networks from multi-loci genotype data, and detecting high-dimensional genotype-phenotype networks. Unlike other software, the netgwas package deals with species of any chromosome copy number in a unified way. It implements recent improvements in both linkage map construction (Behrouzi and Wit, 2018) and the reconstruction of conditional independence networks for non-Gaussian continuous data, discrete data, and mixed discrete-and-continuous data (Behrouzi and Wit, 2017). Such datasets, for example genotype data and genotype-phenotype data, routinely occur in genetics and genomics. We demonstrate the value of our package functionality by applying it to various multivariate example datasets taken from the literature. We show, in particular, that our package allows a more realistic analysis of data, as it adjusts for the effect of all other variables while performing pairwise associations. This feature controls for spurious associations between variables that can arise from classical multiple-testing approaches. This paper includes a brief overview of the statistical methods which have been implemented in the package. The main body of the paper explains how to use the package. The package uses a parallelization strategy on multi-core processors to speed up computations for large datasets. In addition, it contains several functions for simulation
Genetic variance of a phenotypic trait can originate from direct genetic effects, or from indirect effects, i.e., genetic effects on other traits that in turn affect the trait of interest. This distinction is often of great importance, for example, when trying to improve crop yield and simultaneously control plant height. As suggested by Sewall Wright, assessing contributions of direct and indirect effects requires knowledge of (1) the presence or absence of direct genetic effects on each trait, and (2) the functional relationships between the traits. Because experimental validation of such relationships is often unfeasible, it is increasingly common to reconstruct them using causal inference methods. However, most current methods require all genetic variance to be explained by a small number of quantitative trait loci (QTL) with fixed effects. Only a few authors have considered the “missing heritability” case, where contributions of many undetectable QTL are modeled with random effects. Usually, these are treated as nuisance terms that need to be eliminated by taking residuals from a multi-trait mixed model (MTM). But fitting such an MTM is challenging, and it is impossible to infer the presence of direct genetic effects. Here, we propose an alternative strategy, where genetic effects are formally included in the graph. This has important advantages: (1) genetic effects can be directly incorporated in causal inference, implemented via our PCgen algorithm, which can analyze many more traits; and (2) we can test the existence of direct genetic effects, and improve the orientation of edges between traits. Finally, we show that reconstruction is much more accurate if individual plant or plot data are used, instead of genotypic means. We have implemented the PCgen algorithm in the R package pcgen.
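The distinction between direct and indirect genetic effects can be made concrete with a toy linear path model in the spirit of Wright's path analysis. The sketch below is not the PCgen algorithm. It assumes a single observed genetic score G (a simplification of the random genetic effects discussed above), hypothetical path coefficients a, b, c, and simulated data. In this model the total effect of G on yield is the direct effect c plus the indirect effect a * b through height.

```python
import random


def centre(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    x, y = centre(x), centre(y)
    return dot(x, y) / dot(x, x)


def slopes_two(x1, x2, y):
    """Least-squares slopes of y on two predictors, via the 2x2 normal equations."""
    x1, x2, y = centre(x1), centre(x2), centre(y)
    s11, s22, s12 = dot(x1, x1), dot(x2, x2), dot(x1, x2)
    s1y, s2y = dot(x1, y), dot(x2, y)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical path coefficients: G -> height (a), height -> yield (b), G -> yield (c).
a, b, c = 0.8, 0.5, 0.3
n = 20000
rng = random.Random(1)

g = [rng.gauss(0, 1) for _ in range(n)]
height = [a * gi + rng.gauss(0, 1) for gi in g]
grain_yield = [c * gi + b * hi + rng.gauss(0, 1) for gi, hi in zip(g, height)]

total = slope(g, grain_yield)                      # estimates c + a * b
direct, via_height = slopes_two(g, height, grain_yield)  # estimate c and b

print(round(total, 2), round(direct, 2), round(via_height, 2))
```

Regressing yield on G alone recovers only the total effect; conditioning on height separates the direct path from the path through height, which is the kind of information a graph with explicit genetic nodes is meant to expose.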