With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequency [MAF] < 1%) associated with diseases. Testing each variant site individually is known to be underpowered, and therefore many methods have been proposed to test for the association of a group of variants with phenotypes by pooling the signals of the variants in a chromosomal region. However, this pooling strategy inevitably leads to the inclusion of a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the -MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675–685) and propose an approach (named ‘adaptive combination of P-values for rare variant association testing’, abbreviated as ‘ADA’) that adaptively combines per-site P-values with weights based on MAFs. Before combining P-values, we first impose a truncation threshold on the per-site P-values to guard against the noise caused by the inclusion of neutral variants. The ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
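The truncation-then-combination idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the weight 1/sqrt(MAF(1-MAF)), the Fisher-type sum of -log P, and the candidate thresholds are all assumptions chosen for illustration. The abstract does not specify how the thresholds are selected or how significance is assessed, so the sketch stops at the combined score.

```python
import math


def truncated_weighted_score(p_values, mafs, threshold):
    """Combine per-site P-values below `threshold` into one score.

    Assumed form (not taken from the paper): sum of -w_i * log(p_i),
    with w_i = 1 / sqrt(maf_i * (1 - maf_i)), so rarer variants weigh more.
    Each MAF must lie strictly between 0 and 1.
    """
    score = 0.0
    for p, maf in zip(p_values, mafs):
        if p < threshold:  # truncation: ignore sites that look neutral
            weight = 1.0 / math.sqrt(maf * (1.0 - maf))
            score += -weight * math.log(p)
    return score


def scores_over_thresholds(p_values, mafs, thresholds=(0.10, 0.15, 0.20)):
    """Evaluate the score at several hypothetical candidate thresholds."""
    return {t: truncated_weighted_score(p_values, mafs, t) for t in thresholds}


# Hypothetical per-site P-values and MAFs for five rare variants.
example_p = [0.01, 0.50, 0.12, 0.80, 0.03]
example_maf = [0.005, 0.008, 0.002, 0.009, 0.004]
print(scores_over_thresholds(example_p, example_maf))
```

Sites whose P-values exceed the threshold contribute nothing to the score, which is how truncation limits the influence of neutral variants.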
Identification of multifactor gene-gene (G×G) and gene-environment (G×E) interactions underlying complex traits is one of the great challenges in genetic studies today. The generalized multifactor dimensionality reduction (GMDR) method provides a practicable solution to the problem of detecting interactions. To exploit the opportunities brought by the availability of diverse data, there is high demand for corresponding GMDR software that can handle a breadth of phenotypes, such as continuous, count, dichotomous, polytomous nominal, ordinal, survival and multivariate, and various kinds of study designs, such as unrelated case-control, family-based, and pooled unrelated and family samples, and that also allows adjustment for covariates. We developed a versatile GMDR package to implement this series of GMDR analyses for various scenarios (e.g., unified analysis of unrelated and family samples) and large-scale (e.g., genome-wide) data. The package includes other desirable features such as data management and preprocessing. Permutation testing strategies are also built in to evaluate the threshold or empirical p values. In addition, its performance scales with the available computational resources. The software is available at http://www.soph.uab.edu/ssg/software or http://ibi.zju.edu.cn/software.
Investigations to identify quantitative trait loci (QTLs) governing cooking quality traits including amylose content, gel consistency and gelatinization temperature (expressed by the alkali spread value) were conducted using a set of 241 RIL populations derived from an elite hybrid cross of "Zhenshan 97" × "Minghui 63" and their reciprocal backcrosses BC1F1 and BC2F1 populations in two environments. QTLs and QTL × environment interactions were analyzed by using the genetic model with endosperm and maternal effects and environmental interaction effects on quantitative traits of seed in cereal crops. The results suggested that a total of seven QTLs were associated with cooking quality of rice, which were subsequently mapped to chromosomes 1, 4 and 6. Six of these QTLs were also found to have environmental interaction effects.
Detection of interacting risk factors for complex traits is challenging. The choice of an appropriate method, sample size, and allocation of cases and controls are serious concerns. To provide empirical guidelines for planning such studies and data analyses, we investigated the performance of the multifactor dimensionality reduction (MDR) and generalized MDR (GMDR) methods under various experimental scenarios. We developed the mathematical expectation of accuracy and used it as an indicator parameter to perform a gene-gene interaction study. We then examined the statistical power of GMDR and MDR within the plausible range of accuracy (0.50–0.65) reported in the literature. The GMDR with covariate adjustment had a power of > 80% in a case-control design with a sample size of ≥ 2000, with theoretical accuracy ranging from 0.56 to 0.62. However, when the accuracy was < 0.56, a sample size of ≥ 4000 was required to have sufficient power. In our simulations, the GMDR outperformed the MDR under all models with accuracy ranging from 0.56 to 0.62 for a sample size of 1000–2000. However, the two methods performed similarly when the accuracy was outside this range or the sample was significantly larger. We conclude that, with adjustment for a covariate, GMDR performs better than MDR, and a sample size of 1000–2000 is reasonably large for detecting gene-gene interactions in the range of effect sizes reported in the current literature, whereas a larger sample size is required for more subtle interactions with accuracy < 0.56.
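To make the "accuracy" figures above concrete, the following is a minimal sketch of how a basic MDR classifier can be scored by balanced accuracy. It is an illustration under stated assumptions, not the procedure used in the study: it omits cross-validation and the covariate adjustment that distinguishes GMDR, the function name and toy data are hypothetical, and it assumes at least one case and one control are present.

```python
from collections import Counter


def mdr_balanced_accuracy(genotypes, is_case):
    """Balanced accuracy of a basic MDR high-risk/low-risk classifier.

    genotypes: list of hashable multilocus genotypes, e.g. (0, 1) tuples.
    is_case:   list of booleans, True for cases and False for controls.
    Assumes at least one case and one control (no zero-division guard).
    """
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    cases = Counter(g for g, c in zip(genotypes, is_case) if c)
    controls = Counter(g for g, c in zip(genotypes, is_case) if not c)

    # A genotype cell is "high risk" when its case:control ratio exceeds
    # the overall case:control ratio in the sample.
    overall_ratio = n_cases / n_controls
    high_risk = {g for g in set(genotypes)
                 if cases[g] > overall_ratio * controls[g]}

    true_pos = sum(cases[g] for g in high_risk)
    true_neg = sum(n for g, n in controls.items() if g not in high_risk)
    sensitivity = true_pos / n_cases
    specificity = true_neg / n_controls
    return 0.5 * (sensitivity + specificity)


# Hypothetical two-locus genotypes for six individuals.
toy_genotypes = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (0, 1)]
toy_status = [False, False, True, True, True, False]
print(mdr_balanced_accuracy(toy_genotypes, toy_status))
```

A value of 0.50 corresponds to chance-level classification, which is why the plausible range cited above starts there.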
Background: Brassica napus is an important oilseed crop. Dissection of the genetic architecture underlying oil-related biological processes will greatly facilitate the genetic improvement of rapeseed. Differential gene expression during pod development offers a snapshot of the genes responsible for oil accumulation. To identify candidate genes in the linkage peaks reported previously, we used RNA sequencing (RNA-Seq) technology to analyze the pod transcriptomes of the German cultivar Sollux and the Chinese inbred line Gaoyou.
Methods: RNA samples were collected for RNA-Seq at 5–7, 15–17 and 25–27 days after flowering (DAF). Bioinformatics analysis was performed to investigate differentially expressed genes (DEGs). Gene annotation analysis was integrated with QTL mapping and Brassica napus pod transcriptome profiling to detect potential candidate genes in oilseed.
Results: In total, 465 and 2,114 candidate DEGs were identified between the two varieties at the same stages and across different periods within each variety, respectively. Then, 33 DEGs between Sollux and Gaoyou were identified as candidate genes affecting seed oil content by combining those DEGs with the quantitative trait locus (QTL) mapping results, of which one was found to be homologous to Arabidopsis thaliana lipid-related genes.
Discussion: Intervarietal DEGs of lipid pathways in QTL regions represent important candidate genes for oil-related traits. Integrated analysis of transcriptome profiling, QTL mapping and comparative genomics with related species leads to efficient identification of the most plausible functional genes underlying oil-content-related traits, offering valuable resources for improving Brassica napus breeding programs.
Conclusions: This study provided a comprehensive overview of the pod transcriptomes of two varieties with different oil contents at three developmental stages.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2062-7) contains supplementary material, which is available to authorized users.