Tools that simulate sequence data in unrelated cases and controls, or in families with quantitative traits or disease status, are important for genetic studies. They can be used to evaluate the statistical power for detecting causal variants when planning a genetic epidemiology study, or to evaluate the statistical properties of new methods. We previously developed SeqSIMLA version 1 (SeqSIMLA1), which simulates family or case-control data with a disease or quantitative trait model. SeqSIMLA1, like several other tools that simulate quantitative traits, does not specifically model the effects of a shared environment among relatives on a trait. However, shared environmental effects are commonly observed in families for some traits, such as body mass index. In addition, SeqSIMLA1 simulates a fixed three-generation family structure, whereas studies involving large pedigrees would ideally simulate prespecified pedigree structures. We therefore extended SeqSIMLA1 to create SeqSIMLA2, which can simulate correlated traits and accounts for shared environmental effects. SeqSIMLA2 can also simulate prespecified large pedigree structures, with no restriction on the number of individuals in a pedigree. We used a blood pressure example to demonstrate that SeqSIMLA2 can simulate realistic correlation structures between systolic and diastolic blood pressure among relatives. We also showed that SeqSIMLA2 can simulate large pedigrees with large chromosomal regions in a reasonable time frame. Genet Epidemiol 39:20-24, 2015.
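To illustrate why a shared environmental component matters, the following is a minimal sketch of a variance-components simulation for sibling pairs. It is not SeqSIMLA2's algorithm: the function names, the decomposition of a trait into additive genetic, shared environmental, and individual residual parts, and the variance values are all assumptions chosen for illustration. Under this assumed model, the expected sibling correlation is (0.5 × vg + vc) / (vg + vc + ve), so a nonzero shared environment variance vc raises the correlation beyond what genetics alone would produce.

```python
import math
import random


def simulate_sib_pairs(n_families, vg, vc, ve, seed=0):
    """Simulate one quantitative trait for sibling pairs (illustrative model).

    vg: additive genetic variance (siblings share half of it on average)
    vc: shared (family) environmental variance
    ve: individual residual variance
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_families):
        shared_g = rng.gauss(0.0, 1.0)                # genetic part common to both sibs
        shared_env = rng.gauss(0.0, math.sqrt(vc))    # one draw per family
        sibs = []
        for _ in range(2):
            own_g = rng.gauss(0.0, 1.0)               # genetic part unique to this sib
            # Equal weights give Var(g) = vg and Cov(g_sib1, g_sib2) = 0.5 * vg.
            g = math.sqrt(vg) * (math.sqrt(0.5) * shared_g + math.sqrt(0.5) * own_g)
            e = rng.gauss(0.0, math.sqrt(ve))         # individual residual
            sibs.append(g + shared_env + e)
        pairs.append((sibs[0], sibs[1]))
    return pairs


def expected_sib_correlation(vg, vc, ve):
    """Theoretical sibling correlation under the assumed model."""
    return (0.5 * vg + vc) / (vg + vc + ve)


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    pairs = simulate_sib_pairs(20000, vg=0.4, vc=0.2, ve=0.4, seed=1)
    r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
    print("observed:", round(r, 3), "expected:", expected_sib_correlation(0.4, 0.2, 0.4))
```

Setting vc to 0 in this sketch lowers the expected sibling correlation to 0.5 × vg / (vg + ve), which is the kind of gap a simulator without a shared environment term would leave for traits such as body mass index.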
Statistical association tests for rare variants fall into two classes: burden approaches and sequence kernel association test (SKAT) approaches. Both were originally developed for case–control analysis and have since been extended to family-based tests. When a study has both case–control and family data, joint analysis of the combined data set can increase statistical power. We extended the Combined Association in the Presence of Linkage (CAPL) test, which uses both case–control and family data to test common variants, to rare variant association analysis by applying the burden and SKAT algorithms to it. We used simulations to verify that the CAPL tests incorporating the burden and SKAT algorithms have correct type I error rates. Power studies suggested that both tests have adequate power to identify rare variants associated with the disease. We applied the tests to the Genetic Analysis Workshop 19 data set using the combined family and case–control data for hypertension. The analysis identified several candidate genes for hypertension.
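The burden idea can be sketched in a few lines for unrelated cases and controls. The example below is an assumption-laden illustration, not the CAPL test and not the authors' implementation: it collapses rare alleles in a gene into one score per individual and compares mean scores between cases and controls with a simple permutation test. The function names, the minor allele frequency threshold of 0.01, and the toy genotypes are all hypothetical.

```python
import random


def burden_scores(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants into one burden score per individual.

    genotypes: list of rows, one per individual, each a list of minor-allele
               counts (0, 1, or 2) per variant
    mafs:      minor allele frequency of each variant
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]


def permutation_pvalue(scores, is_case, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the case/control difference in mean burden.

    Assumes at least one case and one control; shuffling labels keeps group sizes fixed.
    """
    rng = random.Random(seed)

    def mean_diff(labels):
        case = [s for s, c in zip(scores, labels) if c]
        ctrl = [s for s, c in zip(scores, labels) if not c]
        return sum(case) / len(case) - sum(ctrl) / len(ctrl)

    observed = abs(mean_diff(is_case))
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Small tolerance guards against floating-point ties.
        if abs(mean_diff(labels)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    genotypes = [[1, 0, 2], [0, 0, 1], [0, 1, 0], [0, 0, 0]]  # toy data
    mafs = [0.005, 0.2, 0.008]                               # second variant is common
    scores = burden_scores(genotypes, mafs)
    print(scores, permutation_pvalue(scores, [True, True, False, False]))
```

A label-shuffling test like this is only valid for unrelated individuals; family data violate the exchangeability it relies on, which is one reason dedicated family-based and combined tests are needed.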
In disease studies, family-based designs have become an attractive approach to analyzing next-generation sequencing (NGS) data for the identification of rare mutations enriched in families. Substantial research effort has been devoted to developing pipelines for automating sequence alignment, variant calling, and annotation. However, fewer pipelines have been designed specifically for disease studies. Most current analysis pipelines for family-based disease studies using NGS data focus on a specific function, such as identifying variants with Mendelian inheritance or identifying chromosomal regions shared among affected family members. Consequently, other useful family-based analysis tools, such as imputation, linkage, and association tools, have yet to be integrated and automated. We developed FamPipe, a comprehensive analysis pipeline with several family-specific analysis modules: identification of chromosomal regions shared among affected family members, prioritization of variants under an assumed disease model, imputation of untyped variants, and linkage and association tests. We used simulation studies to compare the properties of some modules implemented in FamPipe and, based on the results, provided suggestions for selecting modules to achieve an optimal analysis strategy. The pipeline is released under the GNU GPL license and can be downloaded for free at http://fampipe.sourceforge.net.