There have been increasing efforts to relate drug efficacy and disease predisposition to genetic polymorphisms. We present statistical tests for association of haplotype frequencies with discrete and continuous traits in samples of unrelated individuals. Haplotype frequencies are estimated with the expectation-maximization (EM) algorithm, and each individual in the sample is expanded into all haplotype configurations compatible with their genotype, each weighted by its conditional probability given that genotype. A regression-based approach is then used to relate the inferred haplotype probabilities to the response. The relationship of this technique to commonly used approaches developed for case-control data is discussed. Using simulated data, we compare tests based on inferred haplotypes with single-marker tests; we confirm the proper size of the test under H₀ and find an increase in power under the alternative. More importantly, analysis of real data comprising a dense map of single nucleotide polymorphisms spaced along a 12-cM chromosomal region allows us to confirm the utility of the haplotype approach as well as the validity and usefulness of the proposed statistical technique. The method appears to be successful in relating data from multiple, correlated markers to response.
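The EM step described in this abstract can be illustrated with a minimal sketch for the simplest case of two biallelic markers. It is an illustration under stated assumptions, not the authors' implementation: genotypes are assumed to be coded as minor-allele counts (0, 1, 2) per marker, Hardy-Weinberg proportions are assumed, and the names `compatible_pairs` and `em_haplotype_freqs` are hypothetical.

```python
from collections import Counter

# The four possible haplotypes over two biallelic markers (0 = major, 1 = minor allele).
HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def compatible_pairs(genotype):
    """Return all unordered haplotype pairs whose allele counts sum to the genotype."""
    pairs = []
    for i, h1 in enumerate(HAPS):
        for h2 in HAPS[i:]:
            if (h1[0] + h2[0], h1[1] + h2[1]) == tuple(genotype):
                pairs.append((h1, h2))
    return pairs


def em_haplotype_freqs(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies from unphased two-marker genotypes by EM."""
    freqs = {h: 0.25 for h in HAPS}          # uniform starting values (an assumption)
    counts = Counter(tuple(g) for g in genotypes)
    n_chrom = 2 * len(genotypes)
    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPS}
        # E-step: weight each compatible configuration by its posterior probability.
        for g, n in counts.items():
            pairs = compatible_pairs(g)
            weights = [(1 if h1 == h2 else 2) * freqs[h1] * freqs[h2]
                       for h1, h2 in pairs]
            total = sum(weights)
            if total == 0:
                continue                      # genotype impossible under current estimates
            for (h1, h2), w in zip(pairs, weights):
                post = w / total
                expected[h1] += n * post
                expected[h2] += n * post
        # M-step: new frequencies are expected haplotype counts per chromosome.
        new = {h: expected[h] / n_chrom for h in HAPS}
        delta = max(abs(new[h] - freqs[h]) for h in HAPS)
        freqs = new
        if delta < tol:
            break
    return freqs
```

With two markers, only the double heterozygote (1, 1) is ambiguous, since it is compatible with either the 00/11 or the 01/10 haplotype pair. The posterior weights computed in the E-step are the per-individual configuration probabilities that a downstream regression could use as covariates.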
A general question for linkage disequilibrium-based association studies is how power to detect an association is compromised when tag SNPs are chosen from data in one population sample and then deployed in another sample. Specifically, it is important to know how well tags picked from the HapMap DNA samples capture the variation in other samples. To address this, we collected dense data uniformly across the four HapMap population samples and eleven other population samples. We picked tag SNPs using genotype data we collected in the HapMap samples and then evaluated the effective coverage of these tags in comparison to the entire set of common variants observed in the other samples. We simulated case-control association studies in the non-HapMap samples under a disease model of modest risk, and we observed little loss in power. These results demonstrate that the HapMap DNA samples can be used to select tags for genome-wide association studies in many samples around the world.
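The evaluation described here, picking tags in one sample and measuring how well they capture variation in another, can be sketched as follows. This is a simplified illustration rather than the study's pipeline: it assumes genotypes coded 0/1/2, uses the squared Pearson correlation of genotype codes as a stand-in for pairwise r², picks tags greedily, and uses a threshold of 0.8 that is an assumption here, not a value taken from the abstract. The function names are hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def greedy_tags(snps, threshold=0.8):
    """Greedily pick tag SNPs so every SNP is a tag or correlated with one at r^2 >= threshold.

    snps maps a SNP name to its list of genotype codes in the reference sample.
    """
    names = list(snps)
    covers = {a: {b for b in names
                  if a == b or r_squared(snps[a], snps[b]) >= threshold}
              for a in names}
    untagged = set(names)
    tags = []
    while untagged:
        # Choose the SNP that captures the most still-untagged SNPs.
        best = max(names, key=lambda a: len(covers[a] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


def coverage(tags, snps, threshold=0.8):
    """Fraction of SNPs in a (possibly different) sample captured by the given tags."""
    captured = [s for s in snps
                if any(s == t or r_squared(snps[s], snps[t]) >= threshold for t in tags)]
    return len(captured) / len(snps)
```

A typical use would call `greedy_tags` on the reference sample and then `coverage` on a second sample genotyped at the same SNPs; a drop in coverage between the two indicates that linkage disequilibrium patterns differ between the samples.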
Variation in PGR was associated with ovarian cancer risk, although the strongest result was not with the PROGINS allele. Instead, any causal allele(s) are likely in or downstream of block 4 and carried on haplotypes 4-D and 4-E. There was some evidence that the same variation was associated with a reduced risk of breast cancer, but the association was not statistically significant.
Identifying genetic variations predictive of important phenotypes, such as disease susceptibility, drug efficacy, and adverse events, remains a challenging task. Individual polymorphisms can be tested one at a time, but the more difficult problem is identifying combinations of polymorphisms, or even more complex interactions of genes with environmental factors. Diseases, drug responses, and side effects can each result from different mechanisms, so identifying subgroups of people who share a common mechanism matters for diagnosis and for prescribing treatment. Recursive partitioning (RP) is a simple statistical tool for segmenting a population into non-overlapping groups within which the response of interest (disease susceptibility, drug efficacy, or adverse events) is more homogeneous. We suggest that RP is not only more technically feasible than other search methods but also less susceptible to multiple-testing problems. The number of possible gene-gene and gene-environment interactions is potentially astronomical, and RP greatly reduces the effective search and inference space. Moreover, analytical and numerical arguments indicate that the degree to which RP relies on the presence of marginal effects is justifiable. In the context of haplotype analysis, the results suggest that the analysis of individual SNPs is likely to be successful even when susceptibilities are determined by haplotypes. Retrospective clinical studies, in which cases and controls are collected, will be a common design. This report provides methods for adjusting the RP analysis to reflect the population incidence of the response of interest. Confidence limits on the incidence of the response in the segmented subgroups are also discussed. RP is a straightforward way to create realistic subgroups, and prediction intervals for the within-subgroup disease incidence are easily obtained.
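The abstract does not spell out how the RP analysis is adjusted for population incidence. One standard approach, shown below as a sketch and not necessarily the report's method, is to reweight the case and control fractions falling in a subgroup by the assumed population incidence using Bayes' rule. The function name and all arguments are hypothetical.

```python
def adjusted_incidence(cases_in_node, controls_in_node,
                       total_cases, total_controls, population_incidence):
    """Estimate the population incidence of the response within one RP subgroup.

    In a case-control sample the raw case fraction in a subgroup reflects the
    sampling ratio, not the population. Bayes' rule corrects this:
        P(case | node) = pi * P(node | case)
                         / (pi * P(node | case) + (1 - pi) * P(node | control))
    where pi is the assumed population incidence.
    """
    if total_cases <= 0 or total_controls <= 0:
        raise ValueError("need at least one case and one control in the sample")
    p_node_given_case = cases_in_node / total_cases
    p_node_given_control = controls_in_node / total_controls
    numerator = population_incidence * p_node_given_case
    denominator = numerator + (1 - population_incidence) * p_node_given_control
    if denominator == 0:
        raise ValueError("subgroup is empty; incidence is undefined")
    return numerator / denominator
```

For example, with an assumed population incidence of 0.1, a subgroup holding 30 of 100 cases and 10 of 100 controls has a raw case fraction of 0.75 in the sample but an adjusted incidence of 0.25.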
Identifying genetic variation predictive of important phenotypes, including disease susceptibility, drug efficacy, and adverse events, is a challenging task, and both theoretical and computational work is under way to tackle it. For many important diseases, such as diabetes, schizophrenia, and depression, the etiology is complex: the disease results from several distinct mechanisms, from interactions among multiple genes or between genes and environment, or both. Statistical methods are needed to deal with the large, complex data sets that will be used to disentangle these diseases. Each putative genetic polymorphism can be tested for association sequentially. The most difficult problem, however, is identifying combinations of polymorphisms or genetic markers with greater predictive value. Data from clinical trials, where patients with a particular disease are treated with certain drugs, can be retrospectively assembled using a case-control design. Such data will typically include treatment assignment, demographics, medical history, and genotypes for a large number of genetic markers. The number of variables in such data is expected to be much larger than the number of subjects. This report focuses on some of the methods being employed to deal with these complex data and covers, in some detail, a data-mining method, recursive partitioning, for analyzing them. The methods are demonstrated using a complex simulated data set, as few public data sets are available. This explication of recursive partitioning should give researchers a better idea of the analysis techniques currently available, allowing them to plan their experiments more effectively.
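To make the idea of recursive partitioning concrete, here is a minimal sketch for a binary response and genotype markers coded 0/1/2. It is a generic illustration, not the software or settings used in the report: the Gini impurity criterion, the depth limit, the minimum node size, and the function names are all assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_split(rows, labels):
    """Find the marker and cut point giving the largest impurity reduction.

    rows is a non-empty list of dicts mapping marker name to genotype code (0/1/2).
    Returns (marker, cut, gain), or None if no split separates the data.
    """
    parent = gini(labels)
    n = len(labels)
    best = None
    for marker in rows[0]:
        for cut in (0, 1):  # split into genotype <= cut versus genotype > cut
            left = [y for r, y in zip(rows, labels) if r[marker] <= cut]
            right = [y for r, y in zip(rows, labels) if r[marker] > cut]
            if not left or not right:
                continue
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or gain > best[2]:
                best = (marker, cut, gain)
    return best


def grow(rows, labels, depth=0, max_depth=2, min_size=5):
    """Recursively split the sample; leaves report size and response rate."""
    split = None
    if depth < max_depth and len(labels) >= 2 * min_size:
        split = best_split(rows, labels)
    if split is None or split[2] <= 0:
        return {"n": len(labels), "rate": sum(labels) / len(labels)}
    marker, cut, _ = split
    left_idx = [i for i, r in enumerate(rows) if r[marker] <= cut]
    right_idx = [i for i, r in enumerate(rows) if r[marker] > cut]
    return {
        "marker": marker,
        "cut": cut,
        "left": grow([rows[i] for i in left_idx], [labels[i] for i in left_idx],
                     depth + 1, max_depth, min_size),
        "right": grow([rows[i] for i in right_idx], [labels[i] for i in right_idx],
                      depth + 1, max_depth, min_size),
    }
```

Each split is chosen on a single marker at a time, which is why the method depends on markers showing at least some marginal effect; interactions emerge only through successive splits within the resulting subgroups.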