Abdominal aortic aneurysms (AAAs) are an important cardiovascular disease, but the genetic and environmental risk factors that contribute to an individual's risk of developing an aneurysm are poorly understood. Histologically, AAAs are characterized by signs of chronic inflammation, destructive remodeling of the extracellular matrix, and depletion of vascular smooth muscle cells. We hypothesized that genes involved in these events could harbor changes that make individuals more susceptible to developing aneurysms. This study identified significant genetic associations between AAA and DNA sequence changes in the tissue inhibitor of metalloproteinase 1 (TIMP1), TIMP3, matrix metalloproteinase 10 (MMP10), and elastin (ELN) genes. The results will require confirmation using an independent set of samples. After replication, it is possible that these sequence changes, in combination with other risk factors, could be used in the future to identify individuals who are at increased risk for developing an AAA.
Machine learning models often make predictions that are biased against certain subgroups of input data. When undetected, machine learning biases can have significant financial and ethical implications. Semi-automated tools that involve humans in the loop could facilitate bias detection. Yet, little is known about the considerations involved in their design. In this paper, we report on an interview study with 11 machine learning practitioners that investigated the needs surrounding semi-automated bias detection tools. Based on the findings, we highlight four design considerations to guide system designers who aim to create future tools for bias detection.
The simultaneous testing of a large number of hypotheses in a genome scan, using individual thresholds for significance, inherently leads to inflated genome-wide false positive rates. There exist various approaches to approximating the correct genome-wide p-values under various assumptions, either by way of asymptotics or simulations. We explore a philosophically different criterion, recently proposed in the literature, which controls the false discovery rate. The test statistics are assumed to arise from a mixture of distributions under the null and non-null hypotheses. We fit the mixture distribution using both a nonparametric approach and commingling analysis, and then apply the local false discovery rate to select cut-off points for regions to be declared interesting. Another criterion, the minimum total error, is also explored. Both criteria seem to be sensible alternatives to controlling the classical type I and type II error rates.
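As a rough illustration of the local false discovery rate idea described above, the following minimal sketch computes the posterior probability that a test statistic comes from the null component of a two-component mixture. It is not the fitting procedure of the abstract: the mixture is assumed to be already fitted, both components are assumed normal, and the names `local_fdr`, `select_regions`, `pi0`, `alt_mean`, and the 0.2 cut-off are hypothetical choices made for the example.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    """Density of a normal distribution at x."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd=1.0):
    """Local false discovery rate pi0 * f0(z) / f(z).

    Assumptions (not from the source): the null density f0 is N(0, 1),
    the non-null density is N(alt_mean, alt_sd^2), and the null
    proportion pi0 has already been estimated.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


def select_regions(z_scores, pi0, alt_mean, cutoff=0.2):
    """Indices of statistics whose local fdr is at or below the cutoff."""
    return [i for i, z in enumerate(z_scores)
            if local_fdr(z, pi0, alt_mean) <= cutoff]


if __name__ == "__main__":
    scores = [0.1, 4.0, -0.5, 3.5]
    print(select_regions(scores, pi0=0.9, alt_mean=3.0))
```

A statistic near zero gets a local fdr close to 1 (almost certainly null), while a statistic far in the tail of the non-null component gets a value close to 0, which is what makes the quantity usable as a cut-off for "interesting" regions.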
The basic idea of affected-sib-pair (ASP) linkage analysis is to test whether the inheritance pattern of a marker deviates from Mendelian expectation in a sample of ASPs. The test depends on an assumed Mendelian control distribution of the number of marker alleles shared identical by descent (IBD), i.e., 1/4, 1/2, and 1/4 for 2, 1, and 0 allele(s) IBD, respectively. However, Mendelian transmission may not always hold, for example because of inbreeding or meiotic drive at the marker or a nearby locus. A more robust and valid approach is to incorporate discordant-sib-pairs (DSPs) as controls to avoid possible false-positive results. To be robust to deviation from Mendelian transmission, here we analyzed Collaborative Study on the Genetics of Alcoholism data by modifying the ASP LOD score method to contrast the estimated distribution of the number of allele(s) shared IBD by ASPs with that by DSPs, instead of with the expected distribution under the Mendelian assumption. This strategy assesses the difference between the IBD sharing of ASPs and the IBD sharing of DSPs. Further, in these data it performs better than the conventional ASP LOD score linkage method in the sense of avoiding false-positive linkage evidence.
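The contrast between a Mendelian reference and a DSP reference can be sketched as follows. This is a simplified illustration, not the statistic used in the study: it assumes that IBD sharing is observed without ambiguity for every pair, treats the counts as multinomial, uses an unconstrained two-sample likelihood ratio for the ASP versus DSP comparison, and introduces the hypothetical function names `mendelian_lod` and `contrast_lod`.

```python
import math

# Mendelian probabilities of sharing 0, 1, and 2 alleles IBD.
MENDELIAN = (0.25, 0.5, 0.25)


def log10_lik(counts, probs):
    """Multinomial log10-likelihood, skipping empty categories."""
    return sum(n * math.log10(p) for n, p in zip(counts, probs) if n > 0)


def proportions(counts):
    """Maximum-likelihood estimate of the IBD sharing distribution."""
    total = sum(counts)
    return tuple(n / total for n in counts)


def mendelian_lod(asp_counts):
    """LOD of ASP sharing against the fixed Mendelian distribution."""
    return (log10_lik(asp_counts, proportions(asp_counts))
            - log10_lik(asp_counts, MENDELIAN))


def contrast_lod(asp_counts, dsp_counts):
    """LOD of ASP sharing against DSP sharing (assumed two-sample form).

    Null: ASPs and DSPs share one common distribution (pooled estimate).
    Alternative: each group has its own distribution.
    """
    pooled = tuple(a + d for a, d in zip(asp_counts, dsp_counts))
    separate = (log10_lik(asp_counts, proportions(asp_counts))
                + log10_lik(dsp_counts, proportions(dsp_counts)))
    return separate - log10_lik(pooled, proportions(pooled))


if __name__ == "__main__":
    # Counts of pairs sharing (0, 1, 2) alleles IBD; invented numbers.
    distorted = (10, 50, 40)
    print(mendelian_lod(distorted))            # positive: looks like linkage
    print(contrast_lod(distorted, distorted))  # about 0: DSPs show it too
```

When both groups show the same departure from 1/4, 1/2, 1/4, the Mendelian-reference LOD is positive while the contrast is essentially zero, which is the kind of false-positive signal the DSP controls are meant to remove.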
Clustering of related haplotypes in haplotype-based association mapping has the potential to improve power by reducing the degrees of freedom without sacrificing important information about the underlying genetic structure. We have modified a generalized linear model approach for association analysis by incorporating a density-based clustering algorithm to reduce the number of coefficients in the model. Using the GAW 15 Problem 3 simulated data, we show that our novel method can substantially enhance power to detect association with the binary rheumatoid arthritis (RA) phenotype at the HLA-DRB1 locus on chromosome 6. In contrast, clustering did not appreciably improve performance at locus D, perhaps a consequence of a rare susceptibility allele and of the overwhelming effect of HLA-DRB1/locus C, 5 cM distal. Optimization of parameters governing the clustering algorithm identified a set of parameters that delivered nearly ideal performance in a variety of situations. The cluster-based score test was valid over a wide range of haplotype diversity, and was robust to severe departures from Hardy-Weinberg equilibrium encountered near HLA-DRB1 in RA case-control samples.
Objective: p Values are inaccurate for model-free linkage analysis using the conditional logistic model if we assume that the LOD score is asymptotically distributed as a simple mixture of chi-square distributions. When analyzing affected relative pairs alone, permuting the allele sharing of relative pairs does not lead to a useful permutation distribution. As an alternative, we have developed regression prediction models that provide more accurate p values. Methods: Let Eα be the empirical p value, which is the proportion of statistical tests whose LOD score under the null hypothesis exceeds a threshold determined by α, the nominal single test significance value. We used simulated data to obtain values of Eα and compared them with α. We also developed a regression model, based on sample size, number of covariates in the model, α and marker density, to derive predicted p values for both single-point and multipoint analyses. To evaluate our predictions we used another set of simulated data, comparing the Eα for these data with those obtained by using the prediction model, referred to as predicted p values (Pα). Results: Under almost all circumstances the values of Pα were closer to the Eα than were the values of α. Conclusion: The regression models suggested by our analysis provide more accurate alternative p values for model-free linkage analysis when using the conditional logistic model.
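The empirical p value Eα defined in the Methods can be computed directly from simulated null LOD scores. The sketch below shows only that definition; the threshold is taken as an input because the abstract does not give the formula linking it to α, and the function name `empirical_p_value` and the example scores are hypothetical.

```python
def empirical_p_value(null_lod_scores, threshold):
    """Proportion of null-hypothesis LOD scores that exceed the threshold.

    null_lod_scores: LOD scores from data simulated under no linkage.
    threshold: LOD cut-off corresponding to the nominal level alpha
               (how it is derived from alpha is left to the caller).
    """
    if not null_lod_scores:
        raise ValueError("at least one simulated LOD score is required")
    exceed = sum(1 for lod in null_lod_scores if lod > threshold)
    return exceed / len(null_lod_scores)


if __name__ == "__main__":
    simulated = [0.1, 0.5, 1.2, 3.1]  # invented values for illustration
    print(empirical_p_value(simulated, threshold=1.0))
```

Comparing this proportion with the nominal α for the same threshold shows how far the assumed asymptotic distribution is from the simulated null distribution.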