2012
DOI: 10.1002/gepi.21661

Exploring Data From Genetic Association Studies Using Bayesian Variable Selection and the Dirichlet Process: Application to Searching for Gene × Gene Patterns

Abstract: We construct data exploration tools for recognizing important covariate patterns associated with a phenotype, with particular focus on searching for association with gene-gene patterns. To this end, we propose a new variable selection procedure that employs latent selection weights and compare it to an alternative formulation. The selection procedures are implemented in tandem with a Dirichlet process mixture model for the flexible clustering of genetic and epidemiological profiles. We illustrate our approach …

Cited by 37 publications (43 citation statements)
References 44 publications
“…Among the other papers, DPM models with Gaussian kernels are used to cluster microarray gene expression data [28,29,30]. Our approach differs from the previous papers since SNP genotypes take only three possible values and thus we consider a multinomial mixture model [31,32]. It is worth mentioning that our goal is very similar to the one in [31], although with a different approach.…”
Section: Introductionmentioning
confidence: 99%
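The three-valued SNP coding that motivates the multinomial mixture in the statement above can be sketched with a simple EM fit over independent categorical distributions per SNP. The function name, the EM routine, and all parameter choices below are illustrative assumptions, not the cited authors' implementation.

```python
import numpy as np

def fit_categorical_mixture(X, n_clusters, n_iter=50, seed=0):
    """EM for a mixture of independent categorical (multinomial) distributions.

    X: (n_samples, n_snps) integer array with genotype codes in {0, 1, 2}
       (copies of the minor allele).
    Returns mixing weights pi (K,) and category probs theta (K, n_snps, 3).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    K, C = n_clusters, 3
    pi = np.full(K, 1.0 / K)
    theta = rng.dirichlet(np.ones(C), size=(K, p))           # (K, p, 3)
    onehot = np.eye(C)[X]                                     # (n, p, 3)

    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] ∝ pi_k * prod_j theta[k, j, x_ij]
        log_lik = np.einsum('njc,kjc->nk', onehot, np.log(theta))
        log_r = np.log(pi) + log_lik
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)

        # M-step: update mixing weights and per-SNP category probabilities
        pi = r.mean(axis=0)
        counts = np.einsum('nk,njc->kjc', r, onehot) + 1e-6   # light smoothing
        theta = counts / counts.sum(axis=2, keepdims=True)
    return pi, theta
```

A Dirichlet process mixture, as used in the paper, would additionally let the number of clusters K be inferred rather than fixed; the finite mixture above only illustrates the multinomial kernel.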
“…Our approach differs from the previous papers since SNP genotypes take only three possible values and thus we consider a multinomial mixture model [31,32]. It is worth mentioning that our goal is very similar to the one in [31], although with a different approach. They clustered individuals into groups (e.g., high risk, average risk, and low risk for a certain disease) and then identified the covariates that were influential in the DPM clustering.…”
Section: Introductionmentioning
confidence: 99%
“…The variable selection options, which comprise either a binary [37] or a continuous [57] selection weighting method, allow the model to exclude an exposure from influencing the clustering procedure if that exposure exhibits a very low probability of being involved in the clustering patterns, further emphasizing a data-driven (non-parametric) approach to clustering. Specifically, we implemented variable selection with the “Continuous” option, which uses a latent variable taking values in (0,1) to govern the contribution of the variable in question to the mixture distribution [53,57]. Using a Bayesian framework for variable selection has been shown to be particularly helpful in the context of a large number of correlated covariates because it appropriately handles model uncertainty [58,59].…”
Section: Methodsmentioning
confidence: 99%
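The "continuous" selection weight described above can be sketched as a blend between a cluster-specific likelihood and a shared baseline: when the latent weight is near zero, a covariate contributes the same term to every cluster and so cannot drive allocation. All names and the categorical likelihood below are illustrative assumptions, not the cited software's code.

```python
import numpy as np

def weighted_covariate_loglik(x_j, theta_kj, theta_0j, zeta_j):
    """Log-likelihood of one categorical covariate under cluster k.

    x_j: observed category index; theta_kj: cluster-specific category probs;
    theta_0j: baseline (cluster-independent) probs; zeta_j in (0, 1) is the
    latent selection weight. The effective probability is a zeta-weighted
    blend of the cluster-specific and baseline probabilities.
    """
    p_eff = zeta_j * theta_kj[x_j] + (1.0 - zeta_j) * theta_0j[x_j]
    return np.log(p_eff)

# With zeta_j = 0 the cluster-specific probabilities are ignored entirely,
# so this covariate is effectively switched off for clustering.
theta_k = np.array([0.8, 0.1, 0.1])    # cluster-specific probs
theta_0 = np.array([1/3, 1/3, 1/3])    # baseline probs
off = weighted_covariate_loglik(0, theta_k, theta_0, zeta_j=0.0)
on = weighted_covariate_loglik(0, theta_k, theta_0, zeta_j=1.0)
```

In a full model, zeta_j would itself be given a prior on (0,1) and sampled, so that its posterior summarizes how strongly covariate j supports the mixture structure.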
“…A natural extension of such approaches would be to incorporate additional outcome data, i.e., to use a joint model of features and response in a semi-supervised manner, rather than proceed sequentially by clustering first and then linking clusters to survival outcomes, as presented in the METABRIC paper. In the genetic epidemiology context, Papathomas et al. (2012) used a joint clustering of genes and lung cancer outcomes to explore the potential for gene-gene interactions. They adopt a non-parametric Bayesian approach referred to as profile regression (Molitor et al. 2010), which also allows the selection of the important features that drive the clustering.…”
Section: Vertical Data Integrationmentioning
confidence: 99%
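Profile regression, as the statement describes it, clusters covariate profiles and the response jointly rather than sequentially: each cluster carries both covariate parameters and an outcome parameter, so the response helps shape the clusters. A minimal sketch of the per-cluster joint log-likelihood (independent categorical covariates plus a binary outcome; the function name and the specific likelihoods are illustrative assumptions):

```python
import numpy as np

def joint_cluster_loglik(x, y, theta_k, phi_k):
    """Joint log-likelihood of one individual's profile under cluster k.

    x: (p,) integer covariate profile (category indices);
    y: binary outcome (0 or 1);
    theta_k: (p, C) per-covariate category probabilities for cluster k;
    phi_k: cluster-specific outcome probability.
    """
    cov_term = np.sum(np.log(theta_k[np.arange(len(x)), x]))
    out_term = y * np.log(phi_k) + (1 - y) * np.log(1.0 - phi_k)
    return cov_term + out_term

# Example: a profile of 4 covariates, each with 3 categories, outcome y = 1.
theta_k = np.full((4, 3), 1/3)
ll = joint_cluster_loglik(np.zeros(4, dtype=int), 1, theta_k, 0.5)
```

In the full semi-supervised setting, these per-cluster joint terms enter the mixture allocation probabilities, so individuals with similar covariate profiles and similar outcomes tend to be grouped together.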