Many stochastic simulation approaches for generating observations from a posterior distribution depend on knowing a likelihood function. However, for many complex probability models, such likelihoods are either impossible or computationally prohibitive to obtain. Here we present a Markov chain Monte Carlo method for generating observations from a posterior distribution without the use of likelihoods. It can also be used in frequentist applications, in particular for maximum-likelihood estimation. The approach is illustrated by an example of ancestral inference in population genetics. A number of open problems are highlighted in the discussion.

One of the basic problems in Bayesian statistics is the computation of posterior distributions. We imagine data $D$ generated from a model $M$ determined by parameters $\theta$, the prior density of which is denoted by $\pi(\theta)$. We assume unless otherwise stated that the data are discrete. The posterior distribution of interest is $f(\theta \mid D)$, which is given by

$$f(\theta \mid D) = \frac{\mathbb{P}(D \mid \theta)\,\pi(\theta)}{\mathbb{P}(D)},$$

where $\mathbb{P}(D) = \int \mathbb{P}(D \mid \theta)\,\pi(\theta)\,d\theta$ is the normalizing constant. In most scientific contexts, explicit formulae for such posterior densities are few and far between, and we usually resort to stochastic simulation to generate observations from $f$. Perhaps the simplest approach for this is the rejection method:

A1. Generate $\theta$ from $\pi(\cdot)$.
A2. Accept $\theta$ with probability $h = \mathbb{P}(D \mid \theta)$; return to A1.

There are many variations on this theme. Of particular relevance here is the case in which the likelihood $\mathbb{P}(D \mid \theta)$ cannot be computed explicitly. One obvious approach then is to generate $\theta$ from $\pi(\cdot)$, simulate data $D'$ from the model $M$ with that $\theta$, and accept $\theta$ only if $D' = D$.

The success of this approach depends on the fact that the underlying stochastic model $M$ is easy to simulate. This approach can be useful when computation of the likelihood is possible but time-consuming.

The practicality of algorithms such as these depends crucially on the size of $\mathbb{P}(D)$, because the probability of accepting an observation is proportional to $\mathbb{P}(D)$.
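The likelihood-free rejection idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' own listing: the toy model (a count of successes in `n` Bernoulli trials with a uniform prior on the success probability), the function names, and the observed count are all hypothetical, chosen because the exact posterior, Beta(8, 4), is known and can be used as a check.

```python
import random


def simulate_data(theta, n, rng):
    """Simulate D' from the toy model M: successes in n Bernoulli(theta) trials."""
    return sum(rng.random() < theta for _ in range(n))


def rejection_without_likelihood(observed, n, num_accept, rng):
    """Draw theta from the prior and keep it only when simulated data equal the observed data.

    The likelihood P(D | theta) is never evaluated; only simulation from M is needed.
    """
    accepted = []
    while len(accepted) < num_accept:
        theta = rng.random()  # assumed prior: Uniform(0, 1)
        if simulate_data(theta, n, rng) == observed:  # accept only on an exact match
            accepted.append(theta)
    return accepted


rng = random.Random(0)
sample = rejection_without_likelihood(observed=7, n=10, num_accept=2000, rng=rng)
posterior_mean = sum(sample) / len(sample)
exact_mean = 8 / 12  # mean of Beta(8, 4), the exact posterior for this toy model
print(f"simulated posterior mean {posterior_mean:.3f}, exact {exact_mean:.3f}")
```

In this toy setting the acceptance probability equals the marginal probability of the observed count, which is 1/11 under the uniform prior. This illustrates why the scheme becomes impractical when $\mathbb{P}(D)$ is small.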
In cases where the acceptance rate is too small, one might resort to approximate methods, such as accepting $\theta$ whenever a distance $\rho(D, D')$ between the simulated data $D'$ and the observed data $D$ is at most a tolerance $\epsilon$. This approach requires selection of a suitable metric $\rho$ as well as a choice of $\epsilon$. As $\epsilon \to \infty$ it generates observations from the prior. If $\epsilon = 0$, an observation $D'$ is accepted only if $D' = D$, and then accepted observations come from the density $f(\theta \mid D)$. The choice of $\epsilon$ therefore reflects a tension between computability and accuracy. The method is still honest in that, for a given $\rho$ and $\epsilon$, we are generating independent and identically distributed observations from $f(\theta \mid \rho(D, D') \le \epsilon)$.

When $D$ is high-dimensional or continuous, this approach can be impractical as well, and then the comparison of $D'$ with $D$ can be made by using lower-dimensional summaries of the data. The motivation for this approach is that if the set of statistics $S = (S_1, \ldots, S_p)$ is sufficient for $\theta$, in that $\mathbb{P}(D \mid S, \theta)$ is independent of $\theta$, then $f(\theta \mid D) = f(\theta \mid S)$. The normalizing constant $\mathbb{P}(S)$ is typically larger than $\mathbb{P}(D)$, resulting in more acceptances. In practice it will be hard, if not impossible, to identify a suitable set of sufficient statistics, and we then might resort to ...
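The tolerance-based comparison on summary statistics can be sketched in the same spirit. Everything specific here is an assumption made for illustration: the toy model (normal data with unknown mean and unit variance), the uniform prior on (-5, 5), the sample mean as the summary $S$, the absolute difference as the metric $\rho$, and the value of $\epsilon$. It is a rejection sampler, not the Markov chain Monte Carlo method that the paper goes on to present.

```python
import random
import statistics


def simulate_data(theta, n, rng):
    """Simulate D' from the toy model M: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic S: the sample mean, sufficient for theta in this toy model."""
    return statistics.fmean(data)


def approximate_rejection(observed, epsilon, num_accept, rng,
                          prior_low=-5.0, prior_high=5.0):
    """Accept theta when rho(S(D'), S(D)) = |S(D') - S(D)| is at most epsilon."""
    s_obs = summary(observed)
    n = len(observed)
    accepted = []
    while len(accepted) < num_accept:
        theta = rng.uniform(prior_low, prior_high)  # assumed uniform prior
        s_sim = summary(simulate_data(theta, n, rng))
        if abs(s_sim - s_obs) <= epsilon:
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulate_data(2.0, 20, rng)  # stand-in for real data, true mean 2.0
sample = approximate_rejection(observed, epsilon=0.1, num_accept=500, rng=rng)
approx_posterior_mean = statistics.fmean(sample)
print(f"observed mean {summary(observed):.3f}, "
      f"approximate posterior mean {approx_posterior_mean:.3f}")
```

Shrinking `epsilon` brings the accepted values closer to the exact posterior but lowers the acceptance rate, which is the tension between computability and accuracy noted in the text.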
Motivation: Recently there has been increasing interest in the effects of cell mixture on the measurement of DNA methylation, specifically the extent to which small perturbations in cell mixture proportions can register as changes in DNA methylation. A recently published set of statistical methods exploits this association to infer changes in cell mixture proportions, and these methods are presently being applied to adjust for cell mixture effect in the context of epigenome-wide association studies. However, these adjustments require the existence of reference datasets, which may be laborious or expensive to collect. For some tissues such as placenta, saliva, adipose or tumor tissue, the relevant underlying cell types may not be known.

Results: We propose a method for conducting epigenome-wide association studies analysis when a reference dataset is unavailable, including a bootstrap method for estimating standard errors. We demonstrate via simulation study and several real data analyses that our proposed method can perform as well as or better than methods that make explicit use of reference datasets. In particular, it may adjust for detailed cell type differences that may be unavailable even in existing reference datasets.

Availability and implementation: Software is available in the R package RefFreeEWAS. Data for three of four examples were obtained from Gene Expression Omnibus (GEO), accession numbers GSE37008, GSE42861 and GSE30601, while reference data were obtained from GEO accession number GSE39981.

Contact: andres.houseman@oregonstate.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
There is currently tremendous interest in the possibility of using genome-wide association mapping to identify genes responsible for natural variation, particularly for human disease susceptibility. The model plant Arabidopsis thaliana is in many ways an ideal candidate for such studies, because it is a highly selfing hermaphrodite. As a result, the species largely exists as a collection of naturally occurring inbred lines, or accessions, which can be genotyped once and phenotyped repeatedly. Furthermore, linkage disequilibrium in such a species will be much more extensive than in a comparable outcrossing species. We tested the feasibility of genome-wide association mapping in A. thaliana by searching for associations with flowering time and pathogen resistance in a sample of 95 accessions for which genome-wide polymorphism data were available. In spite of an extremely high rate of false positives due to population structure, we were able to identify known major genes for all phenotypes tested, thus demonstrating the potential of genome-wide association mapping in A. thaliana and other species with similar patterns of variation. The rate of false positives differed strongly between traits, with more clinal traits showing the highest rate. However, the false positive rates were always substantial regardless of the trait, highlighting the necessity of an appropriate genomic control in association studies.
An age-dependent association between variation at the FTO locus and BMI in children has been suggested. We meta-analyzed associations between the FTO locus (rs9939609) and BMI in samples, aged from early infancy to 13 years, from 8 cohorts of European ancestry. We found a positive association between additional minor (A) alleles and BMI from 5.5 years onwards, but an inverse association below age 2.5 years. Modelling median BMI curves for each genotype using the LMS method, we found that carriers of minor alleles showed lower BMI in infancy, earlier adiposity rebound (AR), and higher BMI later in childhood. Differences by allele were consistent with two independent processes: earlier AR equivalent to accelerating developmental age by 2.37% (95% CI 1.87, 2.87, p = 10^−20) per A allele and a positive age by genotype interaction such that BMI increased faster with age (p = 10^−23). We also fitted a linear mixed effects model to relate genotype to the BMI curve inflection points adiposity peak (AP) in infancy and AR. Carriage of two minor alleles at rs9939609 was associated with lower BMI at AP (−0.40% (95% CI: −0.74, −0.06), p = 0.02), higher BMI at AR (0.93% (95% CI: 0.22, 1.64), p = 0.01), and earlier AR (−4.72% (−5.81, −3.63), p = 10^−17), supporting cross-sectional results. Overall, we confirm the expected association between variation at rs9939609 and BMI in childhood, but only after an inverse association between the same variant and BMI in infancy. Patterns are consistent with a shift on the developmental scale, which is reflected in association with the timing of AR rather than just a global increase in BMI. Results provide important information about longitudinal gene effects and about the role of FTO in adiposity. The associated shifts in developmental timing have clinical importance with respect to known relationships between AR and both later-life BMI and metabolic disease risk.
Standard regression analyses are often plagued with problems encountered when one tries to make inference going beyond main effects using data sets that contain dozens of variables that are potentially correlated. This situation arises, for example, in epidemiology where surveys or study questionnaires consisting of a large number of questions yield a potentially unwieldy set of interrelated data from which teasing out the effect of multiple covariates is difficult. We propose a method that addresses these problems for categorical covariates by using, as its basic unit of inference, a profile formed from a sequence of covariate values. These covariate profiles are clustered into groups and associated via a regression model to a relevant outcome. The Bayesian clustering aspect of the proposed modeling framework has a number of advantages over traditional clustering approaches in that it allows the number of groups to vary, uncovers subgroups and examines their association with an outcome of interest, and fits the model as a unit, allowing an individual's outcome potentially to influence cluster membership. The method is demonstrated with an analysis of survey data obtained from the National Survey of Children's Health. The approach has been implemented using the standard Bayesian modeling software, WinBUGS, with code provided in the supplementary material available at Biostatistics online. Further, interpretation of partitions of the data is helped by a number of postprocessing tools that we have developed.
Background: The question of whether air pollution contributes to asthma onset remains unresolved.

Objectives: In this study, we assessed the association between asthma onset in children and traffic-related air pollution.

Methods: We selected a sample of 217 children from participants in the Southern California Children’s Health Study, a prospective cohort designed to investigate associations between air pollution and respiratory health in children 10–18 years of age. Individual covariates and new asthma incidence (30 cases) were reported annually through questionnaires during 8 years of follow-up. Children had nitrogen dioxide monitors placed outside their home for 2 weeks in the summer and 2 weeks in the fall–winter season as a marker of traffic-related air pollution. We used multilevel Cox models to test the associations between asthma and air pollution.

Results: In models controlling for confounders, incident asthma was positively associated with traffic pollution, with a hazard ratio (HR) of 1.29 [95% confidence interval (CI), 1.07–1.56] across the average within-community interquartile range of 6.2 ppb in annual residential NO2. Using the total interquartile range for all measurements of 28.9 ppb increased the HR to 3.25 (95% CI, 1.35–7.85).

Conclusions: In this cohort, markers of traffic-related air pollution were associated with the onset of asthma. The risks observed suggest that air pollution exposure contributes to new-onset asthma.
Recent genome-wide association (GWA) studies have identified dozens of common variants associated with adult height. However, it is unknown how these variants influence height growth during childhood. We derived peak height velocity in infancy (PHV1) and puberty (PHV2) and timing of pubertal height growth spurt from parametric growth curves fitted to longitudinal height growth data to test their association with known height variants. The study consisted of N = 3,538 singletons from the prospective Northern Finland Birth Cohort 1966 with genotype data and frequent height measurements (on average 20 measurements per person) from 0–20 years. Twenty-six of the 48 variants tested associated with adult height (p<0.05, adjusted for sex and principal components) in this sample, all in the same direction as in previous GWA scans. Seven SNPs in or near the genes HHIP, DLEU7, UQCC, SF3B4/SV2A, LCORL, and HIST1H1D associated with PHV1 and five SNPs in or near SOCS2, SF3B4/SV2A, C17orf67, CABLES1, and DOT1L with PHV2 (p<0.05). We formally tested variants for interaction with age (infancy versus puberty) and found biologically meaningful evidence for an age-dependent effect for the SNP in SOCS2 (p = 0.0030) and for the SNP in HHIP (p = 0.045). We did not have similar prior evidence for the association between height variants and timing of pubertal height growth spurt as we had for PHVs, and none of the associations were statistically significant after correction for multiple testing. The fact that in this sample, less than half of the variants associated with adult height had a measurable effect on PHV1 or PHV2 is likely to reflect limited power to detect these associations in this dataset. Our study is the first genetic association analysis on longitudinal height growth in a prospective cohort from birth to adulthood and gives grounding for future research on the genetic regulation of human height during different periods of growth.
The relationship of bronchitic symptoms to ambient particulate matter and to particulate elemental and organic carbon (OC), nitrogen dioxide (NO2), and other gaseous pollutants was examined in a cohort of children with asthma in 12 Southern California communities. Symptoms, assessed yearly by questionnaire from 1996 to 1999, were associated with the yearly variability of particulate matter with aerodynamic diameter less than 2.5 µm (odds ratio [OR] 1.09/µg/m³; 95% confidence interval [CI] 1.01-1.17), OC (OR 1.41/µg/m³; 95% CI 1.12-1.78), NO2 (OR 1.07/ppb; 95% CI 1.02-1.13), and ozone (OR 1.06/ppb; 95% CI 1.00-1.12). The ORs associated with yearly within-community variability in air pollution were larger than the effect of the between-community 4-year average concentrations. In two-pollutant models, the effects of yearly variation in OC and NO2 were only modestly reduced by adjusting for other pollutants, except in a model containing both OC and NO2; the effects of all other pollutants were reduced after adjusting for OC or NO2. We conclude that OC and NO2 deserve greater attention as potential causes of the chronic symptoms of bronchitis in children with asthma and that previous cross-sectional studies may have underestimated the risks associated with air pollution.