Background: Macrosatellite repeats (MSRs), usually spanning hundreds of kilobases of genomic DNA, comprise a significant proportion of the human genome. Because of their highly polymorphic nature, MSRs represent an extreme example of copy number variation, but their structure and function are largely understudied. Here, we describe a detailed study of six autosomal and two X chromosomal MSRs among 270 HapMap individuals from Central Europe, Asia and Africa. Copy number variation, stability and genetic heterogeneity of the autosomal macrosatellite repeats RS447 (chromosome 4p), MSR5p (5p), FLJ40296 (13q), RNU2 (17q) and D4Z4 (4q and 10q) and the X chromosomal repeats DXZ4 and CT47 were investigated.

Results: Repeat array size distribution analysis shows that all of these MSRs are highly polymorphic, with the most genetic variation among Africans and the least among Asians. A mitotic mutation rate of 0.4-2.2% was observed, exceeding meiotic mutation rates and possibly explaining the large size variability found for these MSRs. By means of a novel Bayesian approach, statistical support for a distinct multimodal rather than a uniform allele size distribution was detected in seven out of eight MSRs, with evidence for equidistant intervals between the modes.

Conclusions: The multimodal distributions with evidence for equidistant intervals, in combination with the observation of MSR-specific constraints on minimum array size, suggest that MSRs are limited in their configurations and that deviations from these configurations may cause disease, as is the case for facioscapulohumeral muscular dystrophy. However, at present we cannot exclude that there are mechanistic constraints for MSRs that are not directly disease-related. This study represents the first comprehensive study of MSRs in different human populations by applying novel statistical methods and identifies commonalities and differences in their organization and function in the human genome.
Strategic choices for efficient and accurate evaluation of marginal likelihoods by means of Monte Carlo simulation methods are studied for the case of highly non-elliptical posterior distributions. A comparative analysis is presented of possible advantages and limitations of different simulation techniques; of possible choices of candidate distributions and choices of target or warped target distributions; and finally of numerical standard errors. The importance of a robust and flexible estimation strategy, in which the complete posterior distribution is explored, is demonstrated. Given an appropriately yet quickly tuned adaptive candidate, straightforward importance sampling provides a computationally efficient estimator of the marginal likelihood (and a reliable and easily computed corresponding numerical standard error) in the cases investigated in this paper, which include a non-linear regression model and a mixture GARCH model. Warping the posterior density can lead to a further gain in efficiency, but it is more important that the posterior kernel is appropriately wrapped by the candidate distribution than that it is warped.
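
As an illustration of the importance sampling estimator and its numerical standard error mentioned above, the following is a minimal sketch and not the paper's implementation. It assumes a toy conjugate model (y_i ~ N(theta, 1) with prior theta ~ N(0, 1)) chosen only because its marginal likelihood is known in closed form, and a fixed Student-t candidate centred on the posterior mean in place of an adaptively tuned one. All function names, the degrees of freedom, and the scale inflation factor of 1.5 are hypothetical choices made for this example.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_student_t_pdf(x, nu, loc, scale):
    """Log density of a location-scale Student-t with nu degrees of freedom."""
    z = (x - loc) / scale
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(scale)
            - (nu + 1) / 2 * math.log1p(z * z / nu))


def sample_student_t(rng, nu, loc, scale):
    """Draw from a location-scale Student-t as normal / sqrt(chi2 / nu)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(nu / 2, 2.0)  # chi-square with nu degrees of freedom
    return loc + scale * z / math.sqrt(v / nu)


def is_marginal_likelihood(y, n_draws=20000, nu=5, seed=0):
    """Importance sampling estimate of the marginal likelihood and its NSE.

    Toy model (an assumption of this sketch): y_i ~ N(theta, 1), theta ~ N(0, 1).
    """
    rng = random.Random(seed)
    n = len(y)
    # Candidate: Student-t centred on the (here analytically known) posterior
    # mean, with a scale wider than the posterior so that the tails of the
    # posterior kernel are covered.
    loc = sum(y) / (n + 1)
    scale = 1.5 * math.sqrt(1.0 / (n + 1))
    weights = []
    for _ in range(n_draws):
        theta = sample_student_t(rng, nu, loc, scale)
        # Posterior kernel = prior density times likelihood.
        log_kernel = log_normal_pdf(theta, 0.0, 1.0)
        log_kernel += sum(log_normal_pdf(yi, theta, 1.0) for yi in y)
        weights.append(math.exp(log_kernel - log_student_t_pdf(theta, nu, loc, scale)))
    estimate = sum(weights) / n_draws
    # Numerical standard error: sample standard deviation of the weights
    # divided by sqrt(number of draws), valid for independent draws.
    variance = sum((w - estimate) ** 2 for w in weights) / (n_draws - 1)
    return estimate, math.sqrt(variance / n_draws)


def exact_log_marginal_likelihood(y):
    """Closed-form log marginal likelihood of the toy model, for checking."""
    n = len(y)
    s = sum(y)
    ss = sum(v * v for v in y)
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(n + 1)
            - 0.5 * (ss - s * s / (n + 1)))


if __name__ == "__main__":
    data = [0.3, -0.5, 1.2, 0.8, 0.1]
    est, nse = is_marginal_likelihood(data)
    print("IS estimate:", est, "NSE:", nse)
    print("exact value:", math.exp(exact_log_marginal_likelihood(data)))
```

The Student-t candidate has heavier tails than the normal posterior, so the importance weights stay bounded; this is one simple way to respect the point that the candidate should wrap the posterior kernel. For the highly non-elliptical posteriors the abstract is concerned with, a single Student-t centred on one mode would generally not be adequate.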