An approximately unbiased (AU) test that uses a newly devised multiscale bootstrap technique was developed for general hypothesis testing of regions in an attempt to reduce test bias. It was applied to maximum-likelihood tree selection for obtaining the confidence set of trees. The AU test is based on the theory of Efron et al. (Proc. Natl. Acad. Sci. USA 93:13429-13434; 1996), but the new method provides higher-order accuracy yet simpler implementation. The AU test, like the Shimodaira-Hasegawa (SH) test, adjusts the selection bias overlooked in the standard use of the bootstrap probability and Kishino-Hasegawa tests. The selection bias comes from comparing many trees at the same time and often leads to overconfidence in the wrong trees. The SH test, though safe to use, may exhibit another type of bias such that it appears conservative. Here I show that the AU test is less biased than other methods in typical cases of tree selection. These points are illustrated in a simulation study as well as in the analysis of mammalian mitochondrial protein sequences. The theoretical argument provides a simple formula that covers the bootstrap probability test, the Kishino-Hasegawa test, the AU test, and the Zharkikh-Li test. A practical suggestion is provided as to which test should be used under particular circumstances.
Pvclust is an add-on package for the statistical software R that assesses the uncertainty in hierarchical cluster analysis. It can be applied easily to general statistical problems, such as DNA microarray analysis, to perform the bootstrap analysis of clustering that has been popular in phylogenetic analysis. Pvclust calculates probability values (p-values) for each cluster using bootstrap resampling techniques. Two types of p-values are available: the approximately unbiased (AU) p-value and the bootstrap probability (BP) value. Multiscale bootstrap resampling is used to calculate the AU p-value, which is less biased than the BP value calculated by ordinary bootstrap resampling. In addition, the computation time can be greatly reduced with the parallel computing option.
To construct an East Asia mitochondrial DNA (mtDNA) phylogeny, we sequenced the complete mitochondrial genomes of 672 Japanese individuals (http://www.giib.or.jp/mtsnp/index_e.html). This allowed us to perform a phylogenetic analysis with a pool of 942 Asiatic sequences. New clades and subclades emerged from the Japanese data. On the basis of this unequivocal phylogeny, we classified 4713 Asian partial mitochondrial sequences, with <10% ambiguity. Applying population and phylogeographic methods, we used these sequences to shed light on the controversial issue of the peopling of Japan. Population-based comparisons confirmed that present-day Japanese have their closest genetic affinity to northern Asian populations, especially to Koreans, a finding congruent with the proposed Continental gene flow to Japan after the Yayoi period. This phylogeographic approach unraveled a high degree of differentiation in Paleolithic Japanese. Ancient southern and northern migrations were detected based on the existence of basic M and N lineages in Ryukyuans and Ainu. Direct connections with Tibet, parallel to those found for the Y-chromosome, were also apparent. Furthermore, the highest diversity found in Japan for some derived clades suggests that Japan could be included in an area of migratory expansion to Continental Asia. All the theories that have been proposed up to now to explain the peopling of Japan seem insufficient to accommodate fully this complex picture.
Approximately unbiased tests based on bootstrap probabilities are considered for the exponential family of distributions with unknown expectation parameter vector, where the null hypothesis is represented as an arbitrary-shaped region with smooth boundaries. This problem has been discussed previously in Efron and Tibshirani [Ann. Statist. 26 (1998) 1687-1718], and a corrected p-value with second-order asymptotic accuracy is calculated by the two-level bootstrap of Efron, Halloran and Holmes [Proc. Natl. Acad. Sci. USA 93 (1996) 13429-13434] based on the ABC bias correction of Efron [J. Amer. Statist. Assoc. 82 (1987) 171-185]. Our argument is an extension of their asymptotic theory, where the geometry, such as the signed distance and the curvature of the boundary, plays an important role. We give another calculation of the corrected p-value without finding the "nearest point" on the boundary to the observation, which is required in the two-level bootstrap and is an implementational burden in complicated problems. The key idea is to alter the sample size of the replicated dataset from that of the observed dataset. The frequency of the replicates falling in the region is counted for several sample sizes, and then the p-value is calculated by looking at the change in the frequencies along the changing sample sizes. This is the multiscale bootstrap of Shimodaira [Systematic Biology 51 (2002) 492-508], which is third-order accurate for the multivariate normal model. Here we introduce a newly devised multistep-multiscale bootstrap, calculating a third-order accurate p-value for the exponential family of distributions. In fact, our p-value is asymptotically equivalent to those obtained by the double bootstrap of Hall [The Bootstrap and Edgeworth Expansion (1992) Springer, New York] and the modified signed likelihood ratio of Barndorff-Nielsen [Biometrika 73 (1986) 307-322] ignoring O(n^{-3/2}) terms, yet the computation is less demanding and free from model specification.
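The counting-and-fitting idea described above can be illustrated with a minimal sketch of the single-step multiscale bootstrap under a toy normal model. Everything specific here is an assumption made for illustration: the two-dimensional observation, the disk-shaped region in `in_region`, the function names, and the use of plain (unweighted) least squares. The sketch assumes the commonly stated regression form, in which the z-value of the bootstrap frequency at relative sample size r behaves like d·sqrt(r) + c/sqrt(r), with the corrected p-value taken as 1 − Φ(d − c). It does not implement the multistep-multiscale procedure for the exponential family.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def in_region(point, radius=3.0):
    """Hypothetical region: a disk of the given radius centred at the origin."""
    return math.hypot(point[0], point[1]) <= radius


def bootstrap_frequency(y, r, n_rep, rng):
    """Fraction of replicates falling in the region when the replicate
    sample size is r times the observed one (noise variance 1/r)."""
    sigma = 1.0 / math.sqrt(r)
    hits = 0
    for _ in range(n_rep):
        replicate = (rng.gauss(y[0], sigma), rng.gauss(y[1], sigma))
        if in_region(replicate):
            hits += 1
    # Keep the frequency strictly inside (0, 1) so the inverse CDF is defined.
    return min(max(hits / n_rep, 0.5 / n_rep), 1.0 - 0.5 / n_rep)


def multiscale_bootstrap(y, scales, n_rep=10000, seed=0):
    """Fit z_r = d*sqrt(r) + c/sqrt(r) by unweighted least squares and
    return (d, c, au, bp), where bp is read off the fitted curve at r = 1."""
    rng = random.Random(seed)
    s11 = s12 = s22 = b1 = b2 = 0.0
    for r in scales:
        z = -STD_NORMAL.inv_cdf(bootstrap_frequency(y, r, n_rep, rng))
        x1, x2 = math.sqrt(r), 1.0 / math.sqrt(r)
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * z
        b2 += x2 * z
    # Solve the 2x2 normal equations for the two coefficients.
    det = s11 * s22 - s12 * s12
    d = (s22 * b1 - s12 * b2) / det
    c = (s11 * b2 - s12 * b1) / det
    au = 1.0 - STD_NORMAL.cdf(d - c)  # corrected (approximately unbiased) p-value
    bp = 1.0 - STD_NORMAL.cdf(d + c)  # ordinary bootstrap probability at r = 1
    return d, c, au, bp


if __name__ == "__main__":
    scales = [0.5 + 0.1 * k for k in range(10)]  # relative sample sizes 0.5 .. 1.4
    d, c, au, bp = multiscale_bootstrap((4.0, 0.0), scales)
    print(f"d={d:.3f} c={c:.3f} AU={au:.3f} BP={bp:.3f}")
```

In this toy setup the observation lies outside a convex region, so the fitted curvature term c should come out positive and the corrected p-value larger than the plain bootstrap frequency. No nearest boundary point is ever computed; only the region-membership test is needed.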
We have searched for intermediate-scale anisotropy in the arrival directions of ultrahigh-energy cosmic rays with energies above 57 EeV in the northern sky using data collected over a 5 yr period by the surface detector of the Telescope Array experiment. We report on a cluster of events that we call the hotspot, found by oversampling using 20° radius circles. The hotspot has a Li-Ma statistical significance of 5.1σ, and is centered at R.A. = 146.7°, decl. = 43.2°. The position of the hotspot is about 19° off the supergalactic plane. The probability of a cluster of events of 5.1σ significance, appearing by chance in an isotropic cosmic-ray sky, is estimated to be 3.7 × 10^{-4} (3.4σ).
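As a hedged illustration of the two significance figures quoted above, the sketch below implements the standard Li-Ma likelihood-ratio significance for on-source and off-source counts, and converts a chance probability into an equivalent Gaussian sigma. The function names and the example counts are hypothetical and are not taken from the Telescope Array data. The conversion assumes a one-sided Gaussian tail, which is an assumption about the convention used.

```python
import math
from statistics import NormalDist


def li_ma_significance(n_on, n_off, alpha):
    """Li-Ma significance for n_on on-source counts, n_off off-source counts,
    and alpha = ratio of on-source to off-source exposure."""
    total = n_on + n_off
    term_on = 0.0
    term_off = 0.0
    if n_on > 0:
        term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    if n_off > 0:
        term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    # Guard against tiny negative values caused by rounding.
    s = math.sqrt(max(0.0, 2.0 * (term_on + term_off)))
    # Report a deficit relative to the expected background as negative.
    return s if n_on >= alpha * n_off else -s


def one_sided_sigma(p_value):
    """Gaussian sigma whose one-sided upper-tail probability equals p_value."""
    return NormalDist().inv_cdf(1.0 - p_value)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(li_ma_significance(n_on=20, n_off=100, alpha=0.1))
    # Chance probability quoted in the text; comes out at roughly 3.4 sigma.
    print(one_sided_sigma(3.7e-4))
```

The second call shows why the 5.1σ local significance shrinks to about 3.4σ once expressed as a chance probability: the quoted 3.7 × 10^{-4} is the probability of such a cluster appearing anywhere in an isotropic sky, not only at the hotspot position.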