We carried out metagenomic shotgun sequencing and a metagenome-wide association study (MGWAS) of fecal, dental and salivary samples from a cohort of individuals with rheumatoid arthritis (RA) and healthy controls. Concordance was observed between the gut and oral microbiomes, suggesting overlap in the abundance and function of species at different body sites. Dysbiosis was detected in the gut and oral microbiomes of RA patients, but it was partially resolved after RA treatment. Alterations in the gut, dental or saliva microbiome distinguished individuals with RA from healthy controls, were correlated with clinical measures and could be used to stratify individuals on the basis of their response to therapy. In particular, Haemophilus spp. were depleted in individuals with RA at all three sites and negatively correlated with levels of serum autoantibodies, whereas Lactobacillus salivarius was over-represented in individuals with RA at all three sites and was present in increased amounts in cases of very active RA. Functionally, the redox environment, transport and metabolism of iron, sulfur, zinc and arginine were altered in the microbiota of individuals with RA. Molecular mimicry of human antigens related to RA was also detectable. Our results establish specific alterations in the gut and oral microbiomes in individuals with RA and suggest potential ways of using microbiome composition for prognosis and diagnosis.
While Distance Weighted Discrimination (DWD) is an appealing approach to classification in high dimensions, it was designed for balanced datasets. In the case of unequal costs, biased sampling, or unbalanced data, major improvements are available using appropriately weighted versions of DWD (wDWD). A major contribution of this paper is the development of optimal weighting schemes for various nonstandard classification problems. In addition, we discuss several alternative criteria, propose an adaptive weighting scheme (awDWD), and demonstrate its advantages over nonadaptive weighting schemes in some situations. The second major contribution is a theoretical study of weighted DWD. Both the high-dimensional, low-sample-size asymptotics and the Fisher consistency of DWD are studied. The performance of weighted DWD is evaluated using simulated examples and two real data examples. The theoretical results are also confirmed by simulations.
In multicategory classification, standard techniques typically treat all classes equally. This treatment can be problematic when the dataset is unbalanced, in the sense that certain classes have very small proportions compared to others. The minority classes may be ignored or discounted during classification because of their small proportions, which is a serious problem if those classes are important. In this article, we study the problem of unbalanced classification and propose new criteria to measure classification accuracy. Moreover, we propose three weighted learning procedures: two one-step weighted procedures and one adaptive weighted procedure. We demonstrate the advantages of the new procedures, using multicategory support vector machines, through simulated and real datasets. Our results indicate that the proposed methodology can handle unbalanced classification problems effectively.
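The abstract above does not spell out its accuracy criteria or its weights, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes two common choices: class weights inversely proportional to class proportions, and the mean of per-class accuracies as a criterion that does not let the majority class dominate. The function names are hypothetical.

```python
from collections import Counter


def inverse_proportion_weights(labels):
    """Assumed weighting: weight of class c is n / (K * n_c).

    Each class then contributes the same total weight, however rare it is.
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {c: n / (k * counts[c]) for c in counts}


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly classified points within each true class."""
    total = Counter(y_true)
    correct = Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of the per-class accuracies."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


# Toy data: 8 majority points, 2 minority points, and a classifier
# that always predicts the majority class.
y_true = ["a"] * 8 + ["b"] * 2
y_pred = ["a"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
weights = inverse_proportion_weights(y_true)
balanced = balanced_accuracy(y_true, y_pred)
```

On this toy data the plain accuracy is 0.8 although the minority class is never predicted, while the balanced criterion drops to 0.5. The assumed weights are 0.625 for class "a" and 2.5 for class "b".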
The stability of a statistical analysis is an important indicator of reproducibility, which is a main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this paper, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers by taking stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package snn.
* Correspondence to Guang Cheng (e-mail: chengg@purdue.edu)
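To make the role of the weight vector concrete, here is a minimal Python sketch. It is not the snn package and not the SNN weights themselves. It assumes a one-dimensional, two-class Gaussian toy model, and it assumes that instability can be estimated as the rate at which classifiers trained on two independent samples disagree on the same test points; the abstract does not give the formal definition of CIS. All function names are hypothetical.

```python
import math
import random


def knn_weights(n, k):
    """Weight vector of the ordinary k-nearest-neighbor classifier:
    1/k on each of the k nearest points, 0 on the remaining n - k."""
    return [1.0 / k] * k + [0.0] * (n - k)


def weight_norm(w):
    """Euclidean norm of a weight vector."""
    return math.sqrt(sum(x * x for x in w))


def wnn_predict(train, weights, x):
    """Weighted nearest neighbor vote for labels in {0, 1}.

    train is a list of (feature, label) pairs with scalar features;
    the i-th nearest training point receives weights[i].
    """
    ordered = sorted(train, key=lambda pair: abs(pair[0] - x))
    score = sum(w * y for w, (_, y) in zip(weights, ordered))
    return 1 if score >= 0.5 else 0


def sample(n, rng):
    """Toy model (an assumption): class 1 ~ N(1, 1), class 0 ~ N(-1, 1)."""
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = rng.gauss(1.0 if y == 1 else -1.0, 1.0)
        data.append((x, y))
    return data


def empirical_instability(n, k, n_test=20, n_rep=5, seed=0):
    """Disagreement rate between classifiers trained on independent samples."""
    rng = random.Random(seed)
    w = knn_weights(n, k)
    test_x = [x for x, _ in sample(n_test, rng)]
    disagreements = 0
    for _ in range(n_rep):
        d1, d2 = sample(n, rng), sample(n, rng)
        disagreements += sum(
            wnn_predict(d1, w, x) != wnn_predict(d2, w, x) for x in test_x
        )
    return disagreements / (n_rep * n_test)


norm_k4 = weight_norm(knn_weights(100, 4))
norm_k25 = weight_norm(knn_weights(100, 25))
instability = empirical_instability(n=50, k=5)
```

For ordinary k-nearest neighbors the weight norm is 1/sqrt(k), so spreading the weight over more neighbors shrinks the norm. By the result stated in the abstract, this should lower the asymptotic CIS, at a possible cost in accuracy that the sketch does not measure.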