Periodontitis is a widespread chronic inflammatory disease caused by interactions between periodontal bacteria and homeostasis in the host. We aimed to investigate the performance and reliability of machine learning models in predicting the severity of chronic periodontitis. Mouthwash samples were collected from 692 subjects (144 healthy controls and 548 generalized chronic periodontitis patients), genomic DNA was isolated, and the copy numbers of nine pathogens were measured using multiplex qPCR. The nine pathogens were Porphyromonas gingivalis (Pg), Tannerella forsythia (Tf), Treponema denticola (Td), Prevotella intermedia (Pi), Fusobacterium nucleatum (Fn), Campylobacter rectus (Cr), Aggregatibacter actinomycetemcomitans (Aa), Peptostreptococcus anaerobius (Pa), and Eikenella corrodens (Ec). To find the optimal combination of input features, we added the species one at a time in descending order of accuracy, and we developed an algorithm that predicts the severity of periodontitis using four machine learning techniques. Accuracy was highest when the models distinguished “healthy” from “moderate or severe” periodontitis (H vs. M-S; average accuracy of the four models = 0.93, AUC = 0.96, sensitivity = 0.96, specificity = 0.81, diagnostic odds ratio = 112.75). Three models used one or two red complex pathogens to distinguish slight chronic periodontitis patients from healthy controls (average accuracy = 0.78, AUC = 0.82, sensitivity = 0.71, specificity = 0.84, diagnostic odds ratio = 12.85). Although the overall accuracy was slightly reduced, the models remained reliable in predicting the severity of chronic periodontitis in 45 newly obtained samples. Our results suggest that a well-designed combination of salivary bacteria can be used as a biomarker for distinguishing a periodontally healthy group from a chronic periodontitis group.
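The feature search described in this abstract (ranking species by accuracy, then adding them one at a time) can be sketched as a greedy forward selection. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `forward_select` and `toy_score`, the per-species gains, and the size penalty are all hypothetical and are not taken from the study. In practice `score` would be a cross-validated accuracy of a real classifier trained on the qPCR copy numbers.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection over a ranked feature list.

    1. Rank features by the score each achieves on its own.
    2. Add them cumulatively in that order.
    3. Return the prefix of the ranking with the highest score.
    """
    ranked = sorted(features, key=lambda f: score([f]), reverse=True)

    best_subset: List[str] = []
    best_score = float("-inf")
    current: List[str] = []
    for feature in ranked:
        current = current + [feature]
        s = score(current)
        if s > best_score:
            best_subset, best_score = list(current), s
    return best_subset, best_score


# Hypothetical stand-in for a model's accuracy: each species contributes
# a made-up gain, and every added feature costs a small penalty so that
# uninformative species lower the score. These numbers are NOT study data.
_TOY_GAINS = {"Pg": 0.20, "Tf": 0.15, "Td": 0.10, "Pi": 0.01, "Fn": 0.005}


def toy_score(subset: List[str]) -> float:
    return 0.5 + sum(_TOY_GAINS[f] for f in subset) - 0.02 * len(subset)


if __name__ == "__main__":
    subset, value = forward_select(["Fn", "Pi", "Td", "Tf", "Pg"], toy_score)
    print(subset, round(value, 3))
```

With the toy scorer, the search stops improving once the low-gain species are added, so the returned subset is shorter than the full panel. Whether the original study kept the best prefix or used another stopping rule is not stated in the abstract.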
Dental caries is a chronic disease caused by organic acids produced by oral microbes. However, little is known about the oral microbiome of Korean children. The aim of this study was to analyze metagenome data of the oral microbiome obtained from Korean children and to identify bacteria strongly associated with dental caries using machine learning models. Saliva and plaque samples were collected from 120 Korean children aged below 12 years. Bacterial composition was identified using Illumina HiSeq sequencing based on the V3–V4 hypervariable region of the 16S rRNA gene. Ten major genera accounted for approximately 70% of the samples on average, including Streptococcus, Neisseria, Corynebacterium, and Fusobacterium. Differential abundance analyses revealed that Scardovia wiggsiae and Leptotrichia wadei were enriched in the caries samples, while Neisseria oralis was abundant in the non-caries samples of children aged below 6 years. The caries and non-caries samples of children aged 6–12 years were enriched in Streptococcus mutans and Corynebacterium durum, respectively. The machine learning models based on these differentially enriched taxa showed accuracies of up to 83%. These results confirmed significant alterations in the oral microbiome according to dental caries and age, and these differences can be used as diagnostic biomarkers.
With recent advances in DNA sequencing technologies, fast acquisition of large-scale genomic data has become commonplace. For cancer studies, in particular, there is an increasing need for the classification of cancer type based on somatic alterations detected from sequencing analyses. However, the ever-increasing size and complexity of the data make the classification task extremely challenging. In this study, we evaluate the contributions of various input features, such as mutation profiles, mutation rates, mutation spectra and signatures, and somatic copy number alterations that can be derived from genomic data, and further utilize them for accurate cancer type classification. We introduce a novel ensemble of machine learning classifiers, called CPEM (Cancer Predictor using an Ensemble Model), which is tested on 7,002 samples representing over 31 different cancer types collected from The Cancer Genome Atlas (TCGA) database. We first systematically examined the impact of the input features. Features known to be associated with specific cancers had relatively high importance in our initial prediction model. We further investigated various machine learning classifiers and feature selection methods to derive the ensemble-based cancer type prediction model achieving up to 84% classification accuracy in the nested 10-fold cross-validation. Finally, we narrowed down the target cancers to the six most common types and achieved up to 94% accuracy.
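The nested 10-fold cross-validation mentioned in this abstract can be illustrated with a small index-splitting sketch. It shows only the general technique: the outer folds hold out test samples for the final accuracy estimate, while the inner folds, drawn solely from each outer training set, are available for model and feature selection. The helper names, the inner fold count of 5, and the plain (non-stratified) shuffling are assumptions made for illustration; the abstract does not specify these details.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Split = Tuple[List[int], List[int]]


def k_fold_indices(indices: Iterable[int], k: int, seed: int = 0) -> Iterator[Split]:
    """Yield (train, test) index lists for a shuffled k-fold split."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled indices round-robin into k folds.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv_splits(
    n_samples: int,
    outer_k: int = 10,
    inner_k: int = 5,  # assumed value; not given in the abstract
    seed: int = 0,
) -> Iterator[Tuple[List[int], List[int], List[Split]]]:
    """Yield (outer_train, outer_test, inner_splits) for nested CV.

    Inner splits only ever use indices from outer_train, so the outer
    test fold never influences model or feature selection.
    """
    for outer_train, outer_test in k_fold_indices(range(n_samples), outer_k, seed):
        inner = list(k_fold_indices(outer_train, inner_k, seed + 1))
        yield outer_train, outer_test, inner


if __name__ == "__main__":
    for outer_train, outer_test, inner in nested_cv_splits(50):
        print(len(outer_train), len(outer_test), len(inner))
```

A real pipeline for imbalanced cancer-type labels would more likely use stratified folds so that each fold preserves the class proportions; this sketch omits that for brevity.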
Circulating tumor cells (CTCs) are known to be heterogeneous and clustered with tumor-associated cells, such as macrophages, neutrophils, fibroblasts, and platelets. However, their molecular profile and clinical significance remain largely unknown. Thus, we aimed to perform a comprehensive gene expression analysis of single CTCs and CTC clusters in patients with pancreatic cancer and to identify their potential clinical relevance for personalized medicine. Epitope-independent, rapid (>3 mL of whole blood/min) isolation of single CTCs and CTC clusters was achieved from a prospective cohort of 16 patients with unresectable pancreatic cancer using a centrifugal microfluidic device. The expression of 48 mRNAs in individual CTCs and CTC clusters was analyzed to identify the pancreatic CTC phenotype. CTC clusters had a larger proportion of mesenchymal expression than single CTCs (p = 0.0004). The presence of CTC clusters correlated with poor prognosis (progression-free survival, p = 0.0159; overall survival, p = 0.0186). Furthermore, we found that most CTCs in these patients (90.7%) were cloaked with platelets, and that an increase in CTC clusters correlated positively with rapid disease progression during follow-up. Efficient CTC cluster isolation and analysis techniques will enhance the understanding of complex tumor metastasis processes and can facilitate personalized disease management.