The adaptive immune receptor repertoire consists of the entire set of an individual's B cell receptors (BCRs) and T cell receptors (TCRs) and is believed to contain a record of prior immune responses and the potential for future immunity. Deep learning (DL) analyses of TCR repertoires have been used successfully to diagnose cancers and infectious diseases, including coronavirus disease 2019. However, few studies have used DL to analyze BCR repertoires. In this study, we collected IgG H chain Ab repertoires from 276 healthy control subjects and 326 patients with various infections. We then extracted a comprehensive feature set consisting of 10 subsets of repertoire-level features and 160 sequence-level features and tested whether these features can distinguish infected individuals from healthy control subjects. Finally, we developed an ensemble DL model, the DL method for infection diagnosis (https://github.com/chenyuan0510/DeepID), and used it to differentiate between infected and healthy individuals. Four subsets of repertoire-level features and four sequence-level features were selected for their strong predictive performance. The DL method for infection diagnosis outperformed traditional machine learning methods in distinguishing between healthy and infected samples (area under the curve = 0.9883) and achieved a multiclassification accuracy of 0.9104. We also observed differences between the healthy and infected groups in V gene usage, clonal expansion, the complexity of reads within clones, the physical properties of the α region, and the local flexibility of the CDR3 amino acid sequence. Our results suggest that the Ab repertoire is a promising biomarker for the diagnosis of various infections.
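The abstract reports an ensemble model evaluated by the area under the ROC curve (AUC) but does not state how the ensemble members are combined. As a minimal sketch, assuming simple soft voting (averaging each member's predicted probability of infection), the code below combines member outputs and computes AUC using its rank-based (Mann-Whitney) definition. The function names `soft_vote` and `roc_auc` and the averaging rule are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def soft_vote(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-sample positive-class probabilities of several models.

    Assumption: each inner sequence holds one model's scores for the same samples,
    in the same order. This is a generic soft-voting rule, not DeepID's own scheme.
    """
    n_models = len(member_probs)
    # zip(*...) regroups the scores so each tuple holds all models' scores for one sample
    return [sum(sample_scores) / n_models for sample_scores in zip(*member_probs)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, which makes this equal to the normalized
    Mann-Whitney U statistic. O(P * N), fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

With three hypothetical models scoring four samples (`[[0.9, 0.6, 0.4, 0.2], [0.8, 0.3, 0.5, 0.1], [0.7, 0.9, 0.2, 0.3]]`, labels `[1, 1, 0, 0]`), `soft_vote` gives roughly `[0.8, 0.6, 0.37, 0.2]` and `roc_auc` returns 1.0, because both positives outscore both negatives. Averaging probabilities is the simplest combination rule; a learned weighting or stacking model would be an equally plausible reading of "ensemble".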
Because decision trees (DTs) have an advantage in comprehensibility over "black-box" models such as neural networks and support vector machines, they merit further optimization. Node-splitting measures and pruning methods are the primary techniques for improving the generalization ability of DTs. Here, we introduced unequal interval optimization for node splitting and a local chi-square test for tree pruning. The resulting method was named the adaptive multibranch decision tree (CMDT). Eleven benchmark data sets of different scales were chosen from the UCI Machine Learning Repository, and CMDT was evaluated against 12 other classifiers on them. The results showed that CMDT can be more reliable than the 12 comparison methods, especially on imbalanced data sets. We also discussed performance metrics and the weighted decision-making table for imbalanced data sets. The CMDT algorithm is available at https://github.com/chenyuan0510/CMDT.
INDEX TERMS decision tree, node splitting, Chi-MIC, CMDT, pruning methods.
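The abstract does not spell out the pruning rule, so the following is only a minimal sketch of a generic local chi-square test. It assumes that a split is collapsed when its branch-by-class contingency table shows no significant association at α = 0.05. The critical-value table, the function names `chi_square_statistic` and `should_prune`, and the 0.05 level are illustrative assumptions, not CMDT's actual code, which lives in the repository linked above.

```python
from typing import List

# Upper-tail critical values of the chi-square distribution at alpha = 0.05,
# indexed by degrees of freedom (standard table values). Assumed significance level.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def chi_square_statistic(table: List[List[int]]) -> float:
    """Pearson chi-square statistic for a (branches x classes) table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # empty branches or absent classes contribute nothing
                stat += (observed - expected) ** 2 / expected
    return stat


def should_prune(table: List[List[int]]) -> bool:
    """Return True if the split's branches look independent of the class label.

    Hypothetical rule: collapse the node when the chi-square statistic falls
    below the alpha = 0.05 critical value for its degrees of freedom.
    """
    rows = sum(1 for row in table if sum(row) > 0)
    cols = sum(1 for col in zip(*table) if sum(col) > 0)
    df = (rows - 1) * (cols - 1)
    if df == 0:
        # Degenerate table (one non-empty branch or a single class): the split adds nothing.
        return True
    if df not in CHI2_CRIT_05:
        raise ValueError(f"no critical value tabulated for df={df}")
    return chi_square_statistic(table) < CHI2_CRIT_05[df]
```

For example, a split that perfectly separates two classes, `[[30, 0], [0, 30]]`, has a statistic of 60.0, well above 3.841, so `should_prune` returns False; a split whose branches have identical class mixes, `[[10, 10], [10, 10]]`, has a statistic of 0 and is pruned. Testing each node locally, rather than scoring the whole tree, is what lets such a rule decide branch by branch whether a multiway split is worth keeping.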