“… 1,2 Machine learning (ML) and deep learning (DL) methods facilitate metagenomics-based disease prediction and the discovery of consistent, replicable, cross-cohort microbial biomarkers. 3,4,5,6,7,8,9 However, metagenomic data from individual clinical investigations typically have small sample sizes (dozens to hundreds of samples), 3,4,10 high dimensionality (hundreds to thousands of microbes), 3,4,10 sparsity (sparsely distributed across taxonomic hierarchies), and high variation (biological and environmental). 11 These problems leave statistical inference and learning outcomes vulnerable to random chance and false discoveries 12 and mask the identification of genuine biomarkers.…”