Fast, lightweight, and accurate metagenomic functional profiling using FracMinHash sketches

Rahman Hera, Mahmudur; Liu, Shaopeng; Wei, Wei; S. Rodriguez, Judith; Ma, Chunyu; Koslicki, David

doi:10.1101/2023.11.06.565843

Cited by 5 publications

(4 citation statements)

References 100 publications

“… [249] BBMap [250], Bowtie2 [251], [252], BWA [253], [254], [255], iMOKA [256], MiniMap2 [71]
Taxonomic Classification: In sequence composition-based methods, the frequency and distribution of k-mers in metagenomic data are analyzed to assess genome similarity across various taxonomic ranks. [257], [258]
ARK [259], BinDash [122], Bracken [260], CDKAM [261], CLARK [262], Dashing [124], fmh-funprofiler [128], Genometa [263], Kaiju [138], KMCP [264], KmerFinder [265], Kraken2 [136], KrakenUniq [19], LMAT [266], Mash [72], Mash Screen [34], Matchtigs [267], MetaCache [268], MetaPalette [269], MetaProFi [50], NIQKI [126], SEK [270], StrainSeeker [271], SuperSampler [127], TACOA [272], Taxonomer [273], TETRA [274], VirFinder [18], WGSQuikr [275]
Phylogeny Reconstruction: Pairwise evolutionary distances between protein or nucleic acid sequences and phylogenetic distances can be estimated from the number of k-mer matches between two sequences. Alignment-free sequence comparison quantifies distance using the decay of the number of k-mer matches between two sequences and compares the results to known phylogenetic trees.…”

Section: Applications Of K-mers (mentioning)

confidence: 99%

“…However, Kmer-db is roughly 26 times faster than Mash and is subsequently better equipped to process larger datasets [121]. Additional sketch-based methods that utilize k-mers in comparative genomics include Bindash 1.0 [122] and 2.0 [123], Dashing 1.0 [124] and 2.0 [125], NIQKI [126], SuperSampler [127], and fmh-funprofiler [128] (Table 2).…”

Section: Applications Of K-mers (mentioning)

confidence: 99%

A survey of k-mer methods and applications in bioinformatics

Moeckel, Mareboina, Konnaris et al. 2024

Computational and Structural Biotechnology Journal

“…Each of these samples were converted into a functional profile in the form of a probability vector indexed by KOs, representing the abundances of the KOs in the sample. This is done using FracMinHash, [9] a sketch-based pipeline that uses sourmash methods to estimate the abundance of each KO present in each sample. The details for this process can be found in Supplement Section 3.2.…”

Section: Functional Comparison Among Body Sites (mentioning)

confidence: 99%
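
The citation statement above describes turning a sample into a probability vector indexed by KOs using FracMinHash sketches. The following is a minimal, illustrative sketch of that idea only; it is not the cited pipeline and does not use the sourmash API. All function names (`frac_minhash`, `containment`, `functional_profile`), the BLAKE2b hash, the nucleotide toy sequences, and the choice to weight each KO by its containment in the sample are assumptions made for illustration.

```python
import hashlib

MAX_HASH = 2**64  # size of the 64-bit hash space used below (assumption)


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_kmer(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b chosen for illustration)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq, k=7, scaled=4):
    """Keep only hashes below MAX_HASH / scaled, i.e. roughly a 1/scaled
    fraction of the distinct k-mers."""
    threshold = MAX_HASH // scaled
    return {h for h in map(hash_kmer, kmers(seq, k)) if h < threshold}


def containment(query, sample):
    """Fraction of the query sketch that also appears in the sample sketch."""
    if not query:
        return 0.0
    return len(query & sample) / len(query)


def functional_profile(sample_sketch, ko_sketches):
    """Normalize per-KO containment scores so they sum to 1.

    Hypothetical simplification: containment stands in for abundance here.
    """
    scores = {ko: containment(s, sample_sketch) for ko, s in ko_sketches.items()}
    total = sum(scores.values())
    if total == 0:
        return {ko: 0.0 for ko in scores}
    return {ko: v / total for ko, v in scores.items()}


if __name__ == "__main__":
    # Toy sequences and made-up KO labels, for illustration only.
    sample = frac_minhash("ACGTACGTTGCAACGTTAGCCGATAGGCTA", k=5, scaled=1)
    kos = {
        "KO_a": frac_minhash("ACGTACGTTGCAACG", k=5, scaled=1),
        "KO_b": frac_minhash("GGGGGCCCCCGGGGG", k=5, scaled=1),
    }
    print(functional_profile(sample, kos))
```

Because the same hash threshold is applied to every sequence, sketches built with the same `k` and `scaled` values can be compared directly, which is what makes the containment estimate meaningful.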

“…This procedure can be adapted with the help of branch lengths assignment to answer a different yet equally meaningful question in metagenomic studies: the difference in functions that microbial communities are capable of performing in two given environments regardless of the actual organisms that carry out those functions. To do so, instead of clustering DNA into OTUs, we clustered them into functional units of orthologous genes through a process called functional profiling [9]. Next, instead of a phylogenetic tree, we obtained the KEGG Orthology (KO) hierarchy from the KEGG database [13,11,12].…”

Section: Introduction (mentioning)

confidence: 99%

On branch lengths assignment methods for trees with fixed topology and related biological applications

Wei, Koslicki 2024

Preprint

Distance-guided tree construction with unknown tree topology and branch lengths has been a long studied problem. In contrast, distance-guided branch lengths assignment with fixed tree topology has not yet been systematically investigated, despite having significant applications. In this paper, we provide a formal mathematical formulation of this problem and propose two representative methods for solving this problem, each with its own strength. We evaluate the performance of these two methods under various settings using simulated data, providing guidance for the choice of methods in respective cases. We demonstrate a practical application of this operation through an extension we termed FunUniFrac, which quantifies the differences in functional units between metagenomic samples over a functional tree with assigned branch lengths, allowing clustering of metagenomic samples by functional similarity instead of taxonomic similarity in traditional methods, thus expanding the realm of comparative studies in metagenomics.

Microbiodiversity Landscape Present in the Mine-Tailings of the “Sierra de Huautla” Biosphere Reserve, Mexico

Fernández-López, Sánchez-Reyes, Rosas-Ramírez et al. 2024

Water Air Soil Pollut

Large-scale mining activities generate significant amounts of waste that accumulates in the environment. These wastes, known as mine tailings, contain high levels of heavy metals, posing risks to human health and causing severe damage to ecosystems. In this study, we determined the heavy metal content of mine tailings in the Sierra de Huautla Biosphere Reserve (REBIOSH), Mexico, and investigated their effect on microbial composition. One of the sites historically contaminated with metals was sampled in three different locations, labeled S1, S2, and S3. A fourth site free of heavy metals (S4) was also used as a control. Our results showed high levels of As, Pb, Cd, and Ag, potentially dangerous metals that exceed thresholds set by international regulatory agencies. Metal contamination indices indicated moderate to extreme enrichment for As, Cd, and Pb, posing potential ecological risks. A metagenomic study of mine tailings showed a core species-specific microbiome covered by Sinimarinibacterium flocculans, Jiangella anatolica, Thiobacillus denitrificans, Fontimonas thermophile, and Sphingomonas koreensis. These may be associated with the processing of heavy metals. A comparative study using ALDEx2 revealed that less represented species like Variovorax paradoxus, Usitatibacter rugosus, Usitatibacter palustris, Sphingosinicella microcystinivorans, Sphingobium yanoikuyae, and Stella humosa may serve as microbial markers in metal-contaminated environments. In addition, we detected rare or low-abundance species belonging to the phyla Armatimonadota, Candidatus Melainobacteriota, Candidatus Saccharimonadota, Chlamydiota, Deinococcota, Elusimicrobiota, Bacillota, Rhodothermota and Verrucomicrobiota, which could play an important role in ecosystems contaminated with heavy metals. Also, we found site-specific taxonomic representatives such as Nitrososphaera gargensis and Nitrospira nitrificans dominating the S3 ecosystem; Ensifer aridi (S2 and S1), N. nitrificans (S2), while Reyranella soli dominates the S1 soil. These organisms could be crucial for nitrogen access in oligotrophic environments and underscore the adaptability of microbial life to extreme conditions. This is the first comprehensive study of the microbial composition in this important ecological site of the Mexican geography and can provide future guidance for the management and biological treatment of mining wastes.
