Jonas Coelho Kasmanas scite author profile

Metagenomics has become a standard strategy for understanding the functional potential of microbial communities, including the human microbiome. The number of metagenomes in public repositories is increasing exponentially. The Sequence Read Archive (SRA) and MG-RAST are the two main repositories for metagenomic data. These databases allow scientists to reanalyze samples and explore new hypotheses. However, mining samples from them can be a limiting factor, since the metadata available in these repositories is often misannotated, misleading, and decentralized, creating an overly complex environment for sample reanalysis. The main goal of the HumanMetagenomeDB is to simplify the identification and use of public human metagenomes of interest. HumanMetagenomeDB version 1.0 contains metadata of 69 822 metagenomes. We standardized 203 attributes, based on standardized ontologies, describing host characteristics (e.g. sex, age and body mass index), diagnosis information (e.g. cancer, Crohn's disease and Parkinson's disease), location (e.g. country, longitude and latitude), sampling site (e.g. gut, lung and skin) and sequencing attributes (e.g. sequencing platform, average length and sequence quality). Further, the metagenomes in HumanMetagenomeDB version 1.0 span 58 countries, 9 main sample sites (i.e. body parts), 58 diagnoses and a wide range of ages, from newborn to 91 years old. The HumanMetagenomeDB is publicly available at https://webapp.ufz.de/hmgdb/.

The fate of sulfonamide resistance genes and anthropogenic pollution marker intI1 after discharge of wastewater into a pristine river stream

Haenelt, Wang, Kasmanas, et al. 2023. Front. Microbiol.

Introduction Currently there are sparse regulations regarding the discharge of antibiotics from wastewater treatment plants (WWTP) into river systems, making surface waters a latent reservoir for antibiotics and antibiotic resistance genes (ARGs). To better understand factors that influence the fate of ARGs in the environment and to foster surveillance of antibiotic resistance spreading in such habitats, several indicator genes have been proposed, including the integrase gene intI1 and the sulfonamide resistance genes sul1 and sul2. Methods Here we used quantitative PCR and long-read nanopore sequencing to monitor the abundance of these indicator genes and ARGs present as class 1 integron gene cassettes in a river system from pristine source to WWTP-impacted water. ARG abundance was compared with the dynamics of the microbial communities determined via 16S rRNA gene amplicon sequencing, conventional water parameters and the concentration of sulfamethoxazole (SMX), sulfamethazine (SMZ) and sulfadiazine (SDZ). Results Our results show that WWTP effluent was the principal source of all three sulfonamides with highest concentrations for SMX (median 8.6 ng/L), and of the indicator genes sul1, sul2 and intI1 with median relative abundance to 16S rRNA gene of 0.55, 0.77 and 0.65%, respectively. Downstream from the WWTP, water quality improved constantly, including lower sulfonamide concentrations, decreasing abundances of sul1 and sul2 and lower numbers and diversity of ARGs in the class 1 integron. The riverine microbial community partially recovered after receiving WWTP effluent, which was consolidated by a microbiome recovery model. Surprisingly, the relative abundance of intI1 increased 3-fold over 13 km of the river stretch, suggesting an internal gene multiplication. Discussion We found no evidence that low amounts of sulfonamides in the aquatic environment stimulate the maintenance or even spread of corresponding ARGs. Nevertheless, class 1 integrons carrying various ARGs were still present 13 km downstream from the WWTP. Therefore, limiting the release of ARG-harboring microorganisms may be more crucial for restricting the environmental spread of antimicrobial resistance than attenuating ng/L concentrations of antibiotics.
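
The "relative abundance to 16S rRNA gene" figures in this abstract are ratios of gene copy numbers. As a minimal sketch of that arithmetic, the Python below expresses a target gene as a percentage of 16S rRNA gene copies and computes a fold change between two sites. The function names and the copy numbers are hypothetical, chosen only for illustration; they are not data from the study, and the authors' actual qPCR normalization may differ.

```python
def relative_abundance_percent(gene_copies: float, rrna_16s_copies: float) -> float:
    """Return target-gene copies as a percentage of 16S rRNA gene copies."""
    if rrna_16s_copies <= 0:
        raise ValueError("16S rRNA gene copy number must be positive")
    return 100.0 * gene_copies / rrna_16s_copies


def fold_change(downstream: float, upstream: float) -> float:
    """Return how many times larger the downstream value is than the upstream one."""
    if upstream <= 0:
        raise ValueError("upstream value must be positive")
    return downstream / upstream


# Hypothetical copy numbers per mL (assumed values, not measurements).
near_wwtp = relative_abundance_percent(gene_copies=6.5e3, rrna_16s_copies=1.0e6)
far_downstream = relative_abundance_percent(gene_copies=1.95e4, rrna_16s_copies=1.0e6)

print(f"near WWTP: {near_wwtp:.2f}% of 16S rRNA gene copies")
print(f"far downstream: {far_downstream:.2f}% of 16S rRNA gene copies")
print(f"fold change: {fold_change(far_downstream, near_wwtp):.1f}")
```

Normalizing to 16S rRNA gene copies makes sites with different total bacterial loads comparable, which is why a gene can rise in relative abundance even while absolute concentrations fall.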

Machine learning-assisted identification of bioindicators predicts medium-chain carboxylate production performance of an anaerobic mixed culture

et al. 2022

Background The ability to quantitatively predict ecophysiological functions of microbial communities is an important step towards engineering microbiota for desired functions related to specific biochemical conversions. Here, we present the quantitative prediction of medium-chain carboxylate production in two continuous anaerobic bioreactors from 16S rRNA gene dynamics in enriched communities. Results By progressively shortening the hydraulic retention time (HRT) from 8 to 2 days with different temporal schemes in two bioreactors operated for 211 days, we achieved higher productivities and yields of the target products n-caproate and n-caprylate. The datasets generated from each bioreactor were applied independently for training and testing machine learning algorithms using 16S rRNA genes to predict n-caproate and n-caprylate productivities. Our dataset consisted of 14 and 40 samples from HRT of 8 and 2 days, respectively. Because of the size and balance of our dataset, we compared linear regression, support vector machine and random forest regression algorithms using the original and balanced datasets generated using synthetic minority oversampling. Further, we performed cross-validation to estimate model stability. Random forest regression was the best algorithm, producing more consistent results with median error rates below 8%. More than 90% accuracy in the prediction of n-caproate and n-caprylate productivities was achieved. Four inferred bioindicators belonging to the genera Olsenella, Lactobacillus, Syntrophococcus and Clostridium IV suggest their relevance to the higher carboxylate productivity at shorter HRT. The recovery of metagenome-assembled genomes of these bioindicators confirmed their genetic potential to perform key steps of medium-chain carboxylate production. Conclusions Shortening the hydraulic retention time of the continuous bioreactor systems makes it possible to shape the communities with desired chain elongation functions. Using machine learning, we demonstrated that 16S rRNA amplicon sequencing data can be used to predict bioreactor process performance quantitatively and accurately. Characterizing and harnessing bioindicators holds promise to manage reactor microbiota towards selection of the target processes. Our mathematical framework is transferable to other ecosystem processes and microbial systems where community dynamics are linked to key functions. The general methodology used here can be adapted to data types of other functional categories such as genes, transcripts, proteins or metabolites.
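
The abstract reports cross-validated median error rates. As a rough illustration of how such a figure can be computed, the Python sketch below runs k-fold cross-validation and returns the median of the per-fold mean relative errors. It deliberately swaps in a single-feature least-squares line instead of the random forest used in the study, so that it stays dependency-free. The function names, the synthetic data, and the choice of k are all assumptions and do not reproduce the authors' pipeline.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def kfold_median_relative_error(xs, ys, k=5, seed=0):
    """Median over folds of the mean relative error (percent) on held-out samples.

    Assumes every target value in ys is non-zero.
    """
    if k < 2 or k > len(xs):
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    fold_errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        intercept, slope = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        errors = [
            abs(intercept + slope * xs[i] - ys[i]) / abs(ys[i]) for i in test_idx
        ]
        fold_errors.append(100.0 * statistics.fmean(errors))
    return statistics.median(fold_errors)


# Synthetic stand-in data: 54 samples of a hypothetical bioindicator abundance (x)
# and a hypothetical productivity (y). Not data from the study.
rng = random.Random(42)
abundance = [rng.uniform(0.05, 0.40) for _ in range(54)]
productivity = [2.0 + 10.0 * x + rng.gauss(0.0, 0.1) for x in abundance]

print(f"median CV error: {kfold_median_relative_error(abundance, productivity):.2f}%")
```

Reporting the median across folds, instead of the mean, makes the summary less sensitive to a single unlucky split, which matters with a dataset as small as the one described.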

MuDoGeR: Multi-Domain Genome Recovery from metagenomes made easy

Rocha, Kasmanas, Kallies, et al. 2022. Preprint.

Several frameworks exist that recover genomes of Prokaryotes, Eukaryotes, and viruses from metagenomes. For those with little bioinformatics experience, it is difficult to evaluate quality, annotate genes, dereplicate, assign taxonomy and calculate relative abundance and coverage for genomes belonging to different domains. MuDoGeR is a user-friendly tool, accessible to non-bioinformaticians, that makes genome recovery from metagenomes of Prokaryotes, Eukaryotes, and viruses, alone or in combination, easy. By testing MuDoGeR using 574 metagenomes and 24 genomes, we demonstrated that users can run it on a few samples or in high throughput. MuDoGeR is open-source software available at https://github.com/mdsufz/MuDoGeR.

MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes

Nata’ala, Avila-Santos, Kasmanas, et al. 2022. Environmental Microbiome.

Background Metagenomics is an expanding field within microbial ecology, microbiology, and related disciplines. The number of metagenomes deposited in major public repositories such as Sequence Read Archive (SRA) and Metagenomic Rapid Annotations using Subsystems Technology (MG-RAST) is rising exponentially. However, data mining and interpretation can be challenging due to mis-annotated and misleading metadata entries. In this study, we describe the Marine Metagenome Metadata Database (MarineMetagenomeDB) to help researchers identify marine metagenomes of interest for re-analysis and meta-analysis. To this end, we have manually curated the associated metadata of several thousand microbial metagenomes currently deposited at SRA and MG-RAST. Results In total, 125 terms were curated according to 17 different classes (e.g., biome, material, oceanic zone, geographic feature and oceanographic phenomena). Other standardized features include sample attributes (e.g., salinity, depth), sample location (e.g., latitude, longitude), and sequencing features (e.g., sequencing platform, sequence count). MarineMetagenomeDB version 1.0 contains 11,449 marine metagenomes from SRA and MG-RAST distributed across all oceans and several seas. Most samples were sequenced using Illumina sequencing technology (84.33%). More than 55% of the samples were collected from the Pacific and the Atlantic Oceans. About 40% of the samples had their biomes assigned as ‘ocean’. The ‘Quick Search’ and ‘Advanced Search’ tabs allow users to apply different filters to select samples of interest dynamically in the web app. The interactive map allows the visualization of samples based on their location on the world map. The web app is also equipped with a novel download tool (for both Windows and Linux operating systems) that allows easy download of raw sequence data of selected samples from their respective repositories. As a use case, we demonstrated how to use the MarineMetagenomeDB web app to select estuarine metagenomes for potential large-scale microbial biogeography studies. Conclusion The MarineMetagenomeDB is a powerful resource for non-bioinformaticians to find marine metagenome samples with curated metadata and stimulate meta-studies involving marine microbiomes. Our user-friendly web app is publicly available at https://webapp.ufz.de/marmdb/.

OrtSuite: from genomes to prediction of microbial interactions within targeted ecosystem processes

et al. 2021

The high complexity of microbial communities makes the identification of microbial interactions challenging. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on the genomic content of microbial communities and targeted to specific ecosystem processes. The pipeline is composed of three user-friendly bash commands. OrtSuite combines ortholog clustering with genome annotation strategies limited to user-defined sets of functions, allowing for hypothesis-driven data analysis such as assessing microbial interactions in specific ecosystems. OrtSuite matched, on average, 96% of experimentally verified KEGG orthologs involved in benzoate degradation in a known group of benzoate degraders. We evaluated the identification of putative synergistic species interactions using the sequenced genomes of an independent study that had previously proposed potential species interactions in benzoate degradation. OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database and can easily be extended to ecosystem processes where connections between genes and reactions are known. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuite.

Explainable Machine Learning for Breast Cancer Diagnosis

Brito-Sarracino, Santos, Antunes, et al. 2019

Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in Prokaryotic metagenome-assembled genome recovery

Rocha, Kasmanas, Toscan, et al. 2023. Preprint.

We hypothesize that sample evenness, sequencing depth and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample evenness, sequencing depth and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming the 8K pipeline and the Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM pipelines, although with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, whilst the 8K pipeline recovered more true positives in communities sequenced above 60 million reads. DT showed the best recovery of species from the same genus, even though closely related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicates the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.
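
The abstract contrasts the number of MAGs a pipeline produces with how many of them are true positives. A minimal Python sketch of that bookkeeping follows, assuming each recovered MAG has already been assigned a species label and the composition of the in silico community is known. The function name, the labels, and the example sets are hypothetical, and how the authors actually matched MAGs to reference genomes is not described here.

```python
def recovery_metrics(recovered_species, community_species):
    """Summarize MAG recovery against a known (simulated) community.

    A true positive is a recovered species that is in the community;
    a false positive is a recovered species that is not.
    """
    recovered = set(recovered_species)
    community = set(community_species)
    true_positives = recovered & community
    false_positives = recovered - community
    return {
        "true_positives": len(true_positives),
        "false_positives": len(false_positives),
        # Share of recovered MAGs that really belong to the community.
        "precision": len(true_positives) / len(recovered) if recovered else 0.0,
        # Share of community members that were recovered.
        "recall": len(true_positives) / len(community) if community else 0.0,
    }


# Hypothetical labels: four species in the simulated community, three MAGs recovered.
community = ["sp_A", "sp_B", "sp_C", "sp_D"]
recovered = ["sp_A", "sp_B", "sp_X"]

for name, value in recovery_metrics(recovered, community).items():
    print(f"{name}: {value}")
```

Tracking precision and recall separately reflects the abstract's point that a pipeline producing more bins does not necessarily describe the actual community better.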
