The ever-growing collection of metagenomic samples available in public data repositories has the potential to reveal new details on the emergence and dissemination of mobilized colistin resistance genes. Our analysis of metagenomes deposited online in the last 10 years shows that the environmental distribution of mcr gene variants depends on sampling source and location, possibly leading to the emergence of new variants, although the contig on which the mcr genes were found remained consistent.
Motivation: Solubility and expression levels of proteins can be a limiting factor for large-scale studies and industrial production. By determining the solubility and expression directly from the protein sequence, the success rate of wet-lab experiments can be increased. Results: In this study, we focus on predicting the solubility and usability for purification of proteins expressed in Escherichia coli directly from the sequence. Our model NetSolP is based on deep learning protein language models called transformers and we show that it achieves state-of-the-art performance and improves extrapolation across datasets. As we find current methods are built on biased datasets, we curate existing datasets by using strict sequence-identity partitioning and ensure that there is minimal bias in the sequences. Availability: The predictor and data are available at https://services.healthtech.dtu.dk/service.php?NetSolP and the open-sourced code is available at https://github.com/tvinet/NetSolP-1.0. Supplementary information: Supplementary data is attached in submission.
The growing threat of antimicrobial resistance (AMR) calls for new epidemiological surveillance methods, as well as a deeper understanding of how antimicrobial resistance genes (ARGs) have been transmitted around the world. The large pool of sequencing data available in public repositories provides an excellent resource for monitoring the temporal and spatial dissemination of AMR in different ecological settings. However, only a limited number of research groups globally have the computational resources to analyze such data. We retrieved 442 Tbp of sequencing reads from 214,095 metagenomic samples from the European Nucleotide Archive (ENA) and aligned them using a uniform approach against ARGs and 16S/18S rRNA genes. Here, we present the results of this extensive computational analysis and share the counts of reads aligned. Over 6.76 × 10⁸ read fragments were assigned to ARGs and 3.21 × 10⁹ to rRNA genes, where we observed distinct differences in both the abundance of ARGs and the link between microbiome and resistome compositions across various sampling types. This collection is another step towards establishing global surveillance of AMR and can serve as a resource for further research into the environmental spread and dynamic changes of ARGs.
A crucial process in the production of industrial enzymes is recombinant gene expression, which aims to overexpress the enzyme-encoding genes in a host microbe. Current approaches for securing overexpression rely on molecular tools such as adjusting the recombinant expression vector, adjusting cultivation conditions, or performing codon optimizations. However, such strategies are time-consuming, and an alternative strategy would be to select genes for better compatibility with the recombinant host. Several methods for predicting expressibility and solubility are available; however, they are all optimized for the expression host Escherichia coli. We show that these tools are not suited for predicting expression potential in the industrially important host Bacillus subtilis. Instead, we build a B. subtilis-specific machine learning model for expressibility prediction. Given millions of unlabelled proteins, and a small labelled dataset, we can successfully train such a predictive model. The unlabelled proteins provide a performance boost relative to using amino acid frequencies of the labelled proteins as input. On average, we obtain a modest performance: an area under the curve (AUC) of 0.64 and a Matthews correlation coefficient (MCC) of 0.2. However, we find that this is sufficient to be useful for prioritization of expression candidates. Moreover, the predicted class probabilities are correlated with expression levels. A number of features related to protein expression, including base frequencies and solubility, are captured by the model.
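For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how AUC and MCC can be computed for a binary expressibility classifier. It is an illustration only, not the authors' evaluation code: the function names `roc_auc` and `mcc`, the 0.5 decision threshold, and the toy labels and scores are all assumptions introduced here.

```python
from math import sqrt


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mcc(labels, predictions):
    """Matthews correlation coefficient from the binary confusion matrix."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical toy data: 1 = expressed, 0 = not expressed.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]  # predicted class probabilities
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUC = {roc_auc(labels, scores):.3f}")
print(f"MCC = {mcc(labels, predictions):.3f}")
```

AUC is threshold-free and depends only on how the scores rank the candidates, which is why a modest AUC can still be useful for prioritization; MCC depends on the chosen threshold and uses all four cells of the confusion matrix.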
Since the initial discovery of a mobilized colistin resistance gene (mcr-1), several other variants have been reported, some of which might have circulated for some time before being discovered. Metagenomic data provides an opportunity to re-analyze available older data to understand the evolutionary history of recently discovered antimicrobial resistance genes (ARGs). Here, we present a large-scale metagenomic study of 442 Tbp of sequencing reads from 214,095 samples to identify the host and geographical distribution and genomic context of nine mcr gene variants (mcr-1 to mcr-9). Our results show that the dissemination of each variant is not uniform. Instead, the source and location play a role in the spread. Despite the very diverse distribution, the genomic background of the mcr genes remains unchanged as the same mobile genetic elements and plasmid replicons occur. This work emphasizes the importance of sharing genomic data for surveillance of ARGs in our fight against antimicrobial resistance.
The rapid spread of antimicrobial resistance (AMR) is a threat to global health, and the nature of co-occurring antimicrobial resistance genes (ARGs) may cause collateral AMR effects once antimicrobial agents are used. Therefore, it is essential to identify which pairs of ARGs co-occur. Given the wealth of NGS data available in public repositories, we have investigated the correlation between ARG abundances in a collection of 214,095 metagenomic datasets. Using more than 6.76 × 10⁸ read fragments aligned to ARGs to infer pairwise correlation coefficients, we found that more ARGs correlated with each other in human and animal sampling origins than in soil and water environments. Furthermore, we showed that the correlations serve as risk profiles of co-occurring resistance to critically important antimicrobials. Using these profiles, we found several key ARGs indirectly but strongly selecting for ARGs of critical importance, such as tetracycline ARGs correlating with most forms of resistance. In conclusion, this study highlights the important ARG players indirectly involved in shaping the resistomes of various environments that can serve as monitoring targets in AMR surveillance programs.
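The idea of inferring pairwise correlation coefficients from ARG abundances can be sketched as follows. The abstract does not name the coefficient or any preprocessing, so this sketch assumes Spearman rank correlation on raw per-sample abundances; the gene labels (`gene_A`, `gene_B`, `gene_C`) and the values are hypothetical toy data, not results from the study.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical abundances of three ARGs across five samples.
abundances = {
    "gene_A": [10, 20, 30, 40, 50],
    "gene_B": [1, 4, 9, 16, 25],
    "gene_C": [5, 4, 3, 2, 1],
}

# One coefficient per unordered pair of genes.
correlations = {
    (a, b): spearman(abundances[a], abundances[b])
    for a, b in combinations(abundances, 2)
}

for (a, b), rho in correlations.items():
    print(f"{a} vs {b}: rho = {rho:+.2f}")
```

A rank-based coefficient is used here because read counts are typically skewed; with real metagenomic data, compositional effects and sparsity would also need to be handled, which this sketch does not attempt.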