Knowing the catalytic turnover numbers of enzymes is essential for understanding the growth rate, proteome composition, and physiology of organisms, but experimental data on enzyme turnover numbers is sparse and noisy. Here, we demonstrate that machine learning can successfully predict catalytic turnover numbers in Escherichia coli based on integrated data on enzyme biochemistry, protein structure, and network context. We identify a diverse set of features that are consistently predictive for both in vivo and in vitro enzyme turnover rates, revealing novel protein structural correlates of catalytic turnover. We use our predictions to parameterize two mechanistic genome-scale modelling frameworks for proteome-limited metabolism, leading to significantly higher accuracy in the prediction of quantitative proteome data than previous approaches. The presented machine learning models thus provide a valuable tool for understanding metabolism and the proteome at the genome scale, and elucidate structural, biochemical, and network properties that underlie enzyme kinetics.
Microbial communities inhabit spatial architectures that divide a global environment into isolated or semi-isolated local environments, partitioning a microbial community into a collection of local communities. Despite the ubiquity of such partitioning and great interest in related processes, how and to what extent spatial partitioning affects the structure and dynamics of microbial communities remains poorly understood. Using modeling and quantitative experiments with simple and complex microbial communities, we demonstrate that spatial partitioning modulates community dynamics by altering the local interaction types and the global interaction strength. Partitioning promotes the persistence of populations with negative interactions but suppresses those with positive interactions. For a community consisting of populations with both positive and negative interactions, an intermediate level of partitioning maximizes the overall diversity of the community. Our results reveal a general mechanism underlying the maintenance of microbial diversity and have implications for natural and engineered communities.
The number and diversity of biomedical datasets have grown rapidly over the last decade. A large number of datasets are stored in various repositories, in different formats. Existing dataset retrieval systems lack the capability of cross-repository search. As a result, users spend time searching for datasets in known repositories and typically do not discover new repositories. The biomedical and healthCAre Data Discovery Index Ecosystem (bioCADDIE) team organized a challenge to solicit new indexing and searching strategies for retrieving biomedical datasets across repositories. We describe the work of one team that built a retrieval pipeline and examined its performance. The pipeline used online resources to supplement dataset metadata, automatically generated queries from users' free-text questions, produced high-quality retrieval results, and achieved the highest inferred Normalized Discounted Cumulative Gain among competitors. These results show that the pipeline is a promising solution for cross-database, cross-domain and cross-repository biomedical dataset retrieval.
Database URL: https://github.com/w2wei/dataset_retrieval_pipeline
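The ranking quality above is reported as inferred Normalized Discounted Cumulative Gain. As a rough illustration of the underlying metric, the sketch below computes plain NDCG@k with linear gains: each retrieved dataset's relevance grade is discounted by the log of its rank, and the total is normalized by the score of an ideal ordering. This is an assumption-laden sketch, not the challenge's evaluation code. The inferred variant additionally estimates the score from sampled relevance judgments, which is not reproduced here, and some NDCG definitions use exponential gains (2^rel - 1) instead. The parameter `judged` is hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the top-k results (linear gain).

    Rank 1 is divided by log2(2) = 1, rank 2 by log2(3), and so on.
    """
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked: Sequence[float], k: int,
              judged: Optional[Sequence[float]] = None) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    `ranked` holds the graded relevance of each retrieved dataset, in rank order.
    `judged` (hypothetical parameter) holds all known relevance grades for the
    query; if omitted, the ideal ranking is built from `ranked` alone.
    """
    pool = ranked if judged is None else judged
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    # Queries with no relevant datasets get a score of 0 rather than a division by zero.
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


# Usage: a ranking that places a partially relevant dataset below an irrelevant one.
score = ndcg_at_k([2, 0, 1], k=3)
```

Normalizing by the ideal ordering keeps scores in [0, 1] and comparable across queries with different numbers of relevant datasets.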
Plasmids are a major type of mobile genetic element (MGE) that mediate horizontal gene transfer. The stable maintenance of plasmids plays a critical role in the function and survival of microbial populations. However, predicting and controlling plasmid persistence and abundance in complex microbial communities remain challenging. Computationally, this challenge arises from the combinatorial explosion associated with the conventional modeling framework. Recently, a plasmid-centric framework (PCF) has been developed to overcome this computational bottleneck. This framework enables the derivation of a simple metric, the persistence potential, to predict plasmid persistence and abundance. Here, we discuss how PCF can be extended to account for plasmid interactions. We also discuss how such model-guided predictions of plasmid fates can benefit from the development of new experimental tools and data-driven computational methods.
Dynamical systems often generate distinct outputs from different initial conditions, and one can infer the corresponding input configuration from an output. This property captures the essence of information encoding and decoding. Here, we demonstrate the use of self-organized patterns, combined with machine learning, to achieve distributed information encoding and decoding. Our approach exploits a critical property of many natural pattern-formation systems: in repeated realizations, each initial configuration generates similar but not identical output patterns due to randomness in the patterning process. However, for sufficiently small randomness, different groups of patterns that arise from different initial configurations can be distinguished from one another. Modulating the pattern generation and the training of the machine learning model can tune the tradeoff between encoding capacity and security. We further show that this strategy is applicable to non-biological dynamical systems and that it is scalable, by implementing the encoding and decoding of all characters on a standard English keyboard.