Natural products represent a rich reservoir of small molecule drug candidates utilized as antimicrobial drugs, anticancer therapies, and immunomodulatory agents. These molecules are microbial secondary metabolites synthesized by co-localized genes termed Biosynthetic Gene Clusters (BGCs). The increase in full microbial genomes and similar resources has led to the development of BGC prediction algorithms, although their precision and ability to identify novel BGC classes could be improved. Here we present a deep learning strategy (DeepBGC) that offers reduced false positive rates in BGC identification and an improved ability to extrapolate and identify novel BGC classes compared to existing machine-learning tools. We supplemented this with random forest classifiers that accurately predicted BGC product classes and potential chemical activity. Application of DeepBGC to bacterial genomes uncovered previously undetectable putative BGCs that may code for natural products with novel biologic activities. The improved accuracy and classification ability of DeepBGC represent a major addition to in-silico BGC identification.
Rotational grazing management strategies have been promoted as a way to improve the sustainability of native grass-based pasture systems. From disturbance ecology theory, rotational grazing relative to continuous grazing can increase pasture productivity by allowing vegetation to recover after short intense grazing periods. This project sought to assess whether soil organic carbon (SOC) stocks would also increase with adoption of rotational grazing management. Twelve pairs of rotationally and continuously grazed paddocks were sampled across a rainfall gradient in South Australia. Pasture productivity, approximated as the normalized difference vegetation index (NDVI), was on average no different between management categories, but when the data from all sites were aggregated as log response ratios (rotational/continuous) a significant positive trend of increasing NDVI under rotational grazing relative to continuous grazing was found (R² = 0.52). Mean SOC stocks (0–30 cm) were 48.3 Mg C ha⁻¹ with a range of 20–80 Mg C ha⁻¹ across the study area, with no differences between grazing management categories. SOC stocks were well correlated with rainfall and temperature (multiple linear regression R² = 0.61). After removing the influence of climate on SOC stocks, the management variables (rest periods, stocking rate and grazing days) were found to be significantly correlated with SOC, explaining 22% of the variance in SOC, but there were still no clear differences in SOC stocks at paired sites. We suggest three reasons for the lack of SOC response. First, changes in plant productivity and turnover in low-medium rainfall regions due to changes in grazing management are small and slow, so we would only expect at best small incremental changes in SOC stocks. This is compounded by the inherent variability within and between paddocks, making detection of a small real change difficult on short timescales. Lastly, the management data suggest that there is a gradation in implementation of rotational grazing, and the use of two fixed categories (i.e. rotational v. continuous) may not be the most appropriate method of comparing diverse management styles.
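The log response ratio mentioned in the grazing abstract can be illustrated with a minimal sketch. It assumes the conventional definition, the natural log of the rotational value divided by the continuous value for each paired site. The NDVI numbers and the function name are hypothetical and are not taken from the study.

```python
import math

def log_response_ratio(rotational: float, continuous: float) -> float:
    """Natural log of rotational/continuous for one paired site.

    Positive values mean the rotational paddock scored higher,
    negative values mean the continuous paddock scored higher.
    """
    if rotational <= 0 or continuous <= 0:
        raise ValueError("both values must be positive to take a log ratio")
    return math.log(rotational / continuous)

# Hypothetical (rotational, continuous) NDVI values for three paired sites.
paired_ndvi = [(0.42, 0.40), (0.35, 0.38), (0.50, 0.50)]

ratios = [log_response_ratio(r, c) for r, c in paired_ndvi]
for (r, c), lrr in zip(paired_ndvi, ratios):
    print(f"rotational={r:.2f} continuous={c:.2f} log ratio={lrr:+.3f}")
```

A ratio of zero means the two paddocks in a pair are indistinguishable on that measure, which makes the sign of each value easy to read across sites.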
In this work we propose a novel, sound framework for evolutionary feature selection in unsupervised machine learning problems. We show that unsupervised feature selection is inherently multi-objective and behaves differently from supervised feature selection in that the number of features must be maximized instead of being minimized. Although this might sound surprising from a supervised learning point of view, we exemplify this relationship on the problem of data clustering and show that existing approaches do not pose the optimization problem in an appropriate way. Another important consequence of this paradigm change is a method which segments the Pareto sets produced by our approach. Inspecting only prototypical points from these segments drastically reduces the amount of work for selecting a final solution. We compare our methods against existing approaches on eight data sets.
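To make the multi-objective framing in the feature selection abstract concrete, the sketch below filters a set of candidate feature subsets down to its non-dominated (Pareto) set. It is not the authors' method. It assumes two objectives that are both maximized: the number of selected features, as the abstract states, and a hypothetical clustering quality score that stands in for whatever second objective the paper actually uses. The candidate values are invented for illustration.

```python
from typing import List, Tuple

# (number of features, quality score); both are assumed to be maximized.
Candidate = Tuple[int, float]

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    at_least_as_good = a[0] >= b[0] and a[1] >= b[1]
    strictly_better = a[0] > b[0] or a[1] > b[1]
    return at_least_as_good and strictly_better

def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical feature subsets with invented quality scores.
candidates = [(2, 0.90), (4, 0.85), (4, 0.70), (6, 0.60), (3, 0.80)]
front = pareto_front(candidates)
print(front)
```

The remaining points are the trade-offs a user would still have to choose between, which is the set the abstract proposes to segment further.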