The discovery and development of catalysts and catalytic processes are essential to maintaining an ecological balance in the future. Recent advances in data science could have a great impact on traditional catalysis research in both industry and academia and could accelerate the development of catalysts. Machine learning (ML), a subfield of data science, can play a central role in this paradigm shift away from traditional approaches. In this review, we present a user’s guide to ML that we believe will be helpful for scientists performing research in the field of catalysis, and we summarize recent progress in using ML to create homogeneous and heterogeneous catalysts. The focus of the review is on the design, synthesis, and characterization of catalytic materials/compounds as well as their applications to catalyzed processes. ML not only enhances ways to discover catalysts but also serves as a powerful tool for establishing a deeper understanding of the relationships between the properties of materials/compounds and their catalytic activities, selectivities, and stabilities. This knowledge facilitates the establishment of principles for designing catalysts and enhancing their efficiencies. Despite these advantages, it is noteworthy that the ML-assisted development of real catalysts remains in its infancy, mainly because catalysis is a complex, time-dependent dynamic event. In this review, we discuss how seamless integration of experiment, theory, and data science can be used to accelerate catalyst development and to guide future studies aimed at applications that will impact society’s need to produce energy, materials, and chemicals. Moreover, the limitations and difficulties of ML in catalysis research that originate from the complex nature of catalysis are discussed in order to make the catalysis community aware of the challenges that need to be addressed for effective and practical use of ML in the field.
Computationally predicting drug-target interactions is useful for selecting possible drug (or target) candidates for further biochemical verification. We focus on machine learning-based approaches, particularly similarity-based methods that use drug and target similarities, which show relationships among drugs and those among targets, respectively. These two similarities represent two emerging concepts, the chemical space and the genomic space. Typically, the methods combine these two types of similarities to generate models for predicting new drug-target interactions. This process is also closely related to much work in pharmacogenomics and chemical biology that attempts to understand the relationships between the chemical and genomic spaces. This background makes the similarity-based approaches attractive and promising. This article reviews the similarity-based machine learning methods for predicting drug-target interactions, which are state-of-the-art and have aroused great interest in bioinformatics. We describe each of these methods briefly and empirically compare them under a uniform experimental setting to explore their advantages and limitations.
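As a rough illustration of how drug and target similarities can be combined into an interaction score, the following minimal sketch uses a similarity-weighted average of known interactions on both sides. It is not claimed to be any specific method from the review; the function names, the equal weighting of the two sides, and the toy matrices are all assumptions made for this example.

```python
def weighted_profile(similarities, labels):
    """Similarity-weighted average of known interaction labels (0 or 1)."""
    total = sum(similarities)
    if total == 0:
        return 0.0
    return sum(s * y for s, y in zip(similarities, labels)) / total


def predict_score(drug, target, drug_sim, target_sim, interactions):
    """Score one drug-target pair from neighbours in both spaces.

    drug_sim and target_sim are square similarity matrices (chemical and
    genomic space); interactions[d][t] is 1 for a known interaction.
    """
    other_drugs = [j for j in range(len(interactions)) if j != drug]
    drug_side = weighted_profile(
        [drug_sim[drug][j] for j in other_drugs],
        [interactions[j][target] for j in other_drugs],
    )
    other_targets = [k for k in range(len(interactions[0])) if k != target]
    target_side = weighted_profile(
        [target_sim[target][k] for k in other_targets],
        [interactions[drug][k] for k in other_targets],
    )
    # Assumption: both sides are weighted equally.
    return 0.5 * (drug_side + target_side)


# Toy data (made up for illustration): 3 drugs x 3 targets.
drug_sim = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
target_sim = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.1], [0.0, 0.1, 1.0]]
interactions = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]

likely = predict_score(1, 1, drug_sim, target_sim, interactions)
unlikely = predict_score(1, 2, drug_sim, target_sim, interactions)
print(f"drug 1 / target 1: {likely:.3f}")
print(f"drug 1 / target 2: {unlikely:.3f}")
```

In this toy setting, drug 1 is similar to drug 0, which is known to bind target 1, so the pair (drug 1, target 1) receives the higher score.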
The process employed to discover new materials for specific applications typically involves screening large compound libraries. In this approach, the performance of a compound is correlated with properties of its elements, referred to as descriptors. In the effort described below, we developed a simple and efficient machine learning (ML) model for predicting adsorption energies of CH4-related species, namely CH3, CH2, CH, C, and H, on Cu-based alloys. The developed ML model predicted the DFT-calculated adsorption energies with 12 descriptors, which are readily available values for the selected elements. The predictive accuracy of four regression methods (ordinary linear regression by least squares (OLR), random forest regression (RFR), gradient boosting regression (GBR), and extra tree regression (ETR)) with different numbers of descriptors and different test-set/training-set ratios was quantitatively evaluated using statistical cross validations. Among the four regression methods, ETR gave the best performance in predicting the adsorption energies, with average root mean squared errors (RMSEs) below 0.3 eV. Strikingly, despite its simplicity and low computational cost, this model can predict the DFT-calculated adsorption energies on a range of Cu-based alloy models (46 in total). In addition, we show the ML prediction for the differences in the adsorption energies of CH3 and CH2 on the same surface. This would be of great importance, especially when designing selective catalytic reaction processes that suppress undesired overreactions. The accuracy and simplicity of the developed system suggest that adsorption energies can be readily predicted without time-consuming DFT calculations, and eventually, this would allow us to predict the catalytic performance of solid catalysts.
The surface oxygen vacancy formation energy (E_Ovac) is an important parameter in determining the catalytic activity of metal oxides. Estimating these energies can therefore lead to data-driven design of promising catalyst candidates. In the present study, we determine E_Ovac for various insulating and semiconducting oxides. Statistical investigations indicate that the band gap, bulk formation energy, and electron affinity are factors that strongly influence E_Ovac. Electrons enter defect states after O desorption, and these states can be in the valence band, mid-gap, or in the conduction band. Subsequent adsorption of O2, NO, CO, CO2, and H2 molecules on an O-deficient surface is also investigated. These molecules become preferentially adsorbed at the defect sites, and E_Ovac is identified as the dominant factor that determines the adsorption mode as well as a descriptor that shows good correlation with the adsorption energy.
Recent studies have revealed that newly emerging transformed cells are often eliminated from epithelial tissues via cell competition with the surrounding normal epithelial cells. This cancer preventive phenomenon is termed epithelial defense against cancer (EDAC). However, it remains largely unknown whether and how EDAC is diminished during carcinogenesis. In this study, using a cell competition mouse model, we show that high-fat diet (HFD) feeding substantially attenuates the frequency of apical elimination of RasV12-transformed cells from intestinal and pancreatic epithelia. This process involves both lipid metabolism and chronic inflammation. Furthermore, aspirin treatment significantly facilitates eradication of transformed cells from the epithelial tissues in HFD-fed mice. Thus, our work demonstrates that obesity can profoundly influence competitive interaction between normal and transformed cells, providing insights into cell competition and cancer preventive medicine.
The d-band center for metals has been widely used to understand activity trends in metal-surface-catalyzed reactions in terms of the linear Brønsted-Evans-Polanyi relation and the Hammer-Nørskov d-band model. In this paper, the d-band centers for eleven metals (Fe, Co, Ni, Cu, Ru, Rh, Pd, Ag, Ir, Pt, Au) and their pairwise bimetals for two different structures (1% metal-doped or overlayer-covered metal surfaces) are statistically predicted using machine learning methods from readily available values as descriptors for the target metals (such as the density and the enthalpy of fusion of each metal). The predictive accuracy of four regression methods with different numbers of descriptors and different test-set/training-set ratios is quantitatively evaluated using statistical cross validations. It is shown that the d-band centers are reasonably well predicted by the gradient boosting regression (GBR) method with only six descriptors, even when we predict 75% of the data from only 25% given for training (average root mean square error (RMSE) < 0.5 eV). This demonstrates a potential use of machine learning methods for predicting the activity trends of metal surfaces with a negligible CPU time compared to first-principles methods.
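The evaluation protocol described above (average RMSE over repeated random splits at several test-set/training-set ratios) can be sketched as follows. This is only an assumed illustration: a simple k-nearest-neighbour regressor stands in for the regression methods named in the abstract, the data are synthetic, and all function names are hypothetical.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def knn_predict(train_X, train_y, x, k=3):
    """Stand-in regressor: mean target of the k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def mean_holdout_rmse(X, y, test_fraction, n_repeats=50, seed=0):
    """Average test RMSE over repeated random train/test splits."""
    rng = random.Random(seed)
    n_test = max(1, round(len(X) * test_fraction))
    scores = []
    for _ in range(n_repeats):
        indices = list(range(len(X)))
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        predictions = [knn_predict(train_X, train_y, X[i]) for i in test]
        scores.append(rmse([y[i] for i in test], predictions))
    return sum(scores) / len(scores)


# Synthetic descriptors and targets (not data from the paper).
data_rng = random.Random(1)
X = [(data_rng.uniform(0, 1), data_rng.uniform(0, 1)) for _ in range(80)]
y = [2.0 * a - 1.0 * b + data_rng.gauss(0, 0.05) for a, b in X]

# A test fraction of 0.75 means predicting 75% of the data from 25%.
results = {f: mean_holdout_rmse(X, y, f) for f in (0.25, 0.50, 0.75)}
for fraction, score in results.items():
    print(f"test fraction {fraction:.2f}: mean RMSE = {score:.3f}")
```

Averaging over many random splits, rather than reporting a single split, reduces the dependence of the reported error on which samples happen to land in the training set.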
Regulation of transcription elongation by RNA polymerase II (Pol II) is a key regulatory step in gene transcription. Recently, the little elongation complex (LEC)—which contains the transcription elongation factor ELL/EAF—was found to be required for the transcription of Pol II-dependent small nuclear RNA (snRNA) genes. Here, we show that the human Mediator subunit MED26 plays a role in the recruitment of LEC to a subset of snRNA genes through direct interaction of EAF and the N-terminal domain (NTD) of MED26. Loss of MED26 in cells decreases the occupancy of LEC at a subset of snRNA genes and results in a reduction in their transcription. Our results suggest that the MED26 NTD functions as a molecular switch in the exchange of TBP-associated factor 7 (TAF7) for LEC in order to facilitate the transition from initiation to elongation during transcription of a subset of snRNA genes.
The literature provides insights for catalyst design and discovery. Effective analysis of reported data using machine learning (ML) methods offers the ability to gain valuable information. However, utilizing the literature in this way has obstacles such as lack of compositional overlaps, bias from prior published data, and low sample counts for many elements. The present study describes an ML approach that considers elemental features as input representations instead of inputting catalyst compositions directly. This ML method has the potential for catalyst discovery, including catalytic reactions with limited catalyst composition overlap in the available data. Oxidative coupling of methane (OCM), water gas shift (WGS), and CO oxidation reactions were chosen to confirm the effectiveness of the proposed method by analysis using several state‐of‐the‐art ML methods. Among the ML methods tested, gradient boosting regression with XGBoost (XGB) provided the best results, and prediction accuracy was improved by the proposed approach for all three reaction types. In addition, a quantitative value of “feature importance score” was calculated to evaluate the most influential input variables on catalyst performance. Finally, catalyst optimization was explored using ML as a “surrogate” model, and the top 20 promising candidate catalysts were identified for the OCM reaction based on the optimization. The advantages of ML in catalysis analysis as well as the difficulties and limitations originating from the complexity of heterogeneous catalysis were explored.
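The idea of representing a catalyst by elemental features rather than by its composition directly can be sketched as below. This is a minimal example under stated assumptions, not the study's actual featurization: the two properties, the composition-weighted mean, the small property table (atomic numbers and approximate Pauling electronegativities), and the two catalyst compositions are all chosen for illustration.

```python
# Small illustrative property table; a real study would use many more
# properties taken from a curated reference source.
ELEMENT_PROPS = {
    "Li": {"atomic_number": 3, "electronegativity": 0.98},
    "Na": {"atomic_number": 11, "electronegativity": 0.93},
    "Mg": {"atomic_number": 12, "electronegativity": 1.31},
    "Mn": {"atomic_number": 25, "electronegativity": 1.55},
    "La": {"atomic_number": 57, "electronegativity": 1.10},
    "W": {"atomic_number": 74, "electronegativity": 2.36},
}
PROP_NAMES = ("atomic_number", "electronegativity")
ELEMENTS = sorted(ELEMENT_PROPS)


def composition_vector(composition):
    """Direct representation: one column per element, mostly zeros."""
    return [composition.get(element, 0.0) for element in ELEMENTS]


def elemental_feature_vector(composition):
    """Composition-weighted mean of each elemental property."""
    total = sum(composition.values())
    return [
        sum(frac * ELEMENT_PROPS[el][prop] for el, frac in composition.items())
        / total
        for prop in PROP_NAMES
    ]


# Two hypothetical catalysts that share no elements.
catalyst_a = {"Li": 0.5, "Mg": 0.5}
catalyst_b = {"Na": 0.5, "La": 0.5}

overlap = sum(
    a * b
    for a, b in zip(composition_vector(catalyst_a), composition_vector(catalyst_b))
)
features_a = elemental_feature_vector(catalyst_a)
features_b = elemental_feature_vector(catalyst_b)

print("compositional overlap:", overlap)
print("features A:", features_a)
print("features B:", features_b)
```

The two composition vectors have no overlap, so a model trained on them directly cannot relate the catalysts, whereas the elemental feature vectors place both in the same low-dimensional space where they can be compared.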