Ichigaku Takigawa scite author profile

The discovery and development of catalysts and catalytic processes are essential to maintaining an ecological balance in the future. Recent revolutions in data science could have a great impact on traditional catalysis research in both industry and academia and could accelerate catalyst development. Machine learning (ML), a subfield of data science, can play a central role in this paradigm shift away from traditional approaches. In this review, we present a user’s guide to ML that we believe will be helpful for scientists performing research in catalysis, and we summarize recent progress in using ML to create homogeneous and heterogeneous catalysts. The review focuses on the design, synthesis, and characterization of catalytic materials/compounds as well as their applications to catalyzed processes. ML not only enhances ways to discover catalysts but also serves as a powerful tool for establishing a deeper understanding of the relationships between the properties of materials/compounds and their catalytic activities, selectivities, and stabilities. This knowledge facilitates the establishment of principles for designing catalysts and enhancing their efficiencies. Despite these advantages, it is noteworthy that ML-assisted development of real catalysts remains in its infancy, mainly because of the complexity of catalysis, which is a time-dependent dynamic event. In this review, we discuss how seamless integration of experiment, theory, and data science can be used to accelerate catalyst development and to guide future studies aimed at applications that will impact society’s need to produce energy, materials, and chemicals.
Moreover, the limitations and difficulties of ML in catalysis research originating from the complex nature of catalysis are discussed in order to make the catalysis community aware of challenges that need to be addressed for effective and practical use of ML in the field.


Similarity-based machine learning methods for predicting drug–target interactions: a brief review

Ding, Takigawa, Mamitsuka et al. 2013



Computationally predicting drug-target interactions is useful for selecting possible drug (or target) candidates for further biochemical verification. We focus on machine learning-based approaches, particularly similarity-based methods that use drug and target similarities, which capture relationships among drugs and among targets, respectively. These two similarities represent two emerging concepts, the chemical space and the genomic space. Typically, the methods combine these two types of similarities to generate models for predicting new drug-target interactions. This process is also closely related to much work in pharmacogenomics or chemical biology that attempts to understand the relationships between the chemical and genomic spaces. This background makes the similarity-based approaches attractive and promising. This article reviews state-of-the-art similarity-based machine learning methods for predicting drug-target interactions, which have aroused great interest in bioinformatics. We describe each of these methods briefly and empirically compare them under a uniform experimental setting to explore their advantages and limitations.
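To make the idea of combining the two similarity spaces concrete, here is a minimal Python sketch of one of the simplest schemes of this kind, a similarity-weighted profile score. It is an illustration under assumptions and not necessarily identical to any method compared in the review: the function name `weighted_profile_scores`, the plain averaging of the drug-side and target-side views, and the toy matrices are all hypothetical. A drug-target pair scores highly when chemically similar drugs are known to hit the target, or when the drug is known to hit genomically similar targets.

```python
import numpy as np


def weighted_profile_scores(Y, S_drug, S_target):
    """Score every drug-target pair from known interactions and similarities.

    Y        : (n_drugs, n_targets) 0/1 matrix of known interactions.
    S_drug   : (n_drugs, n_drugs) drug-drug similarity (chemical space).
    S_target : (n_targets, n_targets) target-target similarity (genomic space).
    """
    # Drug-side view: similarity-weighted average of other drugs' profiles.
    drug_side = S_drug @ Y / S_drug.sum(axis=1, keepdims=True)
    # Target-side view: similarity-weighted average over the drug's known targets.
    target_side = Y @ S_target / S_target.sum(axis=0, keepdims=True)
    # Hypothetical combination rule: average the two views.
    return 0.5 * (drug_side + target_side)


if __name__ == "__main__":
    # Toy data (hypothetical): drug 2 has no known interactions yet.
    Y = np.array([[1, 0], [0, 1], [0, 0]], dtype=float)
    S_drug = np.array([[1.0, 0.1, 0.9], [0.1, 1.0, 0.1], [0.9, 0.1, 1.0]])
    S_target = np.eye(2)
    print(weighted_profile_scores(Y, S_drug, S_target)[2])
```

For a drug with no known interactions (an all-zero row of `Y`), the target-side view contributes nothing, so its scores come entirely from its neighbors in chemical space.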


Toward Effective Utilization of Methane: Machine Learning Prediction of Adsorption Energies on Metal Alloys

et al. 2018


The process employed to discover new materials for specific applications typically involves screening large compound libraries. In this approach, the performance of a compound is correlated with properties of its elements, referred to as descriptors. In the effort described below, we developed a simple and efficient machine learning (ML) model for predicting adsorption energies of CH₄-related species, namely CH₃, CH₂, CH, C, and H, on Cu-based alloys. The developed ML model predicted the DFT-calculated adsorption energies from 12 descriptors, which are readily available values for the selected elements. The predictive accuracy of four regression methods (ordinary linear regression by least squares (OLR), random forest regression (RFR), gradient boosting regression (GBR), and extra trees regression (ETR)) with different numbers of descriptors and different test-set/training-set ratios was quantitatively evaluated using statistical cross validations. Among the four regression methods, we found that ETR gave the best performance in predicting the adsorption energies, with average root mean squared errors (RMSEs) below 0.3 eV. Strikingly, despite its simplicity and low computational cost, this model can predict the DFT-calculated adsorption energies on a range of Cu-based alloy models (46 in total). In addition, we show the ML prediction for the differences in the adsorption energies of CH₃ and CH₂ on the same surface. This is especially important when designing selective catalytic reaction processes that suppress undesired overreactions. The accuracy and simplicity of the developed system suggest that adsorption energies can be readily predicted without time-consuming DFT calculations, and that this would eventually allow us to predict the catalytic performance of solid catalysts.
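The evaluation protocol described in this abstract (four regressors scored by root mean squared error over repeated random train/test splits at several test-set ratios) can be sketched with scikit-learn. This is a hedged illustration, not the authors' code: the hyperparameters, the helper name `compare_regressors`, and the synthetic 46 × 12 data standing in for surfaces and descriptors are assumptions, and the printed numbers say nothing about real adsorption energies.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score


def compare_regressors(X, y, test_size=0.25, n_splits=10, seed=0):
    """Mean RMSE over repeated random train/test splits for OLR, RFR, GBR, ETR."""
    models = {
        "OLR": LinearRegression(),
        "RFR": RandomForestRegressor(n_estimators=200, random_state=seed),
        "GBR": GradientBoostingRegressor(random_state=seed),
        "ETR": ExtraTreesRegressor(n_estimators=200, random_state=seed),
    }
    # Independent random splits; test_size sets the test-set/training-set ratio.
    cv = ShuffleSplit(n_splits=n_splits, test_size=test_size, random_state=seed)
    results = {}
    for name, model in models.items():
        # This scorer returns the negated RMSE, so flip the sign back.
        scores = cross_val_score(model, X, y, cv=cv,
                                 scoring="neg_root_mean_squared_error")
        results[name] = float(-scores.mean())
    return results


if __name__ == "__main__":
    # Synthetic placeholder data: 46 "alloy surfaces" x 12 "descriptors".
    rng = np.random.default_rng(0)
    X = rng.normal(size=(46, 12))
    y = X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=46)
    for ratio in (0.25, 0.5, 0.75):
        print(ratio, compare_regressors(X, y, test_size=ratio))
```

Sweeping `test_size` from 0.25 to 0.75 mirrors the idea of training on progressively less data; with real descriptors, the columns of `X` would be the per-element values and `y` the DFT adsorption energies.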


Density Functional Theory Calculations of Oxygen Vacancy Formation and Subsequent Molecular Adsorption on Oxide Surfaces

et al. 2018


The surface oxygen vacancy formation energy (E_Ovac) is an important parameter in determining the catalytic activity of metal oxides. Estimating these energies can therefore lead to data-driven design of promising catalyst candidates. In the present study, we determine E_Ovac for various insulating and semiconducting oxides. Statistical investigations indicate that the band gap, bulk formation energy, and electron affinity are factors that strongly influence E_Ovac. Electrons enter defect states after O desorption, and these states can be in the valence band, mid-gap, or in the conduction band. Subsequent adsorption of O₂, NO, CO, CO₂, and H₂ molecules on an O-deficient surface is also investigated. These molecules become preferentially adsorbed at the defect sites, and E_Ovac is identified as the dominant factor that determines the adsorption mode as well as a descriptor that shows good correlation with the adsorption energy.
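The two energies discussed in this abstract are usually obtained as simple differences of DFT total energies. The Python sketch below uses one common convention, referencing the vacancy energy to half a gas-phase O₂ molecule; the abstract does not state which reference the study used, so treat that convention, the function names, and the Pearson-correlation check of E_Ovac as a descriptor as assumptions.

```python
import numpy as np


def vacancy_formation_energy(e_defective_slab, e_pristine_slab, e_o2_gas):
    """E_Ovac referenced to gas-phase O2 (energies in eV from one DFT setup).

    Assumed convention: E_Ovac = E(slab with one O vacancy) + 1/2 E(O2) - E(pristine slab)
    """
    return e_defective_slab + 0.5 * e_o2_gas - e_pristine_slab


def adsorption_energy(e_slab_with_molecule, e_slab, e_molecule):
    """E_ads = E(slab + molecule) - E(slab) - E(molecule); negative means binding.

    For adsorption at a vacancy site, e_slab is the O-deficient slab energy.
    """
    return e_slab_with_molecule - e_slab - e_molecule


def descriptor_correlation(e_ovac_values, e_ads_values):
    """Pearson correlation between E_Ovac and adsorption energies across oxides."""
    return float(np.corrcoef(e_ovac_values, e_ads_values)[0, 1])
```

A correlation close to +1 or -1 across many oxides is what would justify using E_Ovac as a cheap descriptor for the adsorption energy.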


Obesity Suppresses Cell-Competition-Mediated Apical Elimination of RasV12-Transformed Cells from Epithelial Tissues

et al. 2018


Recent studies have revealed that newly emerging transformed cells are often eliminated from epithelial tissues via cell competition with the surrounding normal epithelial cells. This cancer preventive phenomenon is termed epithelial defense against cancer (EDAC). However, it remains largely unknown whether and how EDAC is diminished during carcinogenesis. In this study, using a cell competition mouse model, we show that high-fat diet (HFD) feeding substantially attenuates the frequency of apical elimination of RasV12-transformed cells from intestinal and pancreatic epithelia. This process involves both lipid metabolism and chronic inflammation. Furthermore, aspirin treatment significantly facilitates eradication of transformed cells from the epithelial tissues in HFD-fed mice. Thus, our work demonstrates that obesity can profoundly influence competitive interaction between normal and transformed cells, providing insights into cell competition and cancer preventive medicine.


Machine-learning prediction of the d-band center for metals and bimetals

et al. 2016


The d-band center for metals has been widely used to understand activity trends in metal-surface-catalyzed reactions in terms of the linear Brønsted-Evans-Polanyi relation and the Hammer-Nørskov d-band model. In this paper, the d-band centers for eleven metals (Fe, Co, Ni, Cu, Ru, Rh, Pd, Ag, Ir, Pt, Au) and their pairwise bimetals for two different structures (1% metal-doped or overlayer-covered metal surfaces) are statistically predicted using machine learning methods from readily available values as descriptors for the target metals (such as the density and the enthalpy of fusion of each metal). The predictive accuracy of four regression methods with different numbers of descriptors and different test-set/training-set ratios is quantitatively evaluated using statistical cross validations. It is shown that the d-band centers are reasonably well predicted by the gradient boosting regression (GBR) method with only six descriptors, even when 75% of the data are predicted from only 25% given for training (average root mean square error (RMSE) < 0.5 eV). This demonstrates the potential of machine learning methods for predicting the activity trends of metal surfaces with negligible CPU time compared to first-principles methods.
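The d-band center, the quantity the machine learning models in this paper learn to predict, is conventionally the first moment of the d-projected density of states, ε_d = ∫ E ρ_d(E) dE / ∫ ρ_d(E) dE, with E measured relative to the Fermi level. A minimal NumPy sketch on a uniform energy grid follows. It assumes the d-projected DOS has already been extracted from a DFT calculation; conventions differ on the integration window (some works integrate only up to the Fermi level), so the full-window choice and the function name `d_band_center` are assumptions.

```python
import numpy as np


def d_band_center(energies, d_dos):
    """First moment of the d-projected DOS sampled on a uniform energy grid.

    energies : 1-D array of energies relative to the Fermi level (eV).
    d_dos    : d-projected density of states at those energies.
    """
    energies = np.asarray(energies, dtype=float)
    d_dos = np.asarray(d_dos, dtype=float)
    # On a uniform grid the spacing dE cancels between numerator and denominator,
    # so plain sums approximate the two integrals' ratio.
    return float(np.sum(energies * d_dos) / np.sum(d_dos))
```

For a symmetric d band, the center coincides with the peak position, which is a quick sanity check on extracted DOS data.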


MED26 regulates the transcription of snRNA genes through the recruitment of little elongation complex

et al. 2015


Regulation of transcription elongation by RNA polymerase II (Pol II) is a key regulatory step in gene transcription. Recently, the little elongation complex (LEC)—which contains the transcription elongation factor ELL/EAF—was found to be required for the transcription of Pol II-dependent small nuclear RNA (snRNA) genes. Here, we show that the human Mediator subunit MED26 plays a role in the recruitment of LEC to a subset of snRNA genes through direct interaction of EAF and the N-terminal domain (NTD) of MED26. Loss of MED26 in cells decreases the occupancy of LEC at a subset of snRNA genes and results in a reduction in their transcription. Our results suggest that the MED26 NTD functions as a molecular switch in the exchange of TBP-associated factor 7 (TAF7) for LEC in order to facilitate the transition from initiation to elongation during transcription of a subset of snRNA genes.


Statistical Analysis and Discovery of Heterogeneous Catalysts Based on Machine Learning from Diverse Published Data

et al. 2019


The literature provides insights for catalyst design and discovery. Effective analysis of reported data using machine learning (ML) methods offers the ability to gain valuable information. However, utilizing the literature in this way has obstacles such as lack of compositional overlaps, bias from prior published data, and low sample counts for many elements. The present study describes an ML approach that considers elemental features as input representations instead of inputting catalyst compositions directly. This ML method has the potential for catalyst discovery, including catalytic reactions with limited catalyst composition overlap in the available data. Oxidative coupling of methane (OCM), water gas shift (WGS), and CO oxidation reactions were chosen to confirm the effectiveness of the proposed method by analysis using several state‐of‐the‐art ML methods. Among the ML methods tested, gradient boosting regression with XGBoost (XGB) provided the best results, and prediction accuracy was improved by the proposed approach for all three reaction types. In addition, a quantitative value of “feature importance score” was calculated to evaluate the most influential input variables on catalyst performance. Finally, catalyst optimization was explored using ML as a “surrogate” model, and the top 20 promising candidate catalysts were identified for the OCM reaction based on the optimization. The advantages of ML in catalysis analysis as well as the difficulties and limitations originating from the complexity of heterogeneous catalysis were explored.
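The key idea of representing a catalyst by elemental features rather than by its raw composition can be sketched as composition-weighted statistics of per-element properties. The Python below is an assumption-laden illustration, not the paper's exact featurization: it uses only two properties (atomic number and Pauling electronegativity) for Na, Mn, and W, and the mean/max/min aggregation and the name `composition_features` are hypothetical. Because every catalyst maps to the same fixed-length vector, compositions that share no elements with the training data can still be placed in a common feature space.

```python
import numpy as np

# Per-element properties: (atomic number Z, Pauling electronegativity).
# Only three elements are listed here for illustration.
ELEMENT_PROPS = {
    "Na": (11, 0.93),
    "Mn": (25, 1.55),
    "W": (74, 2.36),
}


def composition_features(composition, props=ELEMENT_PROPS):
    """Map {element: amount} to [weighted mean | max | min] of element properties."""
    elements = list(composition)
    amounts = np.array([composition[e] for e in elements], dtype=float)
    fractions = amounts / amounts.sum()                           # mole fractions
    table = np.array([props[e] for e in elements], dtype=float)   # (n_elem, n_props)
    mean = fractions @ table          # composition-weighted average of each property
    maximum = table.max(axis=0)       # largest value among the constituent elements
    minimum = table.min(axis=0)       # smallest value among the constituent elements
    return np.concatenate([mean, maximum, minimum])


if __name__ == "__main__":
    # Hypothetical Mn-Na-W composition expressed as relative amounts.
    print(composition_features({"Mn": 2, "Na": 4, "W": 2}))
```

The resulting vectors could then be passed to a gradient boosting regressor such as XGBoost's `XGBRegressor`, in line with the model family the abstract reports as best.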


