Millions of distinct metal-organic frameworks (MOFs) can be made by combining metal nodes and organic linkers. At present, over 90,000 MOFs have been synthesized and over 500,000 predicted. This raises the question of whether a new experimental or predicted structure adds new information. For MOF chemists, the chemical design space is a combination of pore geometry, metal nodes, organic linkers, and functional groups, but at present we do not have a formalism to quantify optimal coverage of chemical design space. In this work, we develop a machine learning method to quantify similarities of MOFs to analyse their chemical diversity. This diversity analysis identifies biases in the databases, and we show that such bias can lead to incorrect conclusions. The formalism developed in this study provides a simple and practical guideline to see whether new structures have the potential to yield new insights or constitute a relatively small variation of existing structures.
By combining metal nodes with organic linkers we can potentially synthesize millions of possible metal–organic frameworks (MOFs). The fact that we have so many materials opens many exciting avenues but also creates new challenges. We simply have too many materials to be processed using conventional, brute-force methods. In this review, we show that having so many materials allows us to use big-data methods as a powerful technique to study these materials and to discover complex correlations. The first part of the review gives an introduction to the principles of big-data science. We show how to select appropriate training sets, survey approaches that are used to represent these materials in feature space, and review different learning architectures, as well as evaluation and interpretation strategies. In the second part, we review how the different approaches of machine learning have been applied to porous materials. In particular, we discuss applications in the field of gas storage and separation, the stability of these materials, their electronic properties, and their synthesis. Given the increasing interest of the scientific community in machine learning, we expect this list to rapidly expand in the coming years.
Developing algorithmic approaches for the rational design and discovery of materials can enable us to systematically find novel materials, which can have huge technological and social impact. However, such rational design requires a holistic perspective over the full multistage design process, which involves exploring immense materials spaces, their properties, and process design and engineering as well as a techno-economic assessment. The complexity of exploring all of these options using conventional scientific approaches seems intractable. Instead, novel tools from the field of machine learning can potentially solve some of our challenges on the way to rational materials design. Here we review some of the chief advancements of these methods and their applications in rational materials design, followed by a discussion on some of the main challenges and opportunities we currently face together with our perspective on the future of rational materials design and discovery.
The design rules for materials are clear for applications with a single objective. For most applications, however, there are often multiple, sometimes competing objectives where there is no single best material and the design rules change to finding the set of Pareto optimal materials. In this work, we leverage an active learning algorithm that directly uses the Pareto dominance relation to compute the set of Pareto optimal materials with desirable accuracy. We apply our algorithm to de novo polymer design with a prohibitively large search space. Using molecular simulations, we compute key descriptors for dispersant applications and drastically reduce the number of materials that need to be evaluated to reconstruct the Pareto front with a desired confidence. This work showcases how simulation and machine learning techniques can be coupled to discover materials within a design space that would be intractable using conventional screening approaches.
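The Pareto dominance relation mentioned above can be illustrated with a minimal sketch. This is not the active learning algorithm of the work itself: it is a brute-force check over a small, made-up list of candidates, assuming every objective is to be maximized. The names `dominates`, `pareto_front`, and `candidates` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a Pareto-dominates b (assumption: all objectives are maximized).

    a dominates b when a is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Return the indices of all points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Made-up two-objective values for five hypothetical candidate materials.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (0.5, 0.5)]
front = pareto_front(candidates)
print(front)
```

The exhaustive comparison scales quadratically with the number of candidates and requires every objective value to be known, which is exactly what becomes impractical in a very large design space; the approach described above instead aims to reduce the number of materials that must be evaluated.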
Metal-organic frameworks (MOFs) are highly versatile materials owing to their vast structural and chemical tunability. These hybrid inorganic-organic crystalline materials offer an ideal platform to incorporate light-harvesting and catalytic centers and thus exhibit great potential to be exploited in solar-driven photocatalytic processes such as H2 production and CO2 reduction. To be photocatalytically active, UV-visible optical absorption and appropriate band alignment with respect to the target redox potential are required. Even when these criteria are fulfilled, the photocatalytic performance of MOFs is still limited by their ability to produce long-lived electron-hole pairs and long-range charge transport. Here, a computational strategy is presented to address these two descriptors in MOFs and to translate them into charge transfer numbers and effective mass values. The approach is applied to 15 MOFs from the literature that encompass the main strategies used in the design of efficient photocatalysts, including different metals, ligands, and topologies. The results capture the main characteristics previously reported for these MOFs and make it possible to identify promising candidates. In the quest for novel photocatalytic systems, high-throughput screening based on charge separation and charge mobility features is envisioned to be applied to large databases of both experimentally and in silico generated MOFs.
Large amounts of data are generated in chemistry labs: nearly all instruments record data in a digital form, yet a considerable proportion is also captured non-digitally and reported in ways inaccessible to both humans and their computational agents. Chemical research is still largely centred around paper-based lab notebooks, and the publication of data is often more an afterthought than an integral part of the process. Here we argue that a modular open-science platform for chemistry would be beneficial not only for data-mining studies but also, well beyond that, for the entire chemistry community. Much progress has been made over the past few years in developing technologies such as electronic lab notebooks that aim to address data-management concerns. This will help make chemical data reusable; however, it is only one step. We highlight the importance of centring open-science initiatives around open, machine-actionable data and emphasize that most of the required technologies already exist; we only need to connect, polish and embrace them.
Knowledge of the oxidation state of a metal centre in a material is essential to understand its properties. Chemists have developed several theories to predict the oxidation state on the basis of the chemical formula. These methods are quite successful for simple compounds but often fail to describe the oxidation states of more complex systems, such as metal-organic frameworks. In this work, we present a data-driven approach to automatically assign oxidation states, using a machine learning algorithm trained on the assignments by chemists encoded in the chemical names in the Cambridge Crystallographic Database. Our approach only considers the immediate local chemical environment around a metal centre and, in this way, is robust to most of the experimental uncertainties in these structures (like incorrect protonation or unbound solvents). We find such excellent accuracy (> 98%) in our predictions that we can use our method to identify a large number of incorrect assignments in the database. The predictions of our model follow chemical intuition, even though the model was never explicitly taught those heuristics. This work nicely illustrates how powerful the collective knowledge of chemists actually is. Machine learning can harvest this knowledge and convert it into a useful tool for chemists.
1 Main
Oxidation states are a concept every chemist learns, at the latest, in their first days as undergraduates. Their history goes back to the early days of chemistry when Lavoisier coined the word oxidation and Wöhler the expression "oxydationsstufe" (old German spelling for the term oxidation number) [1,2]. Oxidation states are central to balancing redox reactions [3], to chemical nomenclature [4], and above all to helping chemists systematise and reason about (redox) reactivity as well as spectroscopic properties [5][6][7].
The concept of oxidation states plays such an important role in the fundamentals of chemistry that some have argued that the oxidation numbers should be represented as the third dimension of the periodic table [8]. Every chemist has also experienced that assigning oxidation states is not trivial. The International Union of Pure and Applied Chemistry (IUPAC) defines oxidation states as "... the charge of this atom after ionic approximation of its heteronuclear bonds ..." [9,10]. This definition is, however, too generic and cannot be readily translated into a recipe to determine the oxidation state of any given compound. Therefore, in practice, chemists fall back to formal electron counting rules. For molecules, this approach gives satisfactory results for most cases. For crystalline materials, however, these electron counting rules often fail as they are based on bonds and bond orders, which are ill-defined for crystalline materials [11]. Therefore, for crystalline materials the oxidation state is often estimated using the bond valence sum method [12]. This method, which dates back to Linus Pauling [13], approximates all bonds as fully ionic, and the oxidation state is estimated by summing up all bond valence sums, w...