Water is a ubiquitous solvent in chemistry and life. It is therefore no surprise that the aqueous solubility of compounds plays a key role in many domains, including drug discovery and the design of paints, coatings, and battery materials. Measuring and predicting aqueous solubility remains a complex and persistent challenge in chemistry. For prediction, data-driven models have recently been developed to augment physics-based modeling approaches. To construct accurate data-driven estimation models, the underlying experimental calibration data must be of high fidelity and quality. Existing solubility datasets vary in the chemical space of compounds covered, in measurement methods and experimental conditions, and in the representation, size, and accessibility of the data, with representations often non-standard. To address this problem, we generated a new database of compounds, AqSolDB, by merging a total of nine different aqueous solubility datasets, curating the merged data, standardizing and validating the compound representation formats, marking entries with reliability labels, and providing 2D descriptors of compounds as a Supplementary Resource.
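The merge-and-label step described above can be illustrated with a minimal sketch. It is not the AqSolDB procedure itself: the function name `merge_solubility`, the compound keys, the label names, and the 0.5 log-unit threshold are all hypothetical choices made for illustration. The sketch assumes each dataset is a list of (compound key, log solubility) pairs, pools measurements that share a key, and labels each compound by how well its measurements agree.

```python
from collections import defaultdict
from statistics import mean, stdev


def merge_solubility(datasets, sd_threshold=0.5):
    """Merge (compound_key, log_s) records from several datasets.

    Returns {compound_key: (mean_log_s, n_measurements, label)}.
    The labels and the threshold are illustrative assumptions,
    not the reliability scheme used by any published database.
    """
    grouped = defaultdict(list)
    for records in datasets:
        for key, log_s in records:
            # Normalize the key so the same compound matches across sources.
            grouped[key.strip().upper()].append(log_s)

    merged = {}
    for key, values in grouped.items():
        if len(values) == 1:
            label = "single"          # only one measurement available
        elif stdev(values) <= sd_threshold:
            label = "consistent"      # repeated measurements agree
        else:
            label = "inconsistent"    # repeated measurements disagree
        merged[key] = (mean(values), len(values), label)
    return merged


# Toy inputs with made-up keys and values, for demonstration only.
source_a = [("KEY-A", -2.0), ("KEY-B", -4.1)]
source_b = [("key-a", -2.2), ("KEY-C", -1.0)]
source_c = [("KEY-B", -6.0)]
result = merge_solubility([source_a, source_b, source_c])
```

In practice the compound key would come from a standardized structure representation rather than a free-text name, so that the same compound is recognized across sources.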
In recent years, artificial intelligence (AI) methods have proven their usefulness in solving complex problems. Across science and engineering disciplines, the data-driven approach has become the fourth and newest paradigm. It is the burgeoning of findable, accessible, interoperable, and reusable (FAIR) data generated by the first three paradigms of experiment, theory, and simulation that has enabled the application of AI methods to the scientific discovery and engineering of compounds and materials. Here, we introduce a recipe for a data-driven strategy to speed up the virtual screening of two-dimensional (2D) materials and to accelerate the discovery of new candidates with targeted physical and chemical properties. As a proof of concept, we generate new 2D candidate materials covering an extremely large compositional space, downselect 316,505 likely stable 2D materials, and predict the key physical properties of these new 2D candidates. Finally, we home in on the most promising candidates of functional 2D materials for energy conversion and storage.