An essential aspect for adequate
predictions of chemical properties
by machine learning models is the database used for training them.
However, studies that analyze how the content and structure of the
databases used for training impact the prediction quality are scarce.
In this work, we analyze and quantify the relationships learned by
a machine learning model (Neural Network) trained on five different
reference databases (QM9, PC9, ANI-1E, ANI-1, and ANI-1x) to predict
tautomerization energies from molecules in Tautobase. For this, the influence of characteristics
such as the number of heavy atoms in a molecule, the number of atoms of
a given element, bond composition, or initial geometry on the quality
of the predictions is considered. The results indicate that training
on a chemically diverse database is crucial for obtaining good results
and also that conformational sampling can partly compensate for limited
coverage of chemical diversity. The overall best-performing reference
database (ANI-1x) performs on average 1 kcal/mol better than PC9,
which, however, contains about 2 orders of magnitude fewer reference
structures. On the other hand, PC9 is chemically more diverse by a
factor of ∼5 as quantified by the number of atom-in-molecule-based
fragments (amons) it contains compared with the ANI family of databases.
A quantitative measure for deficiencies is the Kullback–Leibler
divergence between reference and target distributions. It is explicitly
demonstrated that when certain types of bonds need to be covered in
the target database (Tautobase) but are undersampled in the reference
databases, the resulting predictions are poor. Examples of this include
the poor performance of all analyzed databases for molecules containing C(sp²)–C(sp²) double bonds close to heteroatoms
and for azoles containing N–N and N–O bonds. Analysis of
the results with a Tree MAP algorithm provides deeper understanding
of specific deficiencies in predicting tautomerization energies by
the reference datasets due to inadequate coverage of chemical space.
This information can be used either to improve existing
databases or to generate new databases of sufficient diversity for a
range of machine learning (ML) applications in chemistry.
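The Kullback–Leibler divergence mentioned above can be illustrated with a minimal sketch. The abstract does not state the direction of the divergence or how the distributions were binned, so the following assumes D_KL(target || reference) over discrete bond-type counts; the function name, the pseudocount, and the example counts are hypothetical and are not taken from the paper.

```python
import math

def kl_divergence(target_counts, reference_counts, pseudocount=1e-9):
    """D_KL(P_target || Q_reference) for two dictionaries of counts.

    Assumption: both inputs map a category (e.g. a bond type) to a count.
    A small pseudocount keeps categories missing from one database finite.
    """
    keys = set(target_counts) | set(reference_counts)
    t_total = sum(target_counts.get(k, 0) + pseudocount for k in keys)
    r_total = sum(reference_counts.get(k, 0) + pseudocount for k in keys)
    kl = 0.0
    for k in keys:
        p = (target_counts.get(k, 0) + pseudocount) / t_total
        q = (reference_counts.get(k, 0) + pseudocount) / r_total
        kl += p * math.log(p / q)
    return kl

# Hypothetical counts: N-N bonds are common in the target set
# but undersampled in the reference set.
target = {"C-C": 50, "N-N": 50}
well_matched_reference = {"C-C": 500, "N-N": 500}
undersampled_reference = {"C-C": 990, "N-N": 10}

kl_matched = kl_divergence(target, well_matched_reference)
kl_undersampled = kl_divergence(target, undersampled_reference)
```

In this toy case the divergence is essentially zero for the matched reference and clearly positive for the undersampled one, which is the qualitative behaviour the abstract relies on when it links undersampled bond types to poor predictions.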
The spectroscopy and structural dynamics of a deep eutectic
mixture (KSCN/acetamide) with varying water content are investigated using 2D
IR (with the C–N stretch vibration of the SCN⁻ anions as the reporter) and THz spectroscopy. Molecular dynamics
simulations correctly describe the nontrivial dependence of both spectroscopic
signatures on water content. For the 2D IR spectra, the
MD simulations relate the steep increase in the cross-relaxation rate
at high water content to the parallel alignment of packed SCN⁻ anions. Conversely, the nonlinear increase of the
THz absorption with increasing water content is mainly attributed
to the formation of larger water clusters. The results demonstrate
that a combination of structure-sensitive spectroscopies and molecular
dynamics simulations provides molecular-level insights into the emergence
of heterogeneity of such mixtures by modulating their composition.
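The notion of "larger water clusters" above can be made concrete with a small sketch of a distance-based cluster analysis. The abstract does not say how clusters were defined, so this assumes that two water molecules belong to the same cluster when their oxygen atoms are within a cutoff distance; the 3.5 Å cutoff, the function name, and the neglect of periodic boundary conditions are all assumptions made for illustration.

```python
from collections import Counter

import numpy as np

def water_cluster_sizes(oxygen_coords, cutoff=3.5):
    """Return cluster sizes (largest first) for water oxygen positions.

    Assumptions: coordinates are in Angstrom, no periodic boundaries,
    and an O-O distance <= cutoff links two molecules (hypothetical criterion).
    """
    coords = np.asarray(oxygen_coords, dtype=float)
    n = len(coords)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(coords[i] - coords[j]) <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    counts = Counter(find(i) for i in range(n))
    return sorted(counts.values(), reverse=True)

# Three oxygens in a chain form one cluster; a distant fourth stays alone.
example = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
sizes = water_cluster_sizes(example)
```

Applied frame by frame to a trajectory, the distribution of such sizes as a function of water content would be one way to quantify cluster growth; the pairwise loop is quadratic and only suitable for small systems.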
Artificial Neural Networks (NN) are already heavily involved in methods and applications for frequent tasks in the field of computational chemistry such as representation of potential energy surfaces (PES) and...
Ionic liquids (IL) are remarkable green solvents, which find applications in many areas of nano- and
biotechnology including extraction and purification of value-added compounds or fine chemicals. These
liquid salts possess versatile solvation properties that can be tuned by modifications in the cation or anion
structure. So far, in contrast to the great success of theoretical and computational methodologies applied
to other fields, only a few IL models have been able to bring insights towards the rational design of such
solvents. In this work, we develop coarse-grained (CG) models for imidazolium-based ILs using a new
version of the Martini force field. The model is able to reproduce the main structural properties of pure ILs,
including spatial heterogeneity and global densities over a wide range of temperatures. More importantly,
given the high intermolecular compatibility of the Martini force field, this new IL CG model opens the
possibility of large-scale simulations of liquid-liquid extraction experiments. As examples, we show two
applications, namely the extraction of aromatic molecules from a petroleum oil model and the extraction of
omega-3 polyunsaturated fatty acids from a fish oil model. In semi-quantitative agreement with the
experiments, we show how the extraction capacity and selectivity of the IL could be affected by the cation
chain length or addition of co-solvents.
Vibrational spectroscopy in supersonic jet expansions is a powerful tool to assess molecular aggregates in close to ideal conditions for the benchmarking of quantum chemical approaches. The low temperatures achieved...
The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably...
Full-dimensional potential energy surfaces (PESs) based on machine learning (ML) techniques provide a means for accurate and efficient molecular simulations in the gas and condensed phase for various experimental observables ranging from spectroscopy to reaction dynamics. Here, the MLpot extension with PhysNet as the ML-based model for a PES is introduced into the newly developed pyCHARMM application programming interface. To illustrate the conception, validation, refining, and use of a typical workflow, para-chloro-phenol is considered as an example. The main focus is on how to approach a concrete problem from a practical perspective, and applications to spectroscopic observables and the free energy for the –OH torsion in solution are discussed in detail. For the computed IR spectra in the fingerprint region, the computations for para-chloro-phenol in water are in good qualitative agreement with experiments carried out in CCl4. Moreover, relative intensities are largely consistent with experimental findings. The barrier for rotation of the –OH group increases from ∼3.5 kcal/mol in the gas phase to ∼4.1 kcal/mol in simulations in water due to favorable H-bonding interactions of the –OH group with surrounding water molecules.
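As an illustration of how a torsional free energy profile relates to sampled dihedral angles, the following is a minimal sketch using direct Boltzmann inversion, F(φ) = −k_B T ln p(φ). This is an assumption made for illustration only: the abstract does not state which free energy method was used, and for barriers of several k_B T an enhanced-sampling scheme would normally be needed to populate the barrier region. The function name, bin count, and temperature are hypothetical.

```python
import numpy as np

KB_KCAL_PER_MOL_K = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def torsion_free_energy(angles_deg, temperature=300.0, n_bins=36):
    """Free energy profile (kcal/mol) from sampled dihedral angles in degrees.

    Assumption: angles are unbiased equilibrium samples in [-180, 180).
    Empty bins yield +inf; the profile is shifted so its minimum is zero.
    """
    hist, edges = np.histogram(angles_deg, bins=n_bins, range=(-180.0, 180.0))
    prob = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    with np.errstate(divide="ignore"):
        free_energy = -KB_KCAL_PER_MOL_K * temperature * np.log(prob)
    free_energy -= free_energy[np.isfinite(free_energy)].min()
    return centers, free_energy

# Synthetic check: one bin sampled ten times more often than another
# should lie lower in free energy by k_B T ln(10).
samples = np.array([-175.0] * 100 + [5.0] * 10)
centers, profile = torsion_free_energy(samples)
```

With real trajectory data, the difference between the highest point along the path connecting the two minima and the lowest minimum would give an estimate of the rotational barrier, to be compared between gas-phase and solution simulations.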