The value of fine and specialty chemicals is often determined by specific requirements on their physical and chemical properties. It is therefore desirable to design the structure of a chemical to meet targeted material properties. In the past, the design of specialty chemicals has been based largely on experience and trial and error. However, recent advances in computational chemistry and machine learning offer a new approach to this problem. In this presentation, we demonstrate a successful example in which the structure of a chemical with a specified value of the octanol-water partition coefficient (K_ow) is predicted computationally. The method consists of two parts. The first is a robust predictive model, the COSMO-SAC activity coefficient model, which predicts the activity coefficient using only the molecular structure as input. The second is a derivative-free optimization algorithm that searches the multidimensional structure space for the desired value of K_ow. In particular, a genetic algorithm (GA), based on the Darwinian theory of evolution and natural selection, is combined with simulated annealing (SA) for this purpose. Compared to other optimization algorithms, GA can overcome the problem of being trapped in local minima, and SA can help improve convergence. The GA-SA combination has therefore been found to be very suitable for molecular design. We show that a value of K_ow within 1% of the target can be achieved in 30 generations with a proper set of evolution parameters (including the population size, the selection probability, and the annealing rate of the temperature). The same method can be applied to the search for chemicals with other desired properties, such as vapor pressure and solubility.
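The interplay between the two optimizers described above can be illustrated with a minimal sketch. It is not the authors' implementation: the property model `predict_log_kow` is a hypothetical placeholder standing in for a COSMO-SAC calculation, the fragment contributions in `TOY_CONTRIB` are made-up illustrative numbers, and the molecule encoding (a flat list of fragment names), the tournament selection, and all parameter defaults are assumptions chosen only to keep the example short. The GA part supplies crossover and mutation; the SA part decides, through a Metropolis test at a decreasing temperature, whether a child replaces its parent.

```python
import math
import random

# Hypothetical fragment contributions to log K_ow: illustrative numbers only,
# NOT COSMO-SAC output and not fitted to any data.
TOY_CONTRIB = {"CH3": 0.9, "CH2": 0.5, "OH": -1.1, "NH2": -1.2, "Cl": 0.7}
GROUPS = list(TOY_CONTRIB)
MAX_FRAGMENTS = 12  # assumed cap applied after crossover to keep molecules small


def predict_log_kow(mol):
    """Placeholder property model (stand-in for a COSMO-SAC calculation)."""
    return sum(TOY_CONTRIB[g] for g in mol)


def error(mol, target):
    """Absolute deviation of the predicted property from the target."""
    return abs(predict_log_kow(mol) - target)


def mutate(mol, rng):
    """Unimolecular move: add, delete, or swap one fragment."""
    child = list(mol)
    move = rng.choice(["add", "delete", "swap"])
    if move == "add" or len(child) <= 1:
        child.insert(rng.randrange(len(child) + 1), rng.choice(GROUPS))
    elif move == "delete":
        del child[rng.randrange(len(child))]
    else:
        child[rng.randrange(len(child))] = rng.choice(GROUPS)
    return child


def crossover(a, b, rng):
    """Bimolecular move: head of parent a joined to a tail of parent b."""
    i = rng.randrange(1, len(a) + 1)   # keep at least one fragment of a
    j = rng.randrange(len(b) + 1)
    return (a[:i] + b[j:])[:MAX_FRAGMENTS]


def ga_sa(target, pop_size=30, generations=30, t0=1.0, cooling=0.85, seed=0):
    """GA with SA acceptance; pop_size must be at least 3 for the tournament."""
    rng = random.Random(seed)
    pop = [[rng.choice(GROUPS) for _ in range(rng.randint(1, 6))]
           for _ in range(pop_size)]
    best = min(pop, key=lambda m: error(m, target))
    temp = t0
    for _ in range(generations):
        new_pop = []
        for parent in pop:
            # GA step: pick a mate by a 3-way tournament, then cross and mutate.
            mate = min(rng.sample(pop, 3), key=lambda m: error(m, target))
            child = mutate(crossover(parent, mate, rng), rng)
            # SA step: Metropolis test decides whether the child replaces the parent.
            delta = error(child, target) - error(parent, target)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                new_pop.append(child)
            else:
                new_pop.append(parent)
        pop = new_pop
        best = min(pop + [best], key=lambda m: error(m, target))  # keep the elite
        temp *= cooling  # geometric annealing schedule
    return best, predict_log_kow(best)


best_mol, best_value = ga_sa(target=2.0)
```

Early generations accept many worse children because the temperature is high, which keeps the population diverse; as `temp` shrinks the acceptance test becomes nearly greedy. Replacing `predict_log_kow` with a real property model is the only change needed to reuse the loop for another target property.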
A new molecular data structure and molecular structure operation algorithms are proposed for general-purpose molecular design. The data structure allows for a variety of molecular operations for creating new molecules. Two types of molecular operations were developed: unimolecular and bimolecular. In unimolecular operations, a child molecule is created from a parent via addition of a functional group, deletion of a fragment, mutation of an atom, etc. In bimolecular operations, child molecules are generated from two parent molecules through combination or crossover (hybridization). These operations are essential for creating and modifying molecules in molecular design. The data structure is capable of representing linear, branched, multifunctional, and multivalent compounds. Algorithms are developed for deriving the molecular data structure of a molecule from its atomic coordinates and vice versa. We show that this new molecular data structure and the developed algorithms, referred to as the Molecular Assembling and Representation Suite, allow one to generate a comprehensive library of new molecules by performing every possible molecular structure modification.
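To make the distinction between unimolecular and bimolecular operations concrete, here is a minimal sketch of a graph-style molecular record with one operation of each kind. It is not the data structure of the Molecular Assembling and Representation Suite, whose details are not given here. The `Molecule` class, the function names, the implicit-hydrogen convention, and the small `MAX_VALENCE` table are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Typical maximum valences, used only as a sanity check in this sketch.
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "Cl": 1}


@dataclass
class Molecule:
    """Heavy-atom graph (hydrogens implicit).

    atoms[i] is an element symbol; bonds maps an index pair (i, j) to a bond order.
    """
    atoms: list = field(default_factory=list)
    bonds: dict = field(default_factory=dict)

    def used_valence(self, i):
        return sum(o for (a, b), o in self.bonds.items() if i in (a, b))

    def free_valence(self, i):
        return MAX_VALENCE[self.atoms[i]] - self.used_valence(i)

    def copy(self):
        return Molecule(list(self.atoms), dict(self.bonds))


def add_atom(parent, site, element, order=1):
    """Unimolecular addition: bond a new atom to atom `site` of the parent."""
    if parent.free_valence(site) < order or MAX_VALENCE[element] < order:
        raise ValueError("not enough free valence")
    child = parent.copy()
    child.atoms.append(element)
    child.bonds[(site, len(child.atoms) - 1)] = order
    return child


def mutate_atom(parent, site, element):
    """Unimolecular mutation: change the element at `site` if valence allows."""
    if MAX_VALENCE[element] < parent.used_valence(site):
        raise ValueError("new element cannot carry the existing bonds")
    child = parent.copy()
    child.atoms[site] = element
    return child


def combine(a, site_a, b, site_b, order=1):
    """Bimolecular combination: join two parents with one new bond."""
    if a.free_valence(site_a) < order or b.free_valence(site_b) < order:
        raise ValueError("not enough free valence")
    child = a.copy()
    offset = len(a.atoms)  # indices of b are shifted past those of a
    child.atoms.extend(b.atoms)
    for (i, j), o in b.bonds.items():
        child.bonds[(i + offset, j + offset)] = o
    child.bonds[(site_a, site_b + offset)] = order
    return child


# Usage: build small skeletons step by step; parents are never modified.
ethane = add_atom(Molecule(["C"]), 0, "C")      # C-C
ethanol = add_atom(ethane, 1, "O")              # C-C-O
ethylamine = mutate_atom(ethanol, 2, "N")       # C-C-N
ether = combine(ethanol, 2, ethane, 0)          # C-C-O-C-C
```

Each operation returns a new child and leaves its parents untouched, so the same parent can be reused to enumerate every allowed modification. The valence check is what prevents chemically impossible children, for example mutating the bridging oxygen of `ether` into chlorine.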