Machine learning-based predictive models allow rapid and reliable
prediction of material properties and facilitate innovative materials
design. Base oils used in the formulation of lubricant products are
complex hydrocarbons of varying size and structure. This study developed
Gaussian process regression-based models to accurately predict the
temperature-dependent density and dynamic viscosity of 305 complex
hydrocarbons. In our approach, strongly correlated/collinear predictors
were trimmed, important predictors were selected by least absolute
shrinkage and selection operator (LASSO) regularization and prior
domain knowledge, hyperparameters were systematically optimized by
Bayesian optimization, and the models were interpreted. The approach
provided versatile and quantitative structure–property relationship
(QSPR) models with relatively simple predictors for determining the
dynamic viscosity and density of complex hydrocarbons at any temperature.
In addition, we developed molecular dynamics simulation-based descriptors
and evaluated the feasibility and versatility of dynamic descriptors
from simulations for predicting the material properties. Models developed
from a comparatively small pool of dynamic descriptors predicted density
and viscosity about as well as models based on many more static
descriptors. The best models were
shown to predict density and dynamic viscosity with coefficient of
determination (R²) values of 99.6% and
97.7%, respectively, for all data sets, including a test data set
of 45 molecules. Finally, partial dependence plots (PDPs), individual
conditional expectation (ICE) plots, local interpretable model-agnostic
explanation (LIME) values, and trimmed model R² values were used to identify the most important static and
dynamic predictors of the density and viscosity.
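
As a rough illustration of two steps named above (trimming collinear predictors, then fitting a Gaussian process regression and scoring it by R²), the following is a minimal sketch and not the study's implementation. Everything in it is an assumption made for illustration: the data are synthetic, the descriptor names `d1`, `d2`, `d3` are hypothetical, temperature is simply treated as one more input column, the kernel is a plain RBF with fixed hyperparameters rather than Bayesian-optimized ones, and the LASSO selection step is omitted.

```python
import numpy as np


def trim_collinear(X, threshold=0.9):
    """Greedy trimming: keep a column only if its absolute Pearson
    correlation with every already-kept column is <= threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gpr_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """Posterior mean of a GP with an RBF kernel and a constant mean.
    Hyperparameters are fixed here (assumption); the source tunes them."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, length_scale)
    K += noise * np.eye(len(X_train))  # noise variance on the diagonal
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train - y_mean))
    return rbf_kernel(X_test, X_train, length_scale) @ alpha + y_mean


def r2_score(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n = 200
    d1 = rng.normal(size=n)                 # hypothetical descriptor
    d2 = rng.normal(size=n)                 # hypothetical descriptor
    d3 = d1 + 0.01 * rng.normal(size=n)     # near-duplicate of d1
    temp = rng.uniform(-1.0, 1.0, size=n)   # standardized temperature
    X = np.column_stack([d1, d2, d3, temp])
    # Synthetic stand-in for a property such as density (not real data).
    y = 0.8 * d1 - 0.5 * d2 - 0.6 * temp + 0.05 * rng.normal(size=n)

    kept = trim_collinear(X)                # drops the redundant column
    Xk = X[:, kept]
    train, test = np.arange(150), np.arange(150, n)
    y_pred = gpr_predict(Xk[train], y[train], Xk[test], length_scale=2.0)
    return kept, r2_score(y[test], y_pred)


if __name__ == "__main__":
    kept, r2 = demo()
    print("kept columns:", kept)
    print("test R^2:", round(float(r2), 3))
```

The greedy trimming keeps the first member of each highly correlated group, so the result depends on column order; a study-grade workflow would likely rank predictors by domain relevance before trimming.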