The present work describes convenient and interpretable models built by step-wise multiple linear regression (SW-MLR) and genetic algorithm-multiple linear regression (GA-MLR). These quantitative structure-retention relationship (QSRR) strategies were used to predict the retention indices (RIs) of a series of natural compounds found in the essential oil of Pistacia lentiscus L. The dataset was randomly divided into a training set (51 compounds) and a test set (25 compounds). The predictive ability of both approaches was assessed by applying them to the test set compounds. The fit statistics of both models were satisfactory, and both gave accurate predictions. The best optimized SW-MLR model consisted of 5 molecular descriptors (X0, Qtot, Dv, HATS3e and MATS4e), and the best GA-MLR model consisted of 6 (MATS2e, DELS, ATS4e, PW5, PCD and W). Our investigation showed that the SW-MLR model (R² = 0.96, Q²_LOO = 0.95 and Q²_LGO = 0.94 for the training set; REP = 3.7 for the test set) outperformed the best GA-MLR model (R² = 0.955, Q²_LOO = 0.937 and Q²_LGO = 0.934 for the training set; REP = 7.5 for the test set) for estimating the RIs of similar or unknown compounds.
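As an illustration of the validation statistics quoted above, the following minimal sketch fits an ordinary least-squares MLR model and computes R², leave-one-out Q², and a relative error of prediction. It is not the authors' code. The data are synthetic (random stand-ins for five descriptors and an RI-like response), only the 51/25 split size is taken from the abstract, and the REP formula used here (RMSE as a percentage of the mean observed value) is an assumed, commonly used definition that the abstract does not spell out. All function names are hypothetical.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least-squares fit with an intercept; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_mlr(coef, X):
    """Predict responses from coefficients returned by fit_mlr."""
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def q2_loo(X, y):
    """Leave-one-out cross-validated Q2: refit n times, each time
    predicting the single held-out compound."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef = fit_mlr(X[keep], y[keep])
        preds[i] = predict_mlr(coef, X[i:i + 1])[0]
    press = np.sum((y - preds) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

def rep_percent(y, y_hat):
    """ASSUMED definition of REP: RMSE as a percentage of the mean
    observed value. The abstract does not define REP."""
    return 100.0 / y.mean() * np.sqrt(np.mean((y - y_hat) ** 2))

# Synthetic stand-in data: 76 "compounds", 5 "descriptors" (not real data).
rng = np.random.default_rng(0)
X = rng.normal(size=(76, 5))
true_w = np.array([50.0, -30.0, 20.0, 10.0, -5.0])  # arbitrary weights
y = 1200.0 + X @ true_w + rng.normal(scale=10.0, size=76)

# Random 51/25 split, mirroring the split sizes stated in the abstract.
order = rng.permutation(76)
train, test = order[:51], order[51:]

coef = fit_mlr(X[train], y[train])
r2_train = r_squared(y[train], predict_mlr(coef, X[train]))
q2_train = q2_loo(X[train], y[train])
rep_test = rep_percent(y[test], predict_mlr(coef, X[test]))

print(f"R2 (train)     = {r2_train:.3f}")
print(f"Q2_LOO (train) = {q2_train:.3f}")
print(f"REP (test, %)  = {rep_test:.2f}")
```

For a least-squares model, Q²_LOO cannot exceed the training R², which matches the pattern in the values reported for both models. A leave-group-out Q² could be sketched the same way by holding out several compounds per refit instead of one.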