In this paper, we propose a new definition of the model applicability domain (AD) based on the selection of a sufficient portion of individual QSPR models to be accepted for property prediction. The efficiency of this approach is demonstrated in ensemble modeling of the stability constants logK of the 1:1 complexes of 17 lanthanide and transition metal ions (M) with various organic ligands (L) in aqueous solution at 298 K and an ionic strength of 0.1 M. The models were validated by an external 5-fold cross-validation procedure. The root mean squared error (RMSE) of the predictions is similar to the systematic errors in the experimental data and is half that of earlier reported models, for which the "quorum control" AD was not applied.
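The idea of accepting a prediction only when a sufficient portion of the individual models accepts it can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the ISIDA software: the function name `quorum_prediction`, the `quorum` threshold of 0.5, the per-model in-domain flags, and the choice to average only over the accepting models.

```python
from statistics import mean


def quorum_prediction(predictions, in_domain, quorum=0.5):
    """Return a consensus logK, or None if too few models accept the compound.

    predictions: per-model predicted logK values for one compound.
    in_domain:   booleans, True if the compound lies inside that model's own AD.
    quorum:      minimum fraction of accepting models (hypothetical threshold).
    """
    if len(predictions) != len(in_domain):
        raise ValueError("predictions and in_domain must have the same length")
    if not predictions:
        raise ValueError("at least one individual model is required")

    # Keep only the predictions of models that consider the compound in-domain.
    accepted = [p for p, ok in zip(predictions, in_domain) if ok]

    # Reject the compound when the accepting fraction is below the quorum.
    if len(accepted) / len(predictions) < quorum:
        return None

    # Assumed consensus rule: arithmetic mean over the accepting models.
    return mean(accepted)
```

With four individual models of which three accept the compound, the call `quorum_prediction([5.0, 6.0, 7.0, 20.0], [True, True, True, False])` averages the three accepted values; if no model accepts the compound, the function returns `None` and no prediction is reported.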
Methods
Data Sets
The experimental stability constants (logK) for complexes of metal ions with diverse organic ligands in water were selected from the IUPAC Stability Constants Database (SC DB) (version 5.33, Academic Software) [15] at the standard temperature of 298 K and an ionic strength I = 0.1 M. Some logK values (around 15 %) were corrected to the specified temperature and ionic strength using the procedures included in SC DB.
The 2D structures of the ligands, the names of the metal ions and the corresponding logK values obtained by searching SC DB were converted into Structure-Data Files (SDF), which served as input to the MLR module of the ISIDA (In Silico Design and Data Analysis)/QSPR program. [53] The data manager EdiSDF [35,46,54] was used to prepare data sets containing finally from 52 (Hg