“…So far, publicly available data has mainly been used for the development and evaluation of brain penetration models. Despite some valuable efforts toward data set aggregation and standardization, brain penetration data sets remain heterogeneous and comparatively small for ML approaches, ranging from a few hundred to a few thousand molecules. The most recently published data set, “B3DB”, constitutes the largest publicly available in vivo brain penetration data set and, to the best of our knowledge, has not yet been used for modeling. Perhaps due to the limited data set size, previous studies have mainly reported model performance on random compound subsets, which is an indicator of self-consistency but not of future model predictivity. For a more realistic estimate of prospective model performance, evaluation should be done on new chemical series or scaffolds (series or scaffold split) or on the most recent experiments (temporal split). The latter resembles how models are used in pharmaceutical research and requires temporal or date information, which is typically only available in proprietary data sets. …”
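
The three evaluation strategies contrasted in the excerpt (random, scaffold, and temporal splits) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of the cited authors: the record fields `scaffold` and `date` are hypothetical, the scaffold key is assumed to be precomputed (for example, a Bemis-Murcko scaffold string from a cheminformatics toolkit), and the toy records are made up purely to exercise the functions.

```python
import random
from collections import defaultdict
from datetime import date


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle and cut: test compounds may share scaffolds with train."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def scaffold_split(records, test_fraction=0.2):
    """Keep every scaffold group entirely in train or entirely in test."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["scaffold"]].append(rec)  # hypothetical precomputed key

    # Visit groups from largest to smallest; a group joins train only if it
    # still fits under the train-size target, otherwise it goes to test.
    n_train_target = len(records) - round(len(records) * test_fraction)
    train, test = [], []
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= n_train_target:
            train.extend(group)
        else:
            test.extend(group)
    return train, test


def temporal_split(records, test_fraction=0.2):
    """Train on the oldest measurements, test on the most recent ones."""
    ordered = sorted(records, key=lambda rec: rec["date"])
    n_test = max(1, round(len(ordered) * test_fraction))
    return ordered[:-n_test], ordered[-n_test:]


# Toy records (invented for illustration): four scaffolds, one date each.
records = [
    {"id": i, "scaffold": s, "date": date(2015 + i, 1, 1)}
    for i, s in enumerate(["A", "A", "A", "A", "B", "B", "B", "C", "C", "D"])
]

scaffold_train, scaffold_test = scaffold_split(records)
temporal_train, temporal_test = temporal_split(records)
```

In the scaffold split no scaffold appears on both sides, so test performance reflects generalization to unseen chemistry. In the temporal split every test record is newer than every training record, which mimics prospective use but requires the date field that public data sets often lack.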