Prediction of protein–ligand
binding affinities is a central
issue in structure-based computer-aided drug design. In recent years,
much effort has been devoted to predicting the binding affinity
of protein–ligand complexes using machine learning (ML). Because
ML methods excel at nonlinear fitting, ML-based
scoring functions (SFs) can substantially outperform classical SFs on a
selected test set, such as the Comparative Assessment of Scoring Functions
(CASF) benchmark. However, the performance
of ML-based SFs depends heavily on the overall similarity between the training
set and the test set. To improve the performance and transferability
of an SF, we combined diverse features, including energy
terms from X-score and AutoDock Vina, ligand properties, and
statistical sequence-related information from either the binding
site or the full protein. Training an extremely randomized trees (ET)
model on these features, we developed XLPFE, a new SF. Compared with other
tested methods such as X-score, AutoDock Vina, ΔvinaXGB, PSH-ML,
and CNN-score, XLPFE achieves consistently better scoring and ranking
power for various types of protein–ligand complex structures
beyond the CASF benchmark, suggesting that XLPFE has superior transferability.
In particular, XLPFE performs better on metalloenzyme complexes. With its
higher speed, improved accuracy, and better transferability, XLPFE
can be applied to a diverse range of protein–ligand
complexes.
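
The scoring and ranking power mentioned above are commonly quantified, in CASF-style evaluations, as the Pearson correlation between predicted and experimental affinities and as the Spearman rank correlation computed per target and then averaged. The following is a minimal sketch of those two metrics in plain Python, not the evaluation code used for XLPFE. The function names, the target labels, and all affinity values are hypothetical and serve only as illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def scoring_power(predicted, experimental):
    """Scoring power as Pearson's r over the whole test set."""
    return pearson(predicted, experimental)


def ranking_power(clusters):
    """Ranking power as the mean per-target Spearman correlation.

    `clusters` maps a target name to (predicted, experimental) affinity lists
    for the ligands bound to that target.
    """
    rhos = [spearman(pred, exp) for pred, exp in clusters.values()]
    return sum(rhos) / len(rhos)


# Hypothetical affinities (e.g. pKd values), invented for illustration only.
toy_clusters = {
    "target_A": ([6.1, 7.0, 8.2], [5.9, 7.4, 8.8]),
    "target_B": ([5.0, 6.5, 6.0], [4.5, 5.5, 7.0]),
}
all_predicted = [p for pred, _ in toy_clusters.values() for p in pred]
all_experimental = [e for _, exp in toy_clusters.values() for e in exp]

print("scoring power (Pearson r):", scoring_power(all_predicted, all_experimental))
print("ranking power (mean Spearman):", ranking_power(toy_clusters))
```

Averaging the rank correlation per target, rather than pooling all complexes, keeps the ranking metric focused on ordering ligands of the same protein, which is the setting in which a ranking is actually used.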