End-point free-energy methods, an indispensable component of
virtual screening, are commonly recognized as tools with a certain
level of screening power in pharmaceutical research. While numerous
academic and industrial reports describe end-point applications to
protein–ligand, protein–protein, and protein–DNA complexes,
to date there is no large-scale benchmark
in host–guest complexes supporting the screening power of end-point
free-energy techniques. A good benchmark requires a data set with sufficient
coverage of pharmaceutically relevant chemical space, sampling
long enough to support the trajectory approximation of the ensemble
average, and enough receptor–acceptor
pairs to stabilize the performance statistics. In this work, we select
a popular family of macrocyclic hosts, the cucurbiturils, construct
a large data set containing 154 host–guest pairs, perform extensive
end-point sampling of several hundred nanoseconds for each
system, and extract the free-energy estimates with a variety of end-point
free-energy techniques, including the advanced three-trajectory dielectric-constant-variable
regime proposed in our recent work. The best-performing end-point
protocol employs GAFF2 for solute descriptions, the three-trajectory
end-point sampling regime, and the MM/GBSA Hamiltonian in free-energy
extraction, achieving high ranking metrics of Kendall τ >
0.6 and a Pearlman predictive index of ∼0.8, and a high scoring
power of Pearson r > 0.8. The current project,
as the first large-scale systematic benchmark of end-point methods in
host–guest complexes in academic publications, provides solid
evidence of the applicability of end-point techniques and direct guidance
for computational setups in practical host–guest systems.
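
For readers unfamiliar with the three performance statistics quoted above, the following is a minimal sketch of how they could be computed from paired experimental and predicted binding free energies. It is not the authors' analysis code. The function names and the four-system data set are hypothetical and purely illustrative, Kendall τ is computed in its tie-uncorrected form (τ-a), and the predictive index follows the standard pairwise definition, in which each pair is weighted by the absolute difference of its experimental values.

```python
import math
from itertools import combinations


def pearson_r(exp, pred):
    """Pearson correlation coefficient (scoring power)."""
    n = len(exp)
    mean_e, mean_p = sum(exp) / n, sum(pred) / n
    cov = sum((e - mean_e) * (p - mean_p) for e, p in zip(exp, pred))
    var_e = sum((e - mean_e) ** 2 for e in exp)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_e * var_p)


def kendall_tau_a(exp, pred):
    """Kendall tau-a: (concordant - discordant) / number of pairs.

    No tie correction is applied (assumption made for brevity).
    """
    score, n_pairs = 0, 0
    for i, j in combinations(range(len(exp)), 2):
        prod = (exp[j] - exp[i]) * (pred[j] - pred[i])
        score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
        n_pairs += 1
    return score / n_pairs


def predictive_index(exp, pred):
    """Pearlman predictive index: pairwise rank agreement weighted
    by the absolute experimental difference of each pair."""
    num, den = 0.0, 0.0
    for i, j in combinations(range(len(exp)), 2):
        weight = abs(exp[j] - exp[i])
        prod = (exp[j] - exp[i]) * (pred[j] - pred[i])
        num += weight * ((prod > 0) - (prod < 0))
        den += weight
    return num / den


if __name__ == "__main__":
    # Hypothetical binding free energies in kcal/mol (illustrative only).
    dg_exp = [-5.0, -7.2, -9.1, -6.3]
    dg_pred = [-4.1, -8.0, -7.5, -6.0]
    print("Kendall tau-a:", round(kendall_tau_a(dg_exp, dg_pred), 3))
    print("Predictive index:", round(predictive_index(dg_exp, dg_pred), 3))
    print("Pearson r:", round(pearson_r(dg_exp, dg_pred), 3))
```

Because the predictive index weights each pair by its experimental separation, misranking two guests of nearly equal affinity is penalized less than misranking a strong and a weak binder, which is why it can exceed Kendall τ on the same data.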