Molecular dynamics (MD) is a core methodology of molecular modeling
and computational design, used to study the dynamics and temporal
evolution of molecular systems. MD simulations have benefited
particularly from the rapid growth in computational power over the
past decades of computational chemistry research, and MD was the first
method to be successfully migrated to GPU infrastructure. While
new-generation MD software can deliver simulations on
an ever-increasing scale, comparatively little effort has been invested
in developing postprocessing methods that can keep up with the rapidly
growing volumes of data being generated. Here, we introduce a new
method for sampling frames from large MD trajectories, based on the
recently introduced framework of extended similarity indices. Our
approach offers a linearly scaling alternative to the traditional
practice of applying a clustering algorithm, which usually scales
quadratically with the number of frames. In case studies with
different system sizes and simulation lengths, we observed speedups of
up to 2 orders of magnitude compared to traditional clustering
algorithms. The conformational diversity
of the selected frames is also noticeably higher, which is a further
advantage for certain applications, such as the selection of structural
ensembles for ligand docking. The method is available open-source
at
.
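The linear-versus-quadratic contrast can be illustrated with a minimal sketch. It is not the authors' implementation: as a stand-in for the extended similarity indices, it uses an n-ary mean squared deviation (MSD), the average squared distance over all frame pairs, which can be computed from per-coordinate sums without ever forming a pairwise distance matrix. Subtracting one frame from those sums gives a leave-one-out ("complementary") value for that frame in O(d) time, so all n frames are scored in O(n·d) instead of the O(n²·d) needed for pairwise clustering. The function names, the assumption that frames are pre-aligned and flattened to an (n_frames, 3 × n_atoms) array, and the stratified selection rule in `sample_frames` are hypothetical choices made for illustration.

```python
import numpy as np


def nary_msd(col_sum, sq_sum, n):
    """Mean squared distance over all n*n frame pairs, from running sums only.

    Uses (1/n^2) * sum_{i,k} ||x_i - x_k||^2 = 2 * (sq_sum / n - ||col_sum / n||^2),
    so no pairwise distance matrix is ever built.
    """
    mean = col_sum / n
    return 2.0 * (sq_sum / n - np.dot(mean, mean))


def complementary_msd(frames):
    """Leave-one-out MSD for every frame in O(n * d) total.

    frames: (n_frames, n_features) array, e.g. aligned coordinates flattened
    to 3 * n_atoms columns (an assumption of this sketch).
    """
    n = frames.shape[0]
    col_sum = frames.sum(axis=0)                      # per-coordinate totals, one pass
    row_sq = np.einsum("ij,ij->i", frames, frames)    # ||x_i||^2 for each frame
    sq_sum = row_sq.sum()
    # Subtract frame i from the totals instead of recomputing from scratch.
    rest_mean = (col_sum[None, :] - frames) / (n - 1)
    rest_sq = (sq_sum - row_sq) / (n - 1)
    return 2.0 * (rest_sq - np.einsum("ij,ij->i", rest_mean, rest_mean))


def sample_frames(frames, k):
    """Hypothetical selection rule: k frames spread evenly along the
    complementary-MSD ranking, from outlier-like to medoid-like. Needs 2 <= k <= n."""
    n = frames.shape[0]
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n_frames")
    order = np.argsort(complementary_msd(frames))     # ascending: outlier-like first
    picks = (np.arange(k) * (n - 1)) // (k - 1)       # strictly increasing ranks
    return order[picks]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    traj = rng.random((1000, 30))                      # placeholder data, not a real trajectory
    print(sample_frames(traj, 10))
```

A frame whose removal lowers the MSD the most lies far from the rest (outlier-like), while a frame whose removal barely changes it sits near the center (medoid-like). Picking evenly along this ranking is only one simple way to cover both ends; the selection strategies used in the actual method may differ.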