To improve compound identification and database searching, this study
explores methods for simulating and matching heteronuclear single quantum
coherence (HSQC) spectra. HSQC spectra serve as unique molecular
fingerprints and offer a favorable balance between data collection time
and information richness. We conducted a comprehensive
evaluation of the following four HSQC simulation techniques: ACD/Labs
(ACD), MestReNova (MNova), Gaussian NMR calculations (DFT), and a
graph-based neural network (ML). For the latter two techniques, we
developed a reconstruction logic to combine proton and carbon 1D spectra
into HSQC spectra. The methodology involved the implementation of
three peak-matching strategies (minimum-sum, Euclidean-distance, and
Hungarian distance) combined with three padding strategies (zero-padding,
peak truncation, and nearest-neighbor double assignment). We found
that coupling these strategies with a robust simulation technique
facilitates the accurate identification of correct molecules from
similar analogues (regio- and stereoisomers) and allows for fast and
accurate large database searches. Furthermore, we demonstrated the
efficacy of the best-performing methodology by rectifying the structures
of a set of previously misidentified molecules. This research indicates
that effective HSQC spectral simulation and matching methodologies
significantly facilitate molecular structure elucidation. We also
provide a Google Colab notebook so that researchers can apply our methods
to their own data.
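
To make the idea of peak matching with padding concrete, the following is a minimal sketch, not the implementation used in this study. It pads the shorter peak list with placeholder peaks at the origin (one possible reading of zero-padding) and then finds the one-to-one assignment with the lowest total distance, which is the quantity the Hungarian algorithm computes. For brevity the sketch searches all permutations, which is only practical for a handful of peaks; a routine such as `scipy.optimize.linear_sum_assignment` would be the usual choice for real spectra. The function names, the `(1H ppm, 13C ppm)` tuple layout, and the carbon-axis scaling factor `c_scale=0.1` are assumptions made for illustration.

```python
from itertools import permutations
from math import hypot


def zero_pad(peaks_a, peaks_b, pad=(0.0, 0.0)):
    """Pad the shorter peak list with placeholder peaks so both have equal length."""
    n = max(len(peaks_a), len(peaks_b))
    padded_a = list(peaks_a) + [pad] * (n - len(peaks_a))
    padded_b = list(peaks_b) + [pad] * (n - len(peaks_b))
    return padded_a, padded_b


def peak_distance(p, q, c_scale=0.1):
    """Euclidean distance between two (1H ppm, 13C ppm) peaks.

    c_scale down-weights the carbon axis; the value 0.1 is an assumption.
    """
    return hypot(p[0] - q[0], c_scale * (p[1] - q[1]))


def assignment_score(experimental, simulated, c_scale=0.1):
    """Return (mean matched distance, list of (exp_index, sim_index) pairs).

    Brute-force search over all one-to-one assignments; this gives the same
    optimum as the Hungarian algorithm but scales factorially with peak count.
    """
    exp, sim = zero_pad(experimental, simulated)
    n = len(exp)
    if n == 0:
        return 0.0, []
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        cost = sum(peak_distance(exp[i], sim[j], c_scale)
                   for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost / n, list(enumerate(best_perm))


# Hypothetical peak lists, given as (1H ppm, 13C ppm).
experimental = [(1.0, 20.0), (3.5, 60.0)]
simulated = [(3.6, 61.0), (1.1, 21.0)]
score, pairs = assignment_score(experimental, simulated)
```

In a database search, a score of this kind would be computed between the experimental spectrum and each candidate's simulated spectrum, and the candidates ranked by ascending score.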