Machine learning (ML) methods have been used in NMR for decades, but their use has grown tremendously in the last few years, especially thanks to the emergence of deep learning (DL) techniques that take advantage of the increased amounts of data and available computing power. These algorithms are successfully employed for classification, regression, clustering, and dimensionality reduction of large data sets and have been applied intensively in different areas of NMR, including metabonomics, clinical diagnosis, and relaxometry. In this article, we concentrate on the various applications of ML/DL in NMR signal processing and the analysis of small molecules, including automatic structure verification and prediction of NMR observables in solution.
Quantitative 1H NMR (qNMR) is a widely applied technique for compound concentration and purity determinations. The NMR spectrum will display signals from all species in the sample, and this is generally a strength of the method. The key spectral measurement is the full and accurate determination of one or more signal areas. Accurate peak integration can be an issue when unrelated peaks resonate in an important integral region. We describe a "hybrid" approach to signal integration that provides an accurate estimation of signal area, removing the component(s) that may arise from unrelated peaks. This is achieved by using the most accurate integration method for the region and removing unwanted contributions. The key to this approach performing well in almost all cases is the use of areas from deconvolved peaks. We describe this process and show that it can be very successfully applied both to cases where the highest precision is required and to more common cases of NMR-based quantitation.
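The hybrid idea can be illustrated with a minimal sketch: integrate the whole region numerically, then subtract the area that deconvolved, unrelated peaks contribute inside that region. Everything in it is an assumption made for illustration and is not taken from the abstract: the Lorentzian line shape, the trapezoid rule as the region integrator, the function names, and the synthetic peak parameters.

```python
import math

def lorentzian(x, area, center, hwhm):
    """Lorentzian line with the given total area, centre and half width at half maximum."""
    return (area / math.pi) * hwhm / ((x - center) ** 2 + hwhm ** 2)

def lorentzian_area_between(area, center, hwhm, lo, hi):
    """Analytic area of a Lorentzian that falls inside the interval [lo, hi]."""
    return (area / math.pi) * (
        math.atan((hi - center) / hwhm) - math.atan((lo - center) / hwhm)
    )

def trapezoid(xs, ys):
    """Numerical integral of sampled points using the trapezoid rule."""
    return sum(
        0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)
    )

def hybrid_integral(xs, ys, unrelated_peaks):
    """Region integral minus the in-region area of deconvolved unrelated peaks.

    `unrelated_peaks` is a list of dicts with keys "area", "center", "hwhm"
    (a hypothetical representation of a deconvolution result).
    """
    lo, hi = xs[0], xs[-1]
    total = trapezoid(xs, ys)
    unwanted = sum(
        lorentzian_area_between(p["area"], p["center"], p["hwhm"], lo, hi)
        for p in unrelated_peaks
    )
    return total - unwanted

# Synthetic region (arbitrary units): an analyte peak overlapped by an impurity.
analyte = {"area": 1.0, "center": 0.00, "hwhm": 0.01}
impurity = {"area": 0.2, "center": 0.05, "hwhm": 0.01}
xs = [-0.5 + i * 0.0005 for i in range(2001)]
ys = [lorentzian(x, **analyte) + lorentzian(x, **impurity) for x in xs]

raw_area = trapezoid(xs, ys)                         # includes the impurity
corrected_area = hybrid_integral(xs, ys, [impurity])  # impurity removed
```

In this toy case the corrected area matches the analyte's own in-region area, while the raw integral is inflated by the impurity. A real workflow would obtain the peak parameters from a fit to measured data rather than from known inputs.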
A novel data-evaluation procedure for the automatic atom to peak or multiplet assignment of 1H-NMR spectra of small molecules has been developed using a fast and robust expert system. The applicability and reliability of the method are demonstrated by comparison of a manually assigned database of 1H-NMR spectra with the assignments produced by the automatic procedure. The results of this analysis show an excellent success ratio, indicating that this new algorithm can have a major impact as a time saving tool for the organic chemist. A new graphical feature used to illustrate both the stability and quality of the elementary assignments is also introduced.
NMR is routinely used to quantitate chemical species. The necessary experimental procedures to acquire quantitative data are well-known, but relatively little attention has been paid to data processing and analysis. We describe here a robust expert system that can be used to automatically choose the best signals in a sample for overall concentration determination and determine analyte concentration using all accepted methods. The algorithm is based on the complete deconvolution of the spectrum, which makes it tolerant of cases where signals are very close to one another, and includes robust methods for the automatic classification of NMR resonances and molecule-to-spectrum multiplet assignments. With the functionality in place and optimized, it is then a relatively simple matter to apply the same workflow to data in a fully automatic way. The procedure is desirable for both its inherent performance and applicability to NMR data acquired for very large sample sets.
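As an illustration of one commonly used quantitation route, the internal-standard method, the sketch below scales the standard's known concentration by the ratio of per-nucleus signal areas. The function name, argument names, and numbers are hypothetical, and the abstract does not say which methods the expert system implements.

```python
def concentration_internal_standard(
    analyte_integral, analyte_nuclei,
    standard_integral, standard_nuclei,
    standard_conc,
):
    """Analyte concentration relative to an internal standard.

    Each integral is divided by the number of nuclei giving rise to the
    signal, so the ratio compares area per nucleus. The result has the
    same unit as `standard_conc`.
    """
    per_nucleus_analyte = analyte_integral / analyte_nuclei
    per_nucleus_standard = standard_integral / standard_nuclei
    return standard_conc * per_nucleus_analyte / per_nucleus_standard

# Illustrative numbers only: a 3H analyte signal of area 6.0 against a
# 9H standard signal of area 9.0, with the standard at 10.0 mM.
analyte_conc = concentration_internal_standard(6.0, 3, 9.0, 9, 10.0)
```

With these inputs the analyte shows twice the area per nucleus of the standard, so the function returns 20.0 mM.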
NMR binding assays are routinely applied in hit finding and validation during early stages of drug discovery, particularly for fragment-based lead generation. To this end, compound libraries are screened by ligand-observed NMR experiments such as STD, T1ρ, and CPMG to identify molecules interacting with a target. The analysis of a high number of complex spectra is performed largely manually and therefore represents a limiting step in hit generation campaigns. Here we report a novel integrated computational procedure that processes and analyzes ligand-observed proton and fluorine NMR binding data in a fully automated fashion. A performance evaluation comparing automated and manual analysis results on 19F- and 1H-detected data sets shows that the program delivers robust, high-confidence hit lists in a fraction of the time needed for manual analysis and greatly facilitates visual inspection of the associated NMR spectra. These features enable considerably higher throughput, the assessment of larger libraries, and shorter turn-around times.
A strong case exists for the introduction of burst non-uniform sampling (NUS) in the direct dimension of NMR spectroscopy experiments. The resulting gaps in the NMR free induction decay can reduce the power demands of long experiments (by switching off broadband decoupling for example) and/or be used to introduce additional pulses (to refocus homonuclear coupling, for example). The final EXtended ACquisition Time (EXACT) spectra are accessed by algorithmic reconstruction of the missing data points and can provide higher resolution in the direct dimension than is achievable with existing non-NUS methods.
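To make the notion of burst sampling concrete, the sketch below builds a mask of acquired and skipped points for a free induction decay and applies it to a synthetic signal. The strictly regular burst/gap pattern, the function names, and the signal parameters are simplifying assumptions for illustration. The abstract does not specify the actual sampling schedule or the reconstruction algorithm, and no reconstruction is attempted here.

```python
import cmath
import math

def burst_mask(n_points, burst_len, gap_len):
    """True where a point is acquired, False inside a gap.

    Assumes a repeating pattern of `burst_len` acquired points followed
    by `gap_len` skipped points (a hypothetical, regular schedule).
    """
    period = burst_len + gap_len
    return [(i % period) < burst_len for i in range(n_points)]

def apply_mask(fid, mask):
    """Zero the points that fall inside gaps, keeping acquired points unchanged."""
    return [value if keep else 0j for value, keep in zip(fid, mask)]

# Synthetic single-frequency decaying FID (arbitrary units, 16 points).
fid = [cmath.exp((2j * math.pi * 0.1 - 0.05) * t) for t in range(16)]

mask = burst_mask(16, burst_len=4, gap_len=2)
sampled = apply_mask(fid, mask)
sampled_fraction = sum(mask) / len(mask)
```

Here 12 of the 16 points are acquired. The zeroed points mark where decoupling could be switched off or extra pulses inserted, and they are the values a reconstruction step would have to fill in.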
A user-friendly NMR interface for the visual and accurate determination of experimental one-bond proton-carbon coupling constants (J) in small molecules is presented. The resulting intuitive J profile correlates directly with δ(H) and facilitates the rapid identification and assignment of H signals belonging to key structural elements and functional groups. Illustrative examples are provided for target molecules containing terminal alkynes, strained rings, electronegative substituents, or lone-pair-bearing heteronuclei.
There is an increasing focus on the part of academic institutions, funding agencies, and publishers, if not researchers themselves, on preservation and sharing of research data. Motivations for sharing include research integrity, replicability, and reuse. One of the barriers to publishing data is the extra work involved in preparing data for publication once a journal article and its supporting information have been completed. In this work, a method is described to generate both human and machine-readable supporting information directly from the primary instrumental data files and to generate the metadata to ensure it is published in accordance with findable, accessible, interoperable, and reusable (FAIR) guidelines. Using this approach, both the human readable supporting information and the primary (raw) data can be submitted simultaneously with little extra effort. Although traditionally the data package would be sent to a journal publisher for publication alongside the article, the data package could also be published independently in an institutional FAIR data repository. Workflows are described that store the data packages and generate metadata appropriate for such a repository. The methods both to generate and to publish the data packages have been implemented for NMR data, but the concept is extensible to other types of spectroscopic data as well.
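A machine-readable metadata record derived directly from a primary data file might look like the sketch below. The field names, the file name, and the use of a SHA-256 checksum as an identifier are hypothetical choices made for illustration. They do not reproduce the metadata schema or the workflow described in the abstract.

```python
import hashlib
import json

def build_metadata(filename, raw_bytes, technique, nucleus):
    """Build a small metadata record for one primary data file.

    All keys are illustrative. A real FAIR repository would prescribe
    its own schema and persistent identifiers.
    """
    return {
        "identifier": "sha256:" + hashlib.sha256(raw_bytes).hexdigest(),
        "filename": filename,
        "technique": technique,
        "nucleus": nucleus,
        "size_bytes": len(raw_bytes),
    }

# Placeholder bytes stand in for the contents of a raw instrument file.
record = build_metadata("example_fid.bin", b"\x00\x01\x02\x03", "NMR", "1H")
record_json = json.dumps(record, indent=2, sort_keys=True)
```

Deriving the record from the raw bytes means the metadata can be regenerated and checked against the deposited file at any time, which is one way to keep the human-readable and machine-readable views consistent.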