The Protein Data Bank (PDB) is the single global archive of experimentally determined three-dimensional (3D) structure data of biological macromolecules. Since 2003, the PDB has been managed by the Worldwide Protein Data Bank (wwPDB; wwpdb.org), an international consortium that collaboratively oversees deposition, validation, biocuration, and open access dissemination of 3D macromolecular structure data. The PDB Core Archive houses 3D atomic coordinates of more than 144 000 structural models of proteins, DNA/RNA, and their complexes with metals and small molecules and related experimental data and metadata. Structure and experimental data/metadata are also stored in the PDB Core Archive using the readily extensible wwPDB PDBx/mmCIF master data format, which will continue to evolve as data/metadata from new experimental techniques and structure determination methods are incorporated by the wwPDB. Impacts of the recently developed universal wwPDB OneDep deposition/validation/biocuration system and various methods-specific wwPDB Validation Task Forces on improving the quality of structures and data housed in the PDB Core Archive are described together with current challenges and future plans.
Advances in computation have enabled many recent advances in biomolecular applications of NMR. Because NMR applications are so diverse, the number and variety of software packages for processing and analyzing NMR data are large, with labs relying on dozens, if not hundreds, of software packages. Discovering, acquiring, installing, and maintaining all these packages is burdensome. Because the majority of software packages originate in academic labs, persistence of the software is compromised when developers graduate, funding ceases, or investigators turn to other projects. To simplify access to and use of biomolecular NMR software, foster persistence, and enhance reproducibility of computational workflows, we have developed NMRbox, a shared resource for NMR software and computation. NMRbox employs virtualization to provide a comprehensive software environment preconfigured with hundreds of software packages, available as a downloadable virtual machine or as a Platform-as-a-Service supported by a dedicated compute cloud. Ongoing development includes a metadata harvester to regularize, annotate, and preserve workflows and to facilitate and enhance data depositions to BioMagResBank, as well as tools for Bayesian inference to enhance the robustness and extensibility of computational analyses. In addition to facilitating use and preservation of the rich and dynamic software environment for biomolecular NMR, NMRbox fosters the development and deployment of a new class of metasoftware packages. NMRbox is freely available to not-for-profit users.
We report dramatic sensitivity enhancements in multidimensional MAS NMR spectra by the use of nonuniform sampling (NUS) and introduce Maximum Entropy Interpolation (MINT) processing that assures the linearity between the time- and frequency domains of the NUS acquired datasets. A systematic analysis of sensitivity and resolution in 2D and 3D NUS spectra reveals that with NUS at least one-and-a-half to two-fold sensitivity enhancement can be attained in each indirect dimension without compromising the spectral resolution. These enhancements are similar to or higher than those attained by the newest-generation commercial cryogenic probes. We explore the benefits of this NUS/MaxEnt approach in proteins and protein assemblies using 1-73-(U-13C,15N)/74-108-(U-15N) E. coli thioredoxin reassembly. We demonstrate that in thioredoxin reassembly, NUS permits acquisition of high-quality 3D-NCACX spectra, which are inaccessible with conventional sampling due to prohibitively long experiment times. Of critical importance, issues which hinder NUS-based SNR enhancement in 3D-NMR of liquids are mitigated in the study of solid samples where theoretical enhancements on the order of 3-4 fold are accessible by compounding the NUS-based SNR enhancement of each indirect dimension. NUS/MINT is anticipated to be widely applicable and advantageous for multidimensional heteronuclear MAS NMR spectroscopy of proteins, protein assemblies, and other biological systems.
The arrival of very high field magnets and cryogenic circuitries, and the development of relaxation-optimized pulse sequences have added powerful tools for increasing sensitivity and resolution in NMR studies of biomacromolecules. The potential of these advances is not fully realized in practice, however, since current experimental protocols do not permit sufficient data sampling for optimal resolution in the indirect dimensions. Here we analyze quantitatively how increasing resolution in indirect dimensions affects the S/N ratio and compare this with currently used sampling routines. Optimal resolution would require sampling up to approximately $3R_2^{-1}$, and the S/N reaches a maximum at approximately $1.2R_2^{-1}$. Currently used data acquisition protocols rarely sample beyond $0.4R_2^{-1}$, and extending evolution times would result in prohibitively long experiments. We show that a general solution to this problem is to use non-uniform sampling, where only a small subset of data points in the indirect sampling space are measured, and possibly different numbers of transients are collected for different evolution times. Coupled with modern methods of spectrum analysis, this strategy delivers substantially improved resolution and/or reduced measuring times compared to uniform sampling, without compromising sensitivity. Higher resolution in the indirect dimensions will facilitate the use of automated assignment programs.
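The S/N maximum near 1.2/R2 quoted above can be illustrated with a short numerical sketch. It assumes a simple model that the abstract does not spell out: a single exponentially decaying signal exp(-R2 t), uniform sampling out to t_max with a fixed number of transients per increment, signal taken as the integrated envelope, and noise growing as the square root of the sampled time. Under those assumptions the relative S/N is (1 - exp(-x)) / sqrt(x) with x = R2 * t_max. The function names are illustrative only.

```python
import numpy as np

def relative_snr(x):
    """Relative S/N for uniform sampling out to t_max = x / R2.

    Assumed model (not taken from the abstract): the signal is the
    integral of exp(-R2 * t) from 0 to t_max, and the noise grows as
    sqrt(t_max). Constant factors are dropped.
    """
    x = np.asarray(x, dtype=float)
    return (1.0 - np.exp(-x)) / np.sqrt(x)

def optimal_evolution_time(x_min=0.01, x_max=5.0, n_grid=50001):
    """Grid search for the x = R2 * t_max that maximizes relative_snr."""
    grid = np.linspace(x_min, x_max, n_grid)
    return float(grid[np.argmax(relative_snr(grid))])

if __name__ == "__main__":
    x_opt = optimal_evolution_time()
    print(f"S/N maximum at t_max = {x_opt:.3f} / R2")
    for x in (0.4, 1.2, 3.0):
        ratio = relative_snr(x) / relative_snr(x_opt)
        print(f"t_max = {x:.1f}/R2 gives {ratio:.2f} of the maximum S/N")
```

In this model the maximum satisfies exp(x) = 1 + 2x, which is consistent with the approximate value quoted in the abstract. Sampling out to 3/R2 for resolution costs S/N per unit time under uniform sampling, which is the trade-off that non-uniform sampling is meant to relax.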
NMR spectroscopy is an inherently insensitive technique, and many challenging applications such as biomolecular studies operate at the very limits of sensitivity and resolution. Advances in superconducting magnet, cryogenic probe, and pulse sequence technologies have resulted in dramatic improvements in both sensitivity and resolution in the past decade. Conversely, the signal-processing method used most widely in NMR spectroscopy, extrapolation of the time domain signal by linear prediction (LP) followed by discrete Fourier transformation (DFT), was developed in the early 1980s and has not been subjected to detailed scrutiny for its impact on sensitivity and resolution. Here we report the first systematic investigation of the accuracy and precision of spectra obtained by LP extrapolation followed by DFT. We compare the results to spectra obtained by maximum-entropy (MaxEnt) reconstruction, which was developed contemporaneously to LP extrapolation but is not widely employed in NMR spectroscopy. Although it reduces truncation artifacts and increases the amplitudes of strong peaks, we find that LP extrapolation generates false-positive peaks and introduces frequency errors. These defects of LP extrapolation become less pronounced for longer data records and higher signal-to-noise ratio. MaxEnt generally yields more detectable peaks for a given number of data samples, more accurate peak frequencies, and fewer false-positive peaks than LP extrapolation. MaxEnt also permits the use of nonlinear sampling, which can give dramatic improvements in resolution. These results show that the use of MaxEnt together with nonlinear sampling, rather than LP extrapolation, could reduce the amount of instrument time required for adequate sensitivity and resolution by a factor of 2 or more.
Iterative thresholding algorithms have a long history of application to signal processing. Although they are intuitive and easy to implement, their development was heuristic and mainly ad hoc. Using a special form of the thresholding operation, called soft thresholding, we show that the fixed point of iterative thresholding is equivalent to minimum $\ell_1$-norm reconstruction. We illustrate the method for spectrum analysis of a time series. This result helps to explain the success of these methods and illuminates connections with maximum entropy and minimum area methods, while also showing that there are more efficient routes to the same result. The power of the $\ell_1$-norm and related functionals as regularizers of solutions to underdetermined systems will likely find numerous useful applications in NMR.
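A minimal sketch of iterative soft thresholding applied to a non-uniformly sampled time series may make the idea concrete. It is not the authors' implementation: it assumes a fixed threshold chosen as a fraction of the largest zero-filled spectral peak, a unitary FFT, and a synthetic two-line test signal. With a fixed threshold, the fixed point of this iteration is the minimizer of a least-squares data term plus the threshold times the l1 norm of the spectrum. All function and parameter names are hypothetical.

```python
import numpy as np

def soft_threshold(z, tau):
    """Shrink each complex magnitude by tau, keeping the phase."""
    mag = np.abs(z)
    scale = np.maximum(mag - tau, 0.0) / np.maximum(mag, 1e-12)
    return z * scale

def ist_reconstruct(fid, sampled, tau_fraction=0.2, n_iter=200):
    """Iterative soft thresholding with a fixed threshold (assumed variant).

    fid     : complex time-domain data on the full grid
    sampled : boolean mask, True where a point was actually measured
    """
    fid = np.where(sampled, fid, 0.0).astype(complex)
    spectrum = np.zeros(fid.size, dtype=complex)
    # Threshold: a fraction of the tallest peak in the zero-filled spectrum.
    tau = tau_fraction * np.max(np.abs(np.fft.fft(fid, norm="ortho")))
    for _ in range(n_iter):
        # Mismatch between measured data and the current model, sampled points only.
        model = np.fft.ifft(spectrum, norm="ortho")
        residual = np.where(sampled, fid - model, 0.0)
        # Gradient step toward data consistency, then soft thresholding.
        spectrum = soft_threshold(
            spectrum + np.fft.fft(residual, norm="ortho"), tau
        )
    return spectrum

def demo(n=128, n_sampled=64, k1=20, k2=45, seed=0):
    """Synthetic two-line signal sampled at a random half of the grid."""
    t = np.arange(n)
    fid = np.exp(2j * np.pi * k1 * t / n) + 0.8 * np.exp(2j * np.pi * k2 * t / n)
    rng = np.random.default_rng(seed)
    sampled = np.zeros(n, dtype=bool)
    sampled[rng.choice(n, size=n_sampled, replace=False)] = True
    zero_filled = np.fft.fft(np.where(sampled, fid, 0.0), norm="ortho")
    return ist_reconstruct(fid, sampled), zero_filled

if __name__ == "__main__":
    spec, zero_filled = demo()
    print("two tallest bins:", sorted(np.argsort(np.abs(spec))[-2:].tolist()))
    print("nonzero bins:", int(np.count_nonzero(np.abs(spec) > 1e-6)))
```

A fixed threshold leaves the recovered peak heights biased low. Practical schemes often lower the threshold over the course of the iterations to reduce that bias, a refinement that is omitted here to keep the link to the l1-regularized fixed point visible.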