Thirty-eight percent of protein structures in the Protein Data
Bank contain at least one metal ion. However, not all these metal
sites are biologically relevant. Cations present as impurities during
sample preparation or in the crystallization buffer can cause the
formation of protein–metal complexes that do not exist in vivo.
We implemented a deep learning approach to build a classifier able
to distinguish between physiological and adventitious zinc-binding
sites in the 3D structures of metalloproteins. We trained the classifier
using manually annotated sites extracted from the MetalPDB database.
Using a 10-fold cross-validation procedure, the classifier achieved
an accuracy of about 90%. The same neural classifier could predict
the physiological relevance of non-heme mononuclear iron sites with
an accuracy of nearly 80%, suggesting that the rules learned on zinc
sites have general relevance. By quantifying the relative importance
of the features describing the input zinc sites from the network perspective
and by analyzing the characteristics of the MetalPDB datasets, we
inferred some common principles. Physiological sites show a low
solvent accessibility of the amino acids forming coordination bonds
with the metal ion (the metal ligands), a relatively large number
of residues in the metal environment (≥20), and a distinct
pattern of conservation of Cys and His residues in the site. Adventitious
sites, on the other hand, tend to have a low number of donor atoms
from the polypeptide chain (often one or two). These observations
support the evaluation of the physiological relevance of novel metal-binding
sites in protein structures.
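
The 10-fold cross-validation mentioned above can be illustrated with a minimal sketch. It does not reproduce the neural classifier or the MetalPDB data: the threshold classifier, the single "number of protein donor atoms" feature, and the synthetic labels are all hypothetical stand-ins, chosen only to show how each fold is held out in turn and how the accuracies are averaged.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, predict_fn, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and average the accuracy."""
    folds = k_fold_indices(len(labels), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(labels)) if i not in held]
        model = train_fn([features[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, features[i]) == labels[i]
                      for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)


# Hypothetical stand-in for the neural network: a threshold placed halfway
# between the class means of a single feature.
def train_threshold(xs, ys):
    positives = [x for x, y in zip(xs, ys) if y == 1]
    negatives = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic data (not from MetalPDB): label 0 = adventitious, 1 = physiological.
features = [1, 2] * 25 + [3, 4] * 25
labels = [0] * 50 + [1] * 50

folds = k_fold_indices(len(labels), k=10)
mean_accuracy = cross_validate(features, labels, train_threshold, predict_threshold)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Because every sample is held out exactly once, the averaged accuracy estimates performance on sites the model has not seen during training.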

Nuclear magnetic resonance (NMR) is an effective, commonly used
experimental approach to screen small organic molecules against a
protein target. A very popular method consists of monitoring the changes
of the NMR chemical shifts of the protein nuclei upon addition of
the small molecule to the free protein. Multidimensional NMR experiments
allow the interacting residues to be mapped along the protein sequence.
A significant amount of human effort goes into manually tracking the
chemical shift variations, especially when many signals exhibit chemical
shift changes and when many ligands are tested. Some computational
approaches to automate the procedure are available, but none of them
is available as a web server. Furthermore, some methods require the adoption of
a fairly specific experimental setup, such as recording a series of
spectra at increasing small molecule:protein ratios. In this work,
we developed a tool that requires a minimal amount of experimental data
from the user, implemented it as an open-source program, and made
it available as a web application. Our tool compares two spectra,
one of the free protein and one of the small molecule:protein mixture,
based on the corresponding peak lists. The performance of the tool
in terms of correct identification of the protein-binding regions
has been evaluated on different protein targets, using experimental
data from interaction studies already available in the literature.
For a total of 16 systems, our tool achieved between 79% and 100%
correct assignments, properly identifying the protein regions involved
in the interaction.
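
As a rough illustration of comparing two peak lists, the sketch below pairs each assigned peak of the free protein with the nearest peak of the mixture and flags residues with a large combined 1H/15N chemical shift change. This is not the tool's actual algorithm: the greedy nearest-peak matching, the 0.14 scaling of the 15N shift, the mean-plus-one-standard-deviation cutoff, and the peak values are all assumptions made for the example.

```python
import math

N_SCALE = 0.14  # assumed weighting of 15N shifts relative to 1H shifts


def combined_shift(free_peak, bound_peak, n_scale=N_SCALE):
    """Combined 1H/15N shift difference (ppm) between two (1H, 15N) peaks."""
    d_h = bound_peak[0] - free_peak[0]
    d_n = bound_peak[1] - free_peak[1]
    return math.sqrt(d_h ** 2 + (n_scale * d_n) ** 2)


def match_peaks(free_peaks, bound_peaks):
    """Greedily pair each assigned free peak with the nearest unused bound peak."""
    unused = list(bound_peaks)
    perturbations = {}
    for residue, free_peak in free_peaks.items():
        if not unused:
            break
        nearest = min(unused, key=lambda peak: combined_shift(free_peak, peak))
        unused.remove(nearest)
        perturbations[residue] = combined_shift(free_peak, nearest)
    return perturbations


def flag_residues(perturbations, n_std=1.0):
    """Return residues whose shift change exceeds mean + n_std * std."""
    values = list(perturbations.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    cutoff = mean + n_std * std
    return sorted(res for res, value in perturbations.items() if value > cutoff)


# Made-up peak lists: residue number -> (1H ppm, 15N ppm) for the free protein,
# and unassigned (1H ppm, 15N ppm) peaks for the small molecule:protein mixture.
free_peaks = {
    10: (8.20, 120.0),
    11: (7.90, 118.5),
    12: (8.55, 123.1),
    13: (8.05, 115.2),
    14: (7.60, 121.7),
}
bound_peaks = [
    (8.21, 120.0),
    (7.90, 118.6),
    (8.75, 124.0),
    (8.05, 115.2),
    (7.61, 121.7),
]

perturbations = match_peaks(free_peaks, bound_peaks)
flagged = flag_residues(perturbations)
print(flagged)
```

Greedy matching is the simplest choice and can mispair peaks in crowded spectral regions; a global assignment that minimizes the total shift change would be a natural refinement.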