Metalloproteins are proteins capable of binding one or more metal ions, which may be required for their biological function, for the regulation of their activity, or for structural purposes. Genome sequencing projects have provided a huge number of protein primary sequences, but, although a rich and ever-growing portfolio of bioinformatic tools has enabled many elaborate analyses and annotations, metal-binding properties remain difficult both to predict and to investigate experimentally. Consequently, present knowledge about metalloproteins is only partial. This bioinformatic study proposes a strategy to determine how many, and which, proteins encoded in the human genome may require zinc for their physiological function. This is achieved by combining three approaches: (i) searching the proteome for zinc-binding patterns, which are in turn derived from all available X-ray data; (ii) using libraries of metal-binding protein domains based on multiple sequence alignments of known metalloproteins obtained from the Pfam database; and (iii) mining the annotations of human gene sequences, which are based on any type of available information. It is found that 1684 proteins in the human proteome are independently identified as zinc proteins by all three approaches, 746 by two, and 777 by only one. Assuming that all proteins identified by at least two approaches truly bind zinc, and after inspecting the proteins identified by a single method, we propose that ca. 2800 human proteins, corresponding to 10% of the human proteome with an uncertainty of 400 sequences, are potentially zinc-binding in vivo. Available functional information suggests that the large majority of human zinc-binding proteins are involved in the regulation of gene expression.
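The three-way comparison above amounts to counting, for each protein, how many independent methods flag it as zinc-binding. The Python sketch below shows that bookkeeping under stated assumptions: each method's output is reduced to a set of protein IDs, and the function names `count_method_support`, `tally_by_support` and `consensus`, as well as the toy IDs, are hypothetical and not taken from the study. With the published counts, the proteins supported by at least two methods number 1684 + 746 = 2430, before any single-method hits are added after inspection.

```python
from __future__ import annotations

from collections import Counter


def count_method_support(method_hits: dict[str, set[str]]) -> dict[str, int]:
    """Map each protein ID to the number of methods that flag it as zinc-binding."""
    support: Counter[str] = Counter()
    for hits in method_hits.values():
        # Sets hold unique IDs, so each method adds at most 1 per protein.
        support.update(hits)
    return dict(support)


def tally_by_support(support: dict[str, int]) -> dict[int, int]:
    """Count how many proteins are supported by exactly 1, 2, 3, ... methods."""
    return dict(Counter(support.values()))


def consensus(support: dict[str, int], min_methods: int = 2) -> set[str]:
    """Proteins flagged by at least `min_methods` independent methods."""
    return {pid for pid, n in support.items() if n >= min_methods}


# Toy input with hypothetical protein IDs; one set per approach.
method_hits = {
    "patterns": {"P1", "P2", "P3", "P4"},
    "pfam_domains": {"P1", "P2", "P5"},
    "annotations": {"P1", "P3", "P6"},
}
support = count_method_support(method_hits)
print(tally_by_support(support))  # proteins per number of supporting methods
print(sorted(consensus(support)))  # ['P1', 'P2', 'P3']
```

Raising `min_methods` to 3 gives the strictest consensus; lowering it to 1 returns every protein flagged by any method.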
The most abundant class of zinc-binding proteins in humans is the zinc fingers, in which Cys4 and Cys2His2 are the most common coordination environments.
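To illustrate a pattern search of the kind used in approach (i), the scan can be expressed as a regular expression over protein sequences. This sketch assumes a simplified textbook Cys2His2 zinc-finger consensus, C-x(2,4)-C-x(12)-H-x(3,5)-H; the actual patterns in the study were derived from X-ray structures and are not reproduced here, and the toy sequences are invented.

```python
from __future__ import annotations

import re

# Simplified, illustrative Cys2His2 consensus: C-x(2,4)-C-x(12)-H-x(3,5)-H.
# An assumption for this sketch, not the pattern library used in the study.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in C2H2_PATTERN.finditer(sequence.upper())]


def scan_proteome(proteome: dict[str, str]) -> dict[str, list[tuple[int, str]]]:
    """Keep only proteins with at least one pattern hit."""
    results = {}
    for protein_id, sequence in proteome.items():
        hits = find_c2h2_sites(sequence)
        if hits:
            results[protein_id] = hits
    return results


# Invented toy sequences.
proteome = {
    "toy_zf": "MKTCPECGKSFSQSSNLQKHQRTHTGEK",
    "toy_no_zf": "MKTAAAGGGLLLSTP",
}
print(scan_proteome(proteome))  # {'toy_zf': [(3, 'CPECGKSFSQSSNLQKHQRTH')]}
```

`finditer` reports non-overlapping hits only. A single pattern like this both misses divergent sites and matches some segments that do not bind zinc, which is one reason the study combines patterns with domain libraries and annotations.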
We analysed the roles and distribution of metal ions in enzymatic catalysis using available public databases and our new resource Metal-MACiE (http://www.ebi.ac.uk/thornton-srv/databases/Metal_MACiE/home.html). In Metal-MACiE, a database of metal-based reaction mechanisms, 116 entries covering 21% of the metal-dependent enzymes and 70% of the types of enzyme-catalysed chemical transformations are annotated according to metal function. We used Metal-MACiE to assess the functions performed by metals in biological catalysis and the relative frequencies of different metals in different roles, which can be related to their individual chemical properties and their availability in the environment. The overall picture emerging from Metal-MACiE is that redox-inert metal ions are used in enzymes to stabilize negative charges and to activate substrates by virtue of their Lewis acid properties, whereas redox-active metal ions can act both as Lewis acids and as redox centres. Magnesium and zinc are by far the most common ions of the first type, whereas calcium is used less often. Magnesium, however, is most often bound to phosphate groups of substrates and interacts with the enzyme only transiently, whereas the other metals are stably bound to the enzyme. The most common metal of the second type is iron, which prevails in the catalysis of redox reactions, followed by manganese, cobalt, molybdenum, copper and nickel. The reactivity of redox-active metal ions may be controlled by associating them with organic cofactors to form stable units. This occurs sometimes for iron and nickel, and quite often for cobalt and molybdenum.
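Relative frequencies of metals per catalytic role can be obtained by a simple tally over annotated (metal, role) pairs. The sketch below is a hypothetical illustration: the record format, the role labels `lewis_acid` and `redox_centre`, and the counts are invented and are neither Metal-MACiE data nor its schema.

```python
from collections import Counter, defaultdict


def metal_frequencies_by_role(records):
    """records: iterable of (metal, role) pairs -> {role: {metal: fraction}}."""
    counts = defaultdict(Counter)
    for metal, role in records:
        counts[role][metal] += 1
    frequencies = {}
    for role, metal_counts in counts.items():
        total = sum(metal_counts.values())
        frequencies[role] = {metal: n / total for metal, n in metal_counts.items()}
    return frequencies


# Invented toy annotations; role labels and counts are placeholders.
records = [
    ("Mg", "lewis_acid"), ("Zn", "lewis_acid"), ("Zn", "lewis_acid"), ("Mg", "lewis_acid"),
    ("Fe", "redox_centre"), ("Fe", "redox_centre"), ("Mn", "redox_centre"), ("Cu", "redox_centre"),
]
for role, freqs in metal_frequencies_by_role(records).items():
    print(role, freqs)
```

Normalizing within each role, rather than over all records, lets metals be compared in the role they play even when one role is far more common overall.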
Zinc is one of the metal ions essential for life, as it is required for the proper functioning of a large number of proteins. Despite its importance, the annotation of zinc-binding proteins in gene banks and protein domain databases still has significant room for improvement. In the present work, we compiled a list of known zinc-binding protein domains and of known zinc-binding sequence motifs (zinc-binding patterns), and then used them jointly to analyze the proteomes of 57 different organisms, obtaining an overview of zinc usage by archaeal, bacterial, and eukaryotic organisms. Zinc-binding proteins make up an abundant fraction of these proteomes, between 4% and 10%. The number of zinc-binding proteins correlates linearly with the total number of proteins encoded by the genome of an organism, but the proportionality constant for Eukaryota (8.8%) is significantly higher than that observed for Bacteria and Archaea (5% to 6%). Most of this enrichment is due to the larger portfolio of regulatory proteins in Eukaryota.
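A proportionality constant of this kind corresponds to a least-squares line forced through the origin, whose slope is k = Σ(x·y) / Σ(x²), where x is the proteome size and y the number of zinc-binding proteins of each organism. The sketch below assumes that fitting model (the abstract does not state which fit was used) and uses synthetic points constructed to lie exactly on lines of slope 0.088 and 0.055; the helper names are hypothetical.

```python
from collections import defaultdict


def slope_through_origin(xs, ys):
    """Least-squares slope k of y = k * x with the intercept fixed at zero."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator


def slope_per_domain(organisms):
    """organisms: iterable of (domain, proteome_size, zinc_protein_count)."""
    grouped = defaultdict(lambda: ([], []))
    for domain, total, zinc in organisms:
        grouped[domain][0].append(total)
        grouped[domain][1].append(zinc)
    return {d: slope_through_origin(xs, ys) for d, (xs, ys) in grouped.items()}


# Synthetic points built to lie on y = 0.088 x and y = 0.055 x (not study data).
organisms = [
    ("Eukaryota", 20000, 1760), ("Eukaryota", 10000, 880),
    ("Bacteria", 4000, 220), ("Bacteria", 2000, 110),
]
for domain, k in slope_per_domain(organisms).items():
    print(f"{domain}: {k:.1%}")
```

Fixing the intercept at zero encodes the idea that the zinc proteome is a constant fraction of the proteome; an ordinary fit with a free intercept would be the natural check of that assumption.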
Genome-wide studies are providing researchers with a potentially complete list of the molecular components present in living systems. It is now evident that several metal ions are essential to life and that metalloproteins, that is, proteins that require a metal ion to perform their physiological function, are widespread in all organisms. However, well-established experimental methods for analyzing the complete set of metalloproteins encoded by an organism (the metalloproteome) are currently lacking. This information is essential for a comprehensive understanding of the processes occurring in living systems. Predictive tools must therefore be applied to define metalloproteomes. In this Account, we discuss current progress in the development of bioinformatics methods that predict metalloproteins based solely on protein sequences. With these methods, it is possible to scan entire proteomes for metalloproteins, such as zinc proteins or copper proteins, which are identified by the presence of specific metal-binding sites, metal-binding domains, or both. The predicted metalloproteins can then be analyzed to obtain information on their function and evolution. For example, comparing the content and usage of different metalloproteins across living organisms can provide clues about the evolution of metalloproteomes. As case studies, we predicted the content of zinc, nonheme iron, and copper proteins in a representative set of organisms from the three domains of life. The zinc proteome represents about 9% of the entire proteome in eukaryotes but only 5% to 6% in prokaryotes, indicating a substantial increase in the number of zinc proteins in higher organisms. In contrast, the number of nonheme iron proteins is relatively constant in eukaryotes and prokaryotes, so their relative share diminishes from archaea (about 7%) to bacteria (about 4%) to eukaryotes (about 1%).
Copper proteins represent less than 1% of the proteomes in all the organisms studied. We also discuss the limits of these methods, the approaches used to overcome some of these limits to improve our predictions, and possible future developments in the bioinformatics-based investigation of metalloproteins. Understanding life at the systems level (systems biology), a long-standing goal of the biological sciences, is attracting renewed interest; ready access to complete information on metalloproteomes is crucial to correctly represent the role of metal ions in living organisms.
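The sequence-based criterion described above (a protein counts as a predicted metalloprotein if it has a metal-binding site, a metal-binding domain, or both) and the resulting percentage can be written compactly. In this sketch the per-protein flags are assumed to come from upstream site and domain searches that are not shown, and all names and numbers are hypothetical. The `share_percent` helper makes the arithmetic behind the falling nonheme-iron share explicit: a roughly constant count divided by a larger proteome gives a smaller percentage.

```python
def is_predicted_metalloprotein(has_site: bool, has_domain: bool) -> bool:
    """Sequence-based call: a metal-binding site, a metal-binding domain, or both."""
    return has_site or has_domain


def metalloproteome_percent(flags):
    """flags: {protein_id: (has_site, has_domain)} -> percent predicted metalloproteins."""
    predicted = sum(1 for site, domain in flags.values() if is_predicted_metalloprotein(site, domain))
    return 100 * predicted / len(flags)


def share_percent(count, proteome_size):
    """Percentage share of a fixed count within a proteome of a given size."""
    return 100 * count / proteome_size


# Hypothetical flags from upstream site and domain searches (not shown).
flags = {
    "A": (True, False),
    "B": (False, True),
    "C": (True, True),
    "D": (False, False),
}
print(metalloproteome_percent(flags))  # 75.0
# The same hypothetical count of 300 proteins in a small and a large proteome:
print(share_percent(300, 3000), share_percent(300, 30000))  # 10.0 1.0
```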
Zinc is indispensable to all forms of life as it is an essential component of many different proteins involved in a wide range of biological processes. As with other metals, zinc in proteins can play different roles that depend on the features of the metal-binding site. In this work, we describe zinc sites in proteins of known structure by means of three-dimensional templates that can be extracted automatically from PDB files and consist of the protein structure around the metal, including the zinc ligands and the residues in close spatial proximity to the ligands. This definition is designed to capture intrinsically the features of the local protein environment that can affect metal function, and corresponds to what we call a minimal functional site (MFS). We used MFSs to classify all zinc sites whose structures are available in the PDB and combined this classification with functional annotation from the literature. We classified 77% of zinc sites into ten clusters, each grouping zinc sites with highly similar structures, and an additional 16% into seven pseudo-clusters, each grouping zinc sites with only broadly similar structures. Sites where zinc plays a structural role predominate in eight clusters and two pseudo-clusters, while sites where zinc plays a catalytic role predominate in two clusters and five pseudo-clusters. We also analyzed the amino acid composition of the zinc coordination sphere as a function of the metal's role in the protein, highlighting trends and exceptions. At a time when the number of known zinc proteins is expected to grow further with increasing awareness of the cellular mechanisms of zinc homeostasis, this classification provides a valuable basis for structure-function studies of zinc proteins, with broad applications in biochemistry, molecular pharmacology and de novo protein design.
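An MFS-like environment can be approximated from atomic coordinates with two distance cutoffs: one selecting the residues whose atoms coordinate the metal, and one selecting residues close to those donor atoms. This is a minimal sketch, not the authors' extraction procedure; the `Atom` record, the 2.8 Å and 5.0 Å cutoffs and the toy coordinates are assumptions chosen for illustration, and parsing of real PDB files is omitted.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Hypothetical cutoffs (Å) chosen for illustration only.
LIGAND_CUTOFF = 2.8       # metal-donor atom contact
ENVIRONMENT_CUTOFF = 5.0  # residues near the ligand donor atoms


@dataclass(frozen=True)
class Atom:
    chain: str
    res_num: int
    res_name: str
    atom_name: str
    xyz: tuple[float, float, float]

    @property
    def residue(self) -> tuple[str, int]:
        return (self.chain, self.res_num)


def minimal_site(metal_xyz, atoms, ligand_cutoff=LIGAND_CUTOFF, env_cutoff=ENVIRONMENT_CUTOFF):
    """Return (ligand residues, environment residues) around one metal ion."""
    donors = [a for a in atoms if math.dist(a.xyz, metal_xyz) <= ligand_cutoff]
    ligands = {a.residue for a in donors}
    environment = {
        a.residue
        for a in atoms
        if a.residue not in ligands
        and any(math.dist(a.xyz, d.xyz) <= env_cutoff for d in donors)
    }
    return ligands, environment


# Toy coordinates with the zinc ion at the origin.
atoms = [
    Atom("A", 10, "CYS", "SG", (2.3, 0.0, 0.0)),
    Atom("A", 20, "HIS", "NE2", (0.0, 2.1, 0.0)),
    Atom("A", 30, "LEU", "CD1", (2.3, 4.0, 0.0)),
    Atom("A", 99, "GLY", "CA", (20.0, 20.0, 20.0)),
]
ligands, environment = minimal_site((0.0, 0.0, 0.0), atoms)
print(sorted(ligands), sorted(environment))  # [('A', 10), ('A', 20)] [('A', 30)]
```

Measuring proximity from the donor atoms, rather than from the metal, keeps the template centred on the first coordination sphere while still picking up the second-shell residues that can tune metal function.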
In high-throughput, genome-level protein investigation efforts such as Structural Genomics, the systematic experimental characterization of metal-binding properties (i.e., the investigation of the metalloproteome) is not always pursued and remains far from trivial. In the present work, we applied a bioinformatic approach to investigate the occurrence of (putative) copper-binding proteins in 57 different organisms spanning the entire tree of life. We found that the copper proteome is generally less than 1% of the total proteome of an organism, in both eukaryotes and prokaryotes. Copper-binding proteins are relatively scarce compared with zinc-binding proteins and non-heme iron proteins. This may be due both to the poorer bioavailability of copper (particularly relative to iron in the ancient world) and to the complexity of copper chemistry and its associated risks, which may have worked against the natural selection of copper-binding proteins. The present analysis shows a strong relationship between the metal coordination sphere and protein function. We also identified a network involving proteins with roles in both copper transport and respiration, part or all of which is detected in the majority of the organisms examined.
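Relating coordination sphere to function first requires a canonical label for each site. The sketch below builds a hypothetical signature string from ligand residue names (for example `CysHis2Met`, in the same notation as Cys2His2) and tallies function labels per signature; the input format and the placeholder labels `function_A` and `function_B` are assumptions, not data from the study.

```python
from collections import Counter, defaultdict


def coordination_signature(ligand_res_names):
    """Canonical label such as 'Cys2His2' from ligand residue names (e.g. 'CYS')."""
    counts = Counter(name.capitalize() for name in ligand_res_names)
    return "".join(f"{res}{n if n > 1 else ''}" for res, n in sorted(counts.items()))


def functions_by_signature(sites):
    """sites: iterable of (ligand residue names, function label)."""
    table = defaultdict(Counter)
    for ligand_names, function in sites:
        table[coordination_signature(ligand_names)][function] += 1
    return dict(table)


print(coordination_signature(["HIS", "HIS", "CYS", "MET"]))  # CysHis2Met
print(coordination_signature(["CYS", "CYS", "HIS", "HIS"]))  # Cys2His2

# Hypothetical sites with placeholder function labels.
sites = [
    (["HIS", "HIS", "HIS"], "function_A"),
    (["HIS", "HIS", "HIS"], "function_A"),
    (["CYS", "HIS", "HIS", "MET"], "function_B"),
]
print(functions_by_signature(sites))
```

Sorting residue names makes the signature independent of the order in which ligands are listed, so identical coordination spheres always collapse onto one key.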
We present here MetalPDB (freely accessible at http://metalweb.cerm.unifi.it), a novel resource aimed at conveying the information available on the three-dimensional (3D) structures of metal-binding biological macromolecules in a consistent and effective manner. This is achieved through the systematic and automated representation of metal-binding sites in proteins and nucleic acids by way of Minimal Functional Sites (MFSs). MFSs are 3D templates that describe the local environment around the metal(s) independently of the larger context of the macromolecular structure embedding the site(s), and are the central objects of MetalPDB design. MFSs are grouped into equistructural (broadly defined as sites found in corresponding positions in similar structures) and equivalent sites (equistructural sites that contain the same metals), allowing users to easily analyse similarities and variations in metal–macromolecule interactions, and to link them to functional information. The web interface of MetalPDB allows access to a comprehensive overview of metal-containing biological structures, providing a basis to investigate the basic principles governing the properties of these systems. MetalPDB is updated monthly in an automated manner.
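The two grouping levels can be expressed as nested dictionary keys: equistructural sites share a structural group and a corresponding position, and equivalent sites additionally share the same metal content. This sketch assumes that the structural group and aligned position have already been computed by a structure-similarity step that is not shown; the `Site` fields and toy IDs are hypothetical and do not reflect MetalPDB's internal data model.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Site(NamedTuple):
    site_id: str
    structure_group: str   # hypothetical ID for a group of similar structures
    position: int          # hypothetical aligned position of the site in that group
    metals: tuple[str, ...]


def group_sites(sites):
    """Return (equistructural, equivalent) groupings as {key: [site IDs]}."""
    equistructural = defaultdict(list)
    equivalent = defaultdict(list)
    for site in sites:
        key = (site.structure_group, site.position)
        equistructural[key].append(site.site_id)
        # Equivalent sites: equistructural sites that also contain the same metals.
        equivalent[key + (tuple(sorted(site.metals)),)].append(site.site_id)
    return dict(equistructural), dict(equivalent)


sites = [
    Site("s1", "G1", 42, ("ZN",)),
    Site("s2", "G1", 42, ("ZN",)),
    Site("s3", "G1", 42, ("CO",)),
    Site("s4", "G2", 7, ("ZN",)),
]
equistructural, equivalent = group_sites(sites)
print(equistructural)
print(equivalent)
```

In the toy data, s3 sits at the same position as s1 and s2 but binds a different metal, so it joins their equistructural group while forming its own equivalent group.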
MetalPDB (http://metalweb.cerm.unifi.it/) is a database providing information on metal-binding sites detected in the three-dimensional (3D) structures of biological macromolecules. MetalPDB represents such sites as 3D templates, called Minimal Functional Sites (MFSs), which describe the local environment around the metal(s) independently of the larger context of the macromolecular structure. The 2018 update of MetalPDB includes new contents and tools. A major extension is the inclusion of proteins whose structures do not contain metal ions although their sequences potentially contain a known MFS. In addition, MetalPDB now provides extensive statistical analyses addressing several aspects of general metal usage within the PDB, across protein families and in catalysis. Users can also query MetalPDB to extract statistical information on structural aspects associated with individual metals, such as preferred coordination geometries or amino acid environments. A further major improvement is the functional annotation of MFSs, which is performed manually through a password-protected annotator interface. At present, ∼50% of all MFSs have such a functional annotation. Other noteworthy improvements are bulk query functionality, through the upload of a list of PDB identifiers, and FTP access to MetalPDB contents, allowing users to carry out in-depth analyses on their own computational infrastructure.
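Before a bulk query, a list of PDB identifiers is typically cleaned up locally. The sketch below accepts only classic four-character PDB IDs (a digit from 1 to 9 followed by three alphanumeric characters), normalizes them to upper case and drops duplicates. The input format (IDs separated by whitespace, commas or semicolons) and the one-ID-per-line output are assumptions; the code does not contact MetalPDB or reproduce its upload interface, and the example IDs are arbitrary.

```python
import re

# Classic four-character PDB ID: a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def prepare_bulk_query(text):
    """Split free text into unique, upper-cased PDB IDs plus rejected tokens."""
    ids, rejected, seen = [], [], set()
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        if PDB_ID.fullmatch(token):
            pdb_id = token.upper()
            if pdb_id not in seen:
                seen.add(pdb_id)
                ids.append(pdb_id)
        else:
            rejected.append(token)
    return ids, rejected


ids, rejected = prepare_bulk_query("1ZNF, 2cab\n1znf 4HHB xyz")
print("\n".join(ids))  # one ID per line (assumed upload layout)
print("rejected:", rejected)  # rejected: ['xyz']
```

Reporting rejected tokens instead of silently dropping them makes typos in a long ID list easy to spot before the query is submitted.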