This work discusses bioinformatics and experimental approaches to exploring the human proteome, a constellation of proteins expressed in different tissues and organs. Because the human proteome is not a static entity, it seems necessary both to estimate the number of different protein species (proteoforms) and to measure the number of copies of the same protein in a specific tissue. Here, a meta-analysis of the neXtProt knowledge base is proposed for theoretical prediction of the number of different proteoforms that arise from alternative splicing (AS), single amino acid polymorphisms (SAPs), and posttranslational modifications (PTMs). Three possible cases are considered: (1) PTMs and SAPs appear exclusively in the canonical sequences of proteins, but not in splice variants; (2) PTMs and SAPs can occur both in proteins encoded by canonical sequences and in splice variants; (3) all modification types (AS, SAP, and PTM) occur as independent events. Experimental validation of proteoforms is limited by the analytical sensitivity of proteomic technology. A bell-shaped distribution histogram was generated for the proteins encoded by a single chromosome, with estimation of their copy numbers in plasma, liver, and the HepG2 cell line. The proposed metabioinformatics approaches can be used to estimate the number of different proteoforms for any group of protein-coding genes.
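The three cases can be made concrete with a small counting sketch. The formulas below are an illustrative assumption, not the authors' published model: in cases 1 and 2 each SAP or PTM annotation is assumed to add exactly one proteoform, and in case 3 each site is treated as an independent on/off switch. The `GeneAnnotation` fields and the example counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneAnnotation:
    """Hypothetical per-gene counts, e.g. tallied from a knowledge base export."""
    splice_variants: int  # AS isoforms in addition to the canonical sequence
    saps: int             # annotated single amino acid polymorphisms
    ptms: int             # annotated posttranslational modification sites


def case1(g):
    # Case 1 (assumed additive): SAPs and PTMs occur only on the canonical
    # sequence, and each annotation contributes one extra proteoform.
    return 1 + g.splice_variants + g.saps + g.ptms


def case2(g):
    # Case 2 (assumed): every sequence, canonical or splice variant, can be
    # unmodified or carry any single SAP or PTM.
    return (1 + g.splice_variants) * (1 + g.saps + g.ptms)


def case3(g):
    # Case 3 (assumed): all events are independent, so each site is either
    # present or absent on each sequence.
    return (1 + g.splice_variants) * 2 ** (g.saps + g.ptms)


def total(genes, model):
    """Sum the per-gene proteoform estimate over a group of genes."""
    return sum(model(g) for g in genes)


# Made-up annotation counts for two genes (not neXtProt data).
genes = [GeneAnnotation(2, 3, 4), GeneAnnotation(0, 1, 0)]
for model in (case1, case2, case3):
    print(model.__name__, total(genes, model))
```

Even on this toy input the estimate grows from 12 to 26 to 386 proteoforms as the assumptions loosen, which is why the choice of case matters far more than the exact annotation counts. Case 2 as written also assumes that every annotated site is present in every splice variant, which overestimates the count whenever a variant lacks the affected region.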
The final goal of the Russian part of the Chromosome-centric Human Proteome Project (C-HPP) was established as the analysis of the chromosome 18 (Chr 18) protein complement in plasma, liver tissue, and HepG2 cells with a sensitivity of 10^-18 M. Using SRM, we have recently targeted 277 Chr 18 proteins in plasma, liver, and HepG2 cells. On the basis of the results of the survey, SRM assays were drafted for 250 proteins: 41 proteins were found only in the liver tissue, 82 proteins were specifically detected in depleted plasma, and 127 proteins were mapped in both samples. The targeted analysis of HepG2 cells was carried out for 49 proteins; 41 of them were successfully registered using ordinary SRM, and 5 additional proteins were registered using a combination of irreversible binding of proteins on CNBr-activated Sepharose 4B with SRM. Transcriptome profiling of HepG2 cells performed by RNAseq and RT-PCR has shown a significant correlation (r = 0.78) for 42 gene transcripts. A pilot affinity-based interactome analysis was performed for cytochrome b5 using analytical and preparative optical biosensor fishing followed by MS analysis of the fished proteins. All of the data on the proteome complement of Chr 18 have been integrated into our gene-centric knowledgebase (www.kb18.ru).
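A correlation such as the reported r = 0.78 between RNAseq and RT-PCR measurements is typically a Pearson coefficient computed over paired values for the same transcripts. The following is a minimal sketch under that assumption; the abstract does not state which coefficient was used or whether the values were log-transformed, and the numbers below are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Made-up expression values for five transcripts (not data from the study).
rnaseq = [1.0, 2.5, 3.1, 4.8, 6.0]
rt_pcr = [0.8, 2.9, 2.7, 5.5, 5.9]
print(round(pearson_r(rnaseq, rt_pcr), 2))
```

Expression values often span several orders of magnitude, so in practice the coefficient is frequently computed on log-transformed values to keep a few highly expressed genes from dominating the result.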
We report the results obtained in 2012-2013 by the Russian Consortium for the Chromosome-centric Human Proteome Project (C-HPP). The main scope of this work was the transcriptome profiling of genes on human chromosome 18 (Chr 18), as well as their encoded proteome, from three types of biomaterials: liver tissue, the hepatocellular carcinoma-derived cell line HepG2, and blood plasma. The transcriptome profiling for liver tissue was independently performed using two RNAseq platforms (SOLiD and Illumina) and also by droplet digital PCR (ddPCR) and quantitative RT-PCR. The proteome profiling of Chr 18 was accomplished by quantitatively measuring protein copy numbers in the three types of biomaterial (the lowest protein concentration measured was 10^-13 M) using selected reaction monitoring (SRM). In total, protein copy numbers were estimated for 228 master proteins, including quantitative data on 164 proteins in plasma, 171 in the HepG2 cell line, and 186 in liver tissue. Most proteins were present in plasma at 10^8 copies/μL, while the median abundance was 10^4 and 10^5 protein copies per cell in HepG2 cells and liver tissue, respectively. In summary, for liver tissue and HepG2 cells a "transcriptoproteome" was produced that reflects the relationship between transcript and protein copy numbers of the genes on Chr 18. The quantitative data acquired by RNAseq, PCR, and SRM were uploaded into the "Update_2013" data set of our knowledgebase (www.kb18.ru) and investigated for linear correlations.
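Molar concentrations and copy numbers per microliter are related through the Avogadro constant, which makes the sensitivity figures easier to compare. The sketch below shows the conversion; the function names are illustrative and not taken from the study.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
MICROLITERS_PER_LITER = 1e6


def molar_to_copies_per_ul(molar):
    """Convert a concentration in mol/L to molecules per microliter."""
    return molar * AVOGADRO / MICROLITERS_PER_LITER


def copies_per_ul_to_molar(copies):
    """Convert molecules per microliter to a concentration in mol/L."""
    return copies * MICROLITERS_PER_LITER / AVOGADRO


for conc in (1e-13, 1e-18):
    print(f"{conc:.0e} M = {molar_to_copies_per_ul(conc):.3g} copies/uL")
print(f"1e8 copies/uL = {copies_per_ul_to_molar(1e8):.3g} M")
```

By this conversion, 10^-13 M corresponds to roughly 6 × 10^4 copies/μL, 10^8 copies/μL corresponds to about 1.7 × 10^-10 M, and 10^-18 M is less than one copy per microliter (about 0.6). Converting to copies per cell additionally requires the number of cells per sample volume, which is not covered here.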
This paper summarizes the recent activities of the Chromosome-Centric Human Proteome Project (C-HPP) consortium, which develops new technologies to identify yet-to-be-annotated proteins (termed "missing proteins"), that is, proteins that lack sufficient experimental evidence at the protein level for confident identification in biological samples. The C-HPP also aims to identify new protein forms that may be caused by genetic variability, post-translational modifications, and alternative splicing. Proteogenomic data integration forms the basis of the C-HPP's activities; therefore, we have summarized some of the key approaches and their roles in the project. We present new analytical technologies that extend the accessible chemical space and lower detection limits, together with bioinformatics tools and some publicly available resources that can be used to improve data analysis or support the development of analytical assays. Most of this paper's content has been compiled from posters, slides, and discussions presented in the series of C-HPP workshops held during 2014. All data (posters, presentations) used are available at the C-HPP Wiki (http://c-hpp.webhosting.rug.nl/) and in the Supporting Information.
Virtual and experimental 2DE coupled with ESI LC-MS/MS was introduced to obtain a better representation of information about the human proteome. Proteins from HepG2 cells and human blood plasma were separated by 2DE. After staining and identification of the protein spots by MALDI-TOF MS, protein maps were generated. The experimental physicochemical parameters (pI/Mw) of the proteoforms subsequently detected in these spots by ESI LC-MS/MS were obtained. Next, the theoretical pI and Mw of the identified proteins were calculated using the Compute pI/Mw program (http://web.expasy.org/compute_pi/pi_tool-doc.html). The relationship between the theoretical and experimental parameters was then analyzed, and correlation plots were built. Additionally, virtual and experimental information about different protein species/proteoforms originating from the same genes was extracted. The plots revealed that the major proteoforms detected in the HepG2 cell line have pI/Mw parameters similar to the theoretical values. In contrast, the minor protein species mostly have pI and Mw parameters that differ strongly from the theoretical values. A similar pattern was observed in plasma, to a much greater degree. This indicates that minor protein species are heavily modified in the cellular proteome and even more so in the plasma proteome.
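The comparison of theoretical and experimental Mw can be illustrated with a short sketch. It computes a theoretical average molecular weight from a sequence using standard average residue masses and expresses the experimental value as a relative shift. This is a simplified stand-in and not necessarily the procedure used by the Compute pI/Mw tool; the sequence, the experimental value, and the function names are hypothetical. Theoretical pI is omitted because it depends on the choice of pK values.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # one water is added back for the free N- and C-termini


def theoretical_mw(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def relative_shift(experimental, theoretical):
    """Signed deviation of the experimental value, as a fraction of theory."""
    return (experimental - theoretical) / theoretical


# Hypothetical example: an arbitrary peptide and a made-up gel-derived Mw.
seq = "MKTAYIAKQRQISFVKSHFSRQ"
mw_theory = theoretical_mw(seq)
mw_gel = 3100.0  # made-up experimental value, not data from the study
print(f"theoretical Mw: {mw_theory:.1f} Da")
print(f"relative shift: {relative_shift(mw_gel, mw_theory):+.1%}")
```

A small relative shift is what the text describes for major proteoforms, while a large shift in Mw (or in pI) points to processing or modification of the protein species found in that spot.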