De novo assembly of RNA-Seq data allows us to study transcriptomes without the need for a genome sequence, such as in non-model organisms of ecological and evolutionary importance, cancer samples, or the microbiome. In this protocol, we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-Seq data in non-model organisms. We also present Trinity’s supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples, and approaches to identify protein coding genes. In an included tutorial we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sf.net.
A full description of the human proteome relies on the challenging task of detecting mature and changing forms of protein molecules in the body. Large scale proteome analysis [1] has routinely involved digesting intact proteins followed by inferred protein identification using mass spectrometry (MS) [2]. This “bottom up” process affords a high number of identifications (not always unique to a single gene). However, complications arise from incomplete or ambiguous [2] characterization of alternative splice forms, diverse modifications (e.g., acetylation and methylation), and endogenous protein cleavages, especially when combinations of these create complex patterns of intact protein isoforms and species [3]. “Top down” interrogation of whole proteins can overcome these problems for individual proteins [4,5], but has not been achieved on a proteome scale due to the lack of intact protein fractionation methods that are well integrated with tandem MS. Here we show, using a new four dimensional (4D) separation system, identification of 1,043 gene products from human cells that are dispersed into >3,000 protein species created by post-translational modification, RNA splicing, and proteolysis. The overall system produced >20-fold increases in both separation power and proteome coverage, enabling the identification of proteins up to 105 kilodaltons and those with up to 11 transmembrane helices. Many previously undetected isoforms of endogenous human proteins were mapped, including changes in multiply-modified species in response to accelerated cellular aging (senescence) induced by DNA damage. Integrated with the latest version of the Swiss-Prot database [6], the data provide precise correlations to individual genes and proof-of-concept for large scale interrogation of whole protein molecules. The technology promises to improve the link between proteomics data and complex phenotypes in basic biology and disease research [7].
ProSight PTM 2.0 (http://prosightptm2.scs.uiuc.edu) is the next generation of the ProSight PTM web-based system for the identification and characterization of proteins using top down tandem mass spectrometry. It introduces an entirely new data-driven interface, integrated Sequence Gazer for protein characterization, support for fixed modifications, terminal modifications and improved support for multiple precursor ions (multiplexing). Furthermore, it supports data import and export for local analysis and collaboration.
Quantitative proteomics has focused heavily on correlating protein abundances, ratios, and dynamics by developing methods that are protein expression-centric (e.g. isotope coded affinity tag, isobaric tag for relative and absolute quantification, etc.). These methods effectively detect changes in protein abundance but fail to provide a comprehensive perspective of the diversity of proteins such as histones, which are regulated by post-translational modifications. Here, we report the characterization of modified forms of HeLa cell histone H4 with a dynamic range >10^4 using a strictly Top Down mass spectrometric approach coupled with two dimensions of liquid chromatography. This enhanced dynamic range enabled the precise characterization and quantitation of 42 forms uniquely modified by combinations of methylation and acetylation, including those with trimethylated Lys-20, monomethylated Arg-3, and the novel dimethylated Arg-3 (each <1% of all H4 forms). Quantitative analyses revealed distinct trends in acetylation site occupancy depending on Lys-20 methylation state. Because both modifications are dynamically regulated through the cell cycle, we simultaneously investigated acetylation and methylation kinetics through three cell cycle phases and used these data to statistically assess the robustness of our quantitative analysis. This work represents the most comprehensive analysis of histone H4 forms present in human cells reported to date.
Histones are a class of proteins around which DNA is wrapped and packaged inside a eukaryotic nucleus. Two molecules of each core histone H2A, H2B, H3, and H4 together with ~146 bp of DNA form the fundamental unit of chromatin called the nucleosome. These proteins are heavily modified, with combinations of these enzymatic modifications thought to form a "histone code" orchestrating epigenetic processes such as long-term gene silencing and gene activation (1), higher level chromatin packaging (2), and DNA repair mechanisms (3).
All of these activities change with relation to the cell cycle, a sequence of events during which a cell commits to DNA replication (G1), replicates its DNA (S), prepares for mitosis (G2), and undergoes cell division (M) (4). Histone synthesis and deposition are largely coupled to DNA replication during S phase (5). As a cell doubles its nuclear DNA, there is a concomitant doubling of the content of histones and nucleosomes. Even though antibodies have been used to track single modifications, the fate of preexisting histone modifications and the acquisition of new histone modifications during the cell cycle is not well understood because this approach is unable to distinguish previously modified forms from newly modified ones (6-8). However, an epigenetic mechanism presumably exists to faithfully transmit patterns of histone modification and chromatin structure to ensure normal cellular function over successive generations (9).
Dynamic changes in the PTMs affecting the N-terminal tails of the core histones, which comprise ~25-30% of their individual...
ProSight PTM (https://prosightptm.scs.uiuc.edu/) is a web application for identification and characterization of proteins using mass spectra data from 'top-down' fragmentation of intact protein ions (i.e. without any tryptic digestion). ProSight PTM has many tools and graphical features to facilitate analysis of single proteins, proteins in mixtures and proteins fragmented in parallel. Sequence databases from across the phylogenetic tree are supported, with a new database strategy of 'shotgun annotation' used to assist characterization of wild-type proteins. During a database search, data from divergent sources regarding potential mass differences such as polymorphisms, alternate splicing and post-translational modifications are utilized. The user can optionally control how much of this biological variability should be searched.
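To illustrate how a search can account for potential mass differences and let the user limit how much variability is considered, here is a minimal Python sketch. It is not ProSight PTM's algorithm or API: the function names `candidate_masses` and `match`, the `max_mods` cap, the 10 ppm tolerance, and the example base mass are all assumptions made for illustration. Only the monoisotopic mass shifts for acetylation, methylation, and phosphorylation are standard values.

```python
from itertools import combinations_with_replacement

# Standard monoisotopic mass shifts, in daltons.
PTM_SHIFTS = {
    "acetyl": 42.010565,
    "methyl": 14.015650,
    "phospho": 79.966331,
}


def candidate_masses(base_mass, max_mods):
    """Map each combination of up to max_mods modifications to an intact mass.

    max_mods is a hypothetical knob standing in for "how much biological
    variability should be searched".
    """
    candidates = {(): base_mass}  # the unmodified form
    for n in range(1, max_mods + 1):
        for combo in combinations_with_replacement(sorted(PTM_SHIFTS), n):
            candidates[combo] = base_mass + sum(PTM_SHIFTS[m] for m in combo)
    return candidates


def match(observed_mass, candidates, tol_ppm=10.0):
    """Return the modification combinations within tol_ppm of observed_mass."""
    hits = []
    for combo, mass in candidates.items():
        if abs(observed_mass - mass) / mass * 1e6 <= tol_ppm:
            hits.append(combo)
    return hits


if __name__ == "__main__":
    # Hypothetical 11 kDa protein carrying one acetyl and one methyl group.
    forms = candidate_masses(11000.0, max_mods=2)
    observed = 11000.0 + 42.010565 + 14.015650
    print(match(observed, forms))
```

Raising `max_mods` widens the search space combinatorially, which is one reason a user-controlled limit on searched variability is useful.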
A proteoform is a defined form of a protein derived from a given gene with a specific amino acid sequence and localized post‐translational modifications. In top‐down proteomic analyses, proteoforms are identified and quantified through mass spectrometric analysis of intact proteins. Recent technological developments have enabled comprehensive proteoform analyses in complex samples, and an increasing number of laboratories are adopting top‐down proteomic workflows. In this review, some recent advances are outlined and current challenges and future directions for the field are discussed.
Amyloid-beta (Aβ) plays a key role in the pathogenesis of Alzheimer’s disease (AD), but little is known about the proteoforms present in AD brain. We used high-resolution mass spectrometry to analyze intact Aβ from soluble aggregates and insoluble material in brains of six cases with severe dementia and pathologically confirmed AD. The soluble aggregates are especially relevant because they are believed to be the most toxic form of Aβ. We found a diversity of Aβ peptides, with 26 unique proteoforms including various N- and C-terminal truncations. N- and C-terminal truncations comprised 73% and 30%, respectively, of the total Aβ proteoforms detected. The Aβ proteoforms segregated between the soluble and more insoluble aggregates with N-terminal truncations predominating in the insoluble material and C-terminal truncations segregating into the soluble aggregates. In contrast, canonical Aβ comprised the minority of the identified proteoforms (15.3%) and did not distinguish between the soluble and more insoluble aggregates. The relative abundance of many truncated Aβ proteoforms did not correlate with post-mortem interval, suggesting they are not artefacts. This heterogeneity of Aβ proteoforms deepens our understanding of AD and offers many new avenues for investigation into pathological mechanisms of the disease, with implications for therapeutic development.
With the prospect of resolving whole protein molecules into their myriad proteoforms on a proteomic scale, the question of their quantitative analysis in discovery mode comes to the fore. Here, we demonstrate a robust pipeline for the identification and stringent scoring of abundance changes of whole protein forms <30 kDa in a complex system. The input is ∼100–400 μg of total protein for each biological replicate, and the outputs are graphical displays depicting statistical confidence metrics for each proteoform (i.e., a volcano plot and representations of the technical and biological variation). A key part of the pipeline is the hierarchical linear model that is tailored to the original design of the study. Here, we apply this new pipeline to measure the proteoform-level effects of deleting a histone deacetylase (rpd3) in S. cerevisiae. Over 100 proteoform changes were detected above a 5% false positive threshold in WT vs the Δrpd3 mutant, including the validating observation of hyperacetylation of histone H4 and both H2B isoforms. Ultimately, this approach to label-free top down proteomics in discovery mode is a critical technical advance for testing the hypothesis that whole proteoforms can link more tightly to complex phenotypes in cell and disease biology than do peptides created in shotgun proteomics.