2023
DOI: 10.1021/acs.jproteome.3c00054

OpenCustomDB: Integration of Unannotated Open Reading Frames and Genetic Variants to Generate More Comprehensive Customized Protein Databases

Abstract: Proteomic diversity in biological samples can be characterized by mass spectrometry (MS)-based proteomics using customized protein databases generated from sets of transcripts previously detected by RNA-seq. This diversity has been further expanded by the recent discovery that many translated alternative open reading frames remain unannotated at unexpected locations in mRNAs and ncRNAs. These novel protein products, termed alternative proteins, have been left out of all previous custom database generation tools. …
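The abstract's central idea, deriving protein sequences from every open reading frame of an RNA-seq-detected transcript rather than only the annotated one, can be illustrated with a minimal Python sketch. It is not OpenCustomDB's implementation: it assumes forward-strand transcripts, ATG starts only, a hypothetical 30-residue minimum length, and a hypothetical FASTA header layout, and it ignores genetic variants entirely.

```python
"""Hypothetical sketch: collect ORFs from transcripts and emit protein FASTA."""

# Standard genetic code (NCBI table 1), indexed in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(transcript: str, min_aa: int = 30) -> list[tuple[int, int, str]]:
    """Return (frame, start, protein) for ATG-to-stop ORFs on the forward strand.

    All three reading frames are scanned, so ORFs in UTRs, shifted frames,
    or ncRNAs are reported alongside the annotated CDS. Only the most
    upstream ATG before each stop is used, and ORFs lacking an in-frame
    stop codon are skipped. min_aa=30 is an arbitrary, hypothetical cutoff.
    """
    seq = transcript.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == "ATG":
                    start = pos  # first ATG opens the ORF
            elif CODON_TABLE.get(codon) == "*":
                protein = translate(seq[start:pos])
                if len(protein) >= min_aa:
                    orfs.append((frame, start, protein))
                start = None  # look for the next ATG downstream
    return orfs


def to_fasta(transcript_id: str, orfs: list[tuple[int, int, str]]) -> str:
    """Format ORFs as FASTA records using a hypothetical header layout."""
    return "\n".join(
        f">{transcript_id}|frame{frame}|start{start}\n{protein}"
        for frame, start, protein in orfs
    )


if __name__ == "__main__":
    toy = "CCATGAAATTTTAAGG"  # toy transcript: ATG AAA TTT then a TAA stop
    print(to_fasta("tx1", find_orfs(toy, min_aa=3)))
```

Running the script prints a single record, `>tx1|frame2|start2` followed by `MKF`. Scanning all three frames is what lets alternative ORFs appear next to canonical ones; a production pipeline would likely also need to handle variants, non-AUG start codons, and redundancy between overlapping transcripts.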

Cited by 7 publications (5 citation statements)
References: 33 publications
“…However, these approaches suffer from their dedication as the identified MAPs could also derive from other transcripts absent from these databases. Accordingly, based on evidence showing that greater RNA expression confers a greater probability of MAP generation [7, 13], we implemented a biotype annotation tool in BamQuery and showed that many presumed ncMAPs could be coded with greater probability by regions annotated with different biotypes. Notably, cryptic proteins are translated as efficiently as canonical proteins and generate MAPs fivefold more efficiently per translation event [5].…”
Section: Discussion (mentioning; confidence: 99%)
“…Indeed, these studies revealed that ~5–10% of MAPs derive from non-canonical (nc) regions of the genome, such as introns, non-coding RNAs (ncRNA), or endogenous retroelements (EREs), as well as from out-of-frame exonic translation [3–6]. Furthermore, a recent study showed that a significant fraction of MS peptide-spectrum matches assigned to canonical MAPs have better scores when attributed to ncMAPs, suggesting a greater contribution of the non-canonical regions to the immunopeptidome than previously estimated [7]. While most of the discovered ncMAPs are non-mutated [4, 8–12], many of them are found exclusively in cancer cells and attract attention as (1) they can be immunogenic in vitro as well as in vivo; (2) they are more numerous in the immunopeptidome of malignant cells than mutated TAs, and (3) several non-coding TAs are widely shared between cancer patients whereas mutations mainly generate private antigens [13, 14].…”
Section: Introduction (mentioning; confidence: 99%)
“…[166–168] This number could be underestimated, as some MS spectra originally assigned to canonical proteins could indeed correspond to non-canonical ones.[169] In addition, the total number of human ncORFs is still unclear. While conservative estimates consider around 7,000, others find certain evidence for the existence of several hundred thousand.…”
Section: Non-classical Tumor Antigens (mentioning; confidence: 99%)
“…Guilloy et al point out the incompleteness of conventional protein sequence databases that only contain translations of canonical open reading frames. They offer a computational solution, OpenCustomDB (), to account for alternative open reading frames using RNA-Seq data specific to each patient sample. Finally, Li et al have developed and made publicly available GlycoSLASH (), a computational pipeline that goes beyond the capabilities of the commercial Byonic software.…”
(mentioning; confidence: 99%)
“…Ryan et al describe the robust optimization of a clinical analytical pipeline to quantify over 1000 lipid species from plasma and serum using internal standards labeled with stable isotopes (SIL-IS) and targeted mass spectrometry. Finally, two of the computational pipelines described above for deep proteomics were applied to patient samples and shown to significantly improve our ability to detect genomic variants and glycosylated biomarkers. Such technical improvements on both the LC-MS side and the computational analysis of the resulting data sets have the potential to support personalized medicine on a large scale.…”
(mentioning; confidence: 99%)