2020
DOI: 10.1101/2020.12.11.419523
Preprint

The Personalized Proteome: Comparing Proteogenomics and Open Variant Search Approaches for Single Amino Acid Variant Detection

Abstract: Discovery of variant peptides such as single amino acid variants (SAAVs) in shotgun proteomics data is essential for personalized proteomics. Both the resolution of shotgun proteomics methods and the search engines have improved dramatically, allowing for confident identification of SAAV peptides. However, it is not yet known if these methods are truly successful in accurately identifying SAAV peptides without prior genomic information in the search database. We studied this in unprecedented detail by exploiting…

Cited by 3 publications
(4 citation statements)
References 66 publications
“…60 We argue that this is an all but impossible threshold to meet for bioinformatics analyses that rely on matching experimental data to a database of already known sequences, a process that biases results toward conserved peptides and can fail to identify novel sequences even when they are present. 61 Additionally, confident identification of peptides with even a single amino acid variant from the search database remains quite difficult, 61 even in modern tissues, 71 and would likely be more difficult to assess for taxa in deep time. The use of more advanced peptide spectral match (PSM) quality metrics than a basic 1% false discovery rate (FDR) (e.g., posterior error probability [PEP] or more strict FDR values) may be necessary for variant peptides or rare peptides to support their detection when species-specific peptides are not found.…”
Section: Authentication Methods (mentioning)
confidence: 99%
“…As a result, new bioinformatic tools or add-ons to current tools will be necessary to maximize detection and confidence in identification of sequences that vary from search databases. 61 The difficulties in detecting and controlling false discovery of these variants 71 makes finding changes specific to extinct species very difficult, but the ability to do so is a crucial goal of paleoproteomics as a discipline. Thus, although such variations affect all proteomic disciplines that rely on database searching, the field of paleoproteomics must take a central role in overcoming these challenges.…”
Section: Dtpp Can Embrace Technological Advancements (mentioning)
confidence: 99%
“…For these incorrect interpretations one could expect bimodal distributions that contain a combination of right and wrongly assigned mass shifts, but the error distributions are mostly unimodal. The observation that presumed mutations are among the most problematic corresponds to the known non-trivial nature of reliably identifying such sequence changes 26,30 .…”
Section: Deeplc Was Applied On the Results Of An Open Modification Se (mentioning)
confidence: 99%
“…44−46 Cell- or tissue-specific databases built from long RNA-seq read data will be of particular use, helping to detect sample-specific peptides and thus protein isoforms. 47 Custom protein sequence databases built from RNA-seq data typically contain more sequences than traditional reference databases due to the translation of assembled and variant transcripts in more than one frame. 9 However, increasing the number of candidate protein sequences can affect the statistical validity of peptide matches.…”
Section: Discussion (mentioning)
confidence: 99%