2020
DOI: 10.1021/acs.jproteome.9b00566

EPIFANY: A Method for Efficient High-Confidence Protein Inference

Abstract: Accurate protein inference in the presence of shared peptides is still one of the key problems in bottom-up proteomics. Most protein inference tools that employ simple heuristic inference strategies are efficient but exhibit reduced accuracy. More advanced probabilistic methods often exhibit better inference quality but tend to be too slow for large data sets.

Cited by 21 publications (45 citation statements)
References: 36 publications
“…Another improvement to the pipeline will be an evolution of the heuristically driven “Rescue & Resolve” approach. We plan to develop a probabilistic protein inference algorithm in which transcriptional abundance values are incorporated into a rigorous statistical framework for the inference of protein isoforms [ 43 , 60 ]. The applications of our computational pipeline could also include the analysis of novel genes or genetic variation that is detectable in long-read data or separately available from previous genotyping, use of ONT (i.e., nanopore) cDNA or direct RNA sequencing data [ 54 ], the analysis of single-cell RNA-seq, use of targeted long-read datasets [ 61 ], or the use of top-down proteomics data for the analysis of proteoform diversity [ 62 ].…”
Section: Discussion (mentioning)
confidence: 99%
“…Briefly, raw.d files generated from the timsTOF Pro were converted to mzML with OpenMS (v 2.5.0) and tdf2mzml in-house nextflow script. Converted mzML files were searched against a reviewed UniProt human proteome (downloaded 1/1/2020) in OpenMS with the following protein search engine and inference engine combination: Comet fido, Comet epiphany, X!Tandem epiphany, MSGF+ fido, and MSGF+ epiphany [19][20][21][22]. Search parameters included precursor mass tolerance 20 ppm, MS2 mass tolerance 0.05 Da, carbamidomethylation of cysteine as a fixed modification, oxidation of methionine as a variable modification and false discovery rate (FDR) of peptide and proteins equal to 0.05.…”
Section: Protein Identification Data Analysis and Statistics For Prot... (mentioning)
confidence: 99%
“…Peptide-level FDR was estimated with a conventional target-decoy approach; the target-decoy FASTA database supplied to MS-GF+ was constructed from the set of verified UniProt sequences by concatenation of the reversed sequences. Proteins were subsequently assembled directly from the set of peptide identifications, or alternatively inferred using the Percolator 49 , Fido 50 or EPIFANY 51 engines. For EPIFANY inference, peptides were first rescored by Percolator.…”
Section: Raw Proteomic Data Processing (mentioning)
confidence: 99%
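
The target-decoy construction described in the last citation statement (reversed sequences concatenated to the target database) can be illustrated with a minimal Python sketch. This is not the code used by the citing study or by EPIFANY. The function names, the `DECOY_` prefix, the toy sequences, and the decoy/target ratio used as the FDR estimate are all assumptions made for illustration; the statement does not say which estimator was applied.

```python
def reverse_decoys(targets, prefix="DECOY_"):
    """Return the target entries plus one reversed-sequence decoy per target.

    `targets` maps a protein accession to its amino-acid sequence.
    The prefix is a hypothetical naming convention, not taken from the source.
    """
    decoys = {prefix + name: seq[::-1] for name, seq in targets.items()}
    return {**targets, **decoys}


def estimate_fdr(psms, threshold, prefix="DECOY_"):
    """Estimate FDR as (#decoy hits) / (#target hits) at a score threshold.

    `psms` is a list of (protein accession, score) pairs; higher is better.
    This ratio is one common estimator, assumed here for illustration.
    """
    accepted = [(protein, score) for protein, score in psms if score >= threshold]
    n_decoy = sum(1 for protein, _ in accepted if protein.startswith(prefix))
    n_target = len(accepted) - n_decoy
    return n_decoy / n_target if n_target else 0.0


# Toy data (invented): two target proteins and four scored matches.
targets = {"P1": "MKTAYIAK", "P2": "GAVLK"}
database = reverse_decoys(targets)
psms = [("P1", 0.9), ("P2", 0.8), ("DECOY_P1", 0.7), ("P1", 0.6)]
fdr_loose = estimate_fdr(psms, 0.65)   # one decoy among three accepted matches
fdr_strict = estimate_fdr(psms, 0.75)  # no decoys above this threshold
```

Raising the score threshold until the estimate falls below a chosen level (0.05 in one of the statements above) is the usual way such an estimate is used for filtering.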