“…It follows naturally that the ideal unified coordinate system
for proteogenomics should remain genomic in nature. Indeed, effective
tools that can map MS-based proteomics results onto genomic coordinates
have recently become available (Peppy [2], Proteogenomic Mapping Tool [3], Pepline [4], MS-Dictionary [5], GappedDictionary [6], IggyPep [7], MSProGene [8], ProteoAnnotator [9], PGNexus [10], and GalaxyP [11]); however,
these tools are usually couched in a relatively involved and comprehensive
pipeline (e.g., the GalaxyP pipeline consists of up to 140 steps)
and typically impose a specific mass-informatic [12] workflow on the practitioner by, for example, requiring
the generation of short peptide sequence tags (PSTs) or some complex
form of de novo peptide sequencing followed by a lookup against the
full six-frame translation of the genomic sequence. Our experience
suggests that a more common scenario involves the production, by the
genomic arm of the workflow, of a (liberally) predicted proteome (containing
what is assumed to be a superset of the observable proteome) so as
to leverage existing PSM search engines (such as Mascot [13], Sequest [14], and X!Tandem [15]) that require a straightforward representation
of the predicted proteome (in the form of a FASTA file).…”
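
To make the two representations mentioned in this passage concrete, the following Python sketch performs a six-frame translation of a short DNA sequence and writes the result as FASTA records. It is a minimal illustration only: it assumes the standard genetic code (NCBI translation table 1), ignores introns and splicing, and translates through stop codons (shown as `*`). The function names and the FASTA header layout are hypothetical and do not correspond to any of the tools cited in the passage.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels '+1'..'+3' and '-1'..'-3' to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def to_fasta(frames, name="seq"):
    """Serialize frames as FASTA text (hypothetical header: >name|frame<label>)."""
    return "".join(
        f">{name}|frame{label}\n{protein}\n" for label, protein in frames.items()
    )


if __name__ == "__main__":
    print(to_fasta(six_frame_translation("ATGGCCTAA")), end="")
```

In a realistic setting the six translated frames would be far larger than a predicted proteome, which is one reason a search against a compact FASTA file of predicted proteins is attractive for conventional PSM search engines.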