Matching peptide tandem mass spectra to their cognate amino acid sequences in databases is a key step in proteomics. It is usually performed by assigning a score to a spectrum-sequence combination. De novo sequencing or partial de novo sequencing is useful for organisms without a sequenced genome or for peptides with unexpected modifications. Here we use a very large, high-accuracy proteomic dataset to investigate how much peptide sequence information is present in tandem mass spectra generated in a linear ion trap (LTQ). More than 400,000 identified tandem mass spectra from a single human cancer cell line project were assigned to 26,896 distinct peptide sequences. The average absolute fragment mass accuracy is 0.102 Da. There are on average about four complementary b- and y-ions; both series are equally represented, but y-ions are 2- to 3-fold more intense up to mass 1000. Half of all spectra contain uninterrupted b- or y-ion series of at least six amino acids, and combining b- and y-ion information yields sequences of seven amino acids on average. These sequences are almost always unique in the human proteome, even without using any precursor or peptide sequence tag information. Thus, optimal de novo sequencing algorithms should be able to obtain substantial sequence information in at least half of all cases.

, and many others score these MS/MS spectra against in silico digested peptides whose calculated precursor masses fall into a suitable window around the measured mass, leading to statistically significant identifications for a fraction of the mass spectrometric sequencing events [5]. In most cases, the proportion of identifiable peptides is quite low for samples of high protein complexity [6].
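The candidate selection step described above, in which only in silico peptides with a calculated mass close to the measured precursor mass are scored, can be illustrated with a minimal sketch. It assumes unmodified peptides, neutral monoisotopic masses, and a fixed tolerance in Da; the names `peptide_mass`, `candidates_in_window`, and `tol_da` are hypothetical and do not come from any particular search engine.

```python
# Monoisotopic residue masses in Da (standard values, unmodified residues).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water is added for the free N- and C-termini


def peptide_mass(sequence):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in sequence) + WATER


def candidates_in_window(measured_mass, peptides, tol_da=0.02):
    """Keep peptides whose calculated mass lies within tol_da of the measurement."""
    return [p for p in peptides
            if abs(peptide_mass(p) - measured_mass) <= tol_da]


if __name__ == "__main__":
    digest = ["SAMPLER", "SAMPLEK", "GG", "SAMPELR"]
    print(candidates_in_window(peptide_mass("SAMPLER"), digest))
```

In this toy digest the window keeps both `SAMPLER` and `SAMPELR`, because they have the same composition and therefore the same mass. The precursor window only narrows the candidate list; distinguishing such candidates is left to the fragment-level scoring, which this sketch does not attempt.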
Despite recent improvements in identification rates [7,8], many MS/MS spectra remain unassigned, even though they are of reasonable quality. The peptide database search approach has the disadvantage that it is blind to the unexpected: only peptides that result from the digestion of known protein sequences, possibly with a few missed cleavages and a very limited number of standard variable modifications, can be identified in this way. The sequence tag approach [9] is an alternative to the conventional peptide database search that does not suffer from these limitations. Instead of operating in the restricted space of in silico digestions of known protein sequences, one starts by looking for a series of peaks that correspond to consecutive members of a fragment series. Each of the mass differences between two neighboring peaks must be equal to one of the 20 amino acid masses. Much of the specificity of a sequence tag in database searches comes from the mass information encoded in the two flanking masses. In this way, even a tag of two or three amino acids is usually unique in the proteome, especially given the very high precursor mass accuracy possible with modern, high-resolution mass spectrometers. A tag sequence that is part of an in silico peptide but with a wrong parent mass points to a novel and potentially interesting ...
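The peak-ladder search behind a sequence tag can be sketched as follows. This is a minimal illustration, not the algorithm of [9]: it assumes singly charged fragments from a single ion series, matches each mass difference to the closest residue within a fixed tolerance, and returns the longest ladder together with its two flanking m/z values. The names `match_residue`, `longest_tag`, and `tol` are hypothetical.

```python
# Monoisotopic residue masses in Da. Isoleucine is isobaric with leucine,
# so "L" stands for either residue in the returned tag.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MAX_RESIDUE = max(RESIDUE_MASS.values())


def match_residue(delta, tol):
    """Return the residue closest in mass to delta if within tol, else None."""
    best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - delta))
    return best if abs(RESIDUE_MASS[best] - delta) <= tol else None


def longest_tag(peaks, tol=0.02):
    """Return (low_flank_mz, tag, high_flank_mz) for the longest peak ladder.

    The tag is read in order of increasing m/z. Returns None for no peaks.
    """
    peaks = sorted(peaks)
    if not peaks:
        return None
    n = len(peaks)
    # best[i] = (tag, last_mz) of the longest ladder starting at peak i
    best = [("", mz) for mz in peaks]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            delta = peaks[j] - peaks[i]
            if delta > MAX_RESIDUE + tol:
                break  # peaks are sorted, so later gaps are larger still
            aa = match_residue(delta, tol)
            if aa is not None and 1 + len(best[j][0]) > len(best[i][0]):
                best[i] = (aa + best[j][0], best[j][1])
    start = max(range(n), key=lambda i: len(best[i][0]))
    tag, last_mz = best[start]
    return peaks[start], tag, last_mz


if __name__ == "__main__":
    ladder = [175.119]
    for aa in "EDL":
        ladder.append(ladder[-1] + RESIDUE_MASS[aa])
    print(longest_tag(ladder + [250.0, 400.5]))  # two unrelated peaks added
```

For a y-ion ladder, reading the tag in order of increasing m/z corresponds to the reversed peptide sequence. The two flanking values returned with the tag are the mass information that gives a short tag its specificity in a database search; the default tolerance here is illustrative and would need to be widened for ion trap fragment data.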