Thousands of protein post-translational modifications (PTMs) dynamically impact nearly all cellular functions. Mass spectrometry is well suited to PTM identification, but proteome-scale analyses are biased towards PTMs with existing enrichment methods. To measure the full landscape of PTM regulation, software must overcome two fundamental challenges: intractably large search spaces and difficulty distinguishing correct from incorrect identifications. Here, we describe TagGraph, software that overcomes both challenges with a string-based search method orders of magnitude faster than current approaches, and a probabilistic validation model optimized for PTM assignments. When applied to a human proteome map, TagGraph tripled confident identifications while revealing thousands of modification types on nearly one million sites spanning the proteome. We expand known sites by orders of magnitude for highly abundant yet understudied PTMs such as proline hydroxylation, and derive tissue-specific insight into these PTMs' roles. TagGraph expands our ability to survey the full landscape of PTM function and regulation.

Conventional sequence database search tools cannot identify modified peptides unless they are first anticipated by the researcher [20-22]. Search parameters including the number, kind, and frequency of PTMs are usually chosen to strike a difficult compromise: considering larger numbers of PTMs and other sequence variants is necessary for their identification, but doing so exponentially increases the time needed to interpret MS/MS datasets, and decreases the ability to distinguish correct from incorrect assignments [23]. To partially address this compromise, strategies have been proposed to constrain the number of proteins being searched, protease specificity rules, or the allowable types and numbers of PTMs [17,18,24-26].
In practice, these approaches only marginally decrease search times without clearly distinguishing correct from incorrect PTM assignments [27]. Therefore, most have not been demonstrated on large, proteome-scale datasets [23].

Here, we describe TagGraph, a powerful computational tool that addresses two principal challenges of searching very large sequence spaces. First, TagGraph leverages accurate de novo mass spectrum interpretations [28,29] to rapidly search millions of possible sequences for a match with an FM-index [30] data structure. This highly efficient search method makes modern next-generation genome sequencing possible [31], but has not been adapted to proteomics. By combining it with a graph-based string reconciliation algorithm, TagGraph rapidly searches MS/MS datasets without restrictions on number of proteins, PTMs, or protease specificity. This strategy achieves speeds orders of magnitude faster than prior algorithms because it considers exponentially more sequence possibilities without having to explicitly test each one against input spectra. Second, by replacing conventional "ta...
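
To make the string-search idea concrete, the following is a minimal sketch of an FM-index with backward search, the general technique named above for matching a sequence tag against a protein sequence. It is not TagGraph's implementation: the class name `FMIndex`, the toy protein string, and the naive suffix-array construction are all assumptions made for illustration, and the graph-based reconciliation step is not shown.

```python
from collections import Counter


class FMIndex:
    """Toy FM-index over a single string (hypothetical helper, not TagGraph code)."""

    def __init__(self, text, sentinel="$"):
        if sentinel in text:
            raise ValueError("sentinel must not occur in the text")
        # The sentinel sorts before all uppercase amino-acid letters in ASCII.
        self.text = text + sentinel
        n = len(self.text)
        # Naive suffix array: sort suffix start positions lexicographically.
        # Fine for a sketch; real indexes use linear-time construction.
        self.sa = sorted(range(n), key=lambda i: self.text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        self.bwt = "".join(self.text[i - 1] for i in self.sa)
        # first[c] = number of characters in the text strictly smaller than c.
        counts = Counter(self.text)
        self.first = {}
        total = 0
        for c in sorted(counts):
            self.first[c] = total
            total += counts[c]
        # occ[c][i] = occurrences of c in bwt[:i] (uncompressed, for clarity).
        self.occ = {c: [0] * (n + 1) for c in counts}
        for i, ch in enumerate(self.bwt):
            for c, row in self.occ.items():
                row[i + 1] = row[i] + (1 if c == ch else 0)

    def _range(self, pattern):
        """Backward search: return the suffix-array interval matching pattern."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0, 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0, 0
        return lo, hi

    def count(self, pattern):
        """Number of occurrences of pattern in the indexed text."""
        lo, hi = self._range(pattern)
        return hi - lo

    def locate(self, pattern):
        """Sorted 0-based start positions of pattern in the indexed text."""
        lo, hi = self._range(pattern)
        return sorted(self.sa[lo:hi])


if __name__ == "__main__":
    index = FMIndex("MKTAYIAKQRQISFVKSHFSRQ")  # toy sequence, not real data
    print(index.locate("AKQ"))  # [6]
    print(index.locate("RQ"))   # [9, 20]
    print(index.count("WW"))    # 0
```

Each query takes one step per character of the tag, independent of how long the indexed text is, which is why this kind of index suits searching very large sequence collections. A production index would compress the occurrence table and sample the suffix array rather than store both in full.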