Cancer mutation databases are expected to play central roles in personalized medicine by providing targets for drug development and biomarkers to tailor treatments to each patient. The accuracy of reported mutations is a critical issue that is commonly overlooked, which leads to mutation databases that include a sizable number of spurious mutations, either sequencing errors or passenger mutations. Here we report an analysis of the latest version of the TP53 mutation database, including 34,453 mutations. By using several data-driven methods on multiple independent quality criteria, we obtained a quality score for each report contributing to the database. This score can now be used to filter for high-confidence mutations and reports within the database. Sequencing the entire TP53 gene from various types of cancer using next-generation sequencing with ultradeep coverage validated our approach for curation. In summary, 9.7% of all collected studies, mostly comprising numerous tumors with multiple infrequent TP53 mutations, should be excluded when analyzing TP53 mutations. Thus, by combining statistical and experimental analyses, we provide a curated mutation database for TP53 mutations and a framework for mutation database analysis.

cancer genetics | genomic | locus-specific database

Conventional sequencing using Sanger's methodology has allowed for the discovery of genetic alterations in cancer genes (1). Next-generation sequencing (NGS) techniques have expanded this knowledge by providing a more complete description of each type of alteration, including copy-number variations, translocations, and missense mutations (2, 3).
The majority of these mutations are passenger mutations (or hitchhiking mutations) that have no active role in cancer progression and are only coselected with the driver mutations (4).

Since the first publication on TP53 mutations in 1989, more than 2,700 articles have been published describing more than 35,000 TP53 mutations in various tumor types and cell lines (5, 6). TP53 mutation studies have applied a variety of analyses, including molecular epidemiology, clinical surveys, and structural analyses (7, 8). Such studies require highly curated TP53 mutation data from the Locus Specific Database (LSDB) established and maintained since 1989 (9, 10).

The unique feature of TP53 compared with other tumor-suppressor genes is its mode of inactivation. Although most tumor-suppressor genes are inactivated by mutations, leading to absence of the protein (or synthesis of a truncated product), more than 80% of TP53 alterations are missense mutations encoding a stable full-length protein (11). Moreover, each tumor generally harbors a single mutation in the TP53 gene that reduces the transactivation activity of the TP53 protein, leading to loss of its antiproliferative and proapoptotic properties.

Previous studies have raised concerns about the accuracy of the various TP53 databases, because they include all mutations published in peer-reviewed journals (12-14). Statistical analysis showed that the use of n...