Somatic copy number aberrations (SCNAs) are frequent in cancer genomes, but many of these are random events that do not contribute to the cancer phenotype. A common strategy to distinguish functional driver events from such random passenger events is to identify recurrent aberrations shared by multiple samples. However, the extensive variability in the length and position of SCNAs across samples makes the problem of identifying recurrent aberrations notoriously difficult.
We developed an algorithm, RAIG (Recurrent Aberrations from Interval Graphs) [1], to identify independent and recurrent SCNAs across a set of samples as maximal cliques in an interval graph constructed from overlaps between aberrations. In contrast to existing approaches that deconvolve a recurrence score of SCNAs derived from individual markers (probes), RAIG analyzes the combinatorial structure of the underlying intervals, and thus explicitly models the dependencies between values of the recurrence score. RAIG uses a dynamic programming algorithm to optimize a rigorous objective function for the selection of recurrent aberrations. RAIG is efficient because the maximal cliques of an interval graph can be enumerated in polynomial time. RAIG is also readily adaptable to both microarray and high-throughput sequencing data.
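As an illustration of why clique enumeration in an interval graph is cheap, the following is a minimal sketch of a standard endpoint sweep, not the RAIG implementation. It assumes each aberration is given as a closed integer interval (start, end) on a single chromosome arm; the function name `maximal_cliques` and the input layout are hypothetical, and the dynamic programming step and objective function of RAIG are not shown.

```python
from typing import FrozenSet, List, Tuple

# Assumption: an aberration is a closed interval (start, end) on one chromosome arm.
Interval = Tuple[int, int]


def maximal_cliques(intervals: List[Interval]) -> List[FrozenSet[int]]:
    """Enumerate maximal cliques of the interval graph by sweeping endpoints.

    Returns sets of indices into `intervals`; every set is a maximal group of
    mutually overlapping intervals.
    """
    events = []
    for idx, (start, end) in enumerate(intervals):
        if start > end:
            raise ValueError(f"interval {idx} has start > end")
        # Kind 0 (open) sorts before kind 1 (close) at the same coordinate,
        # so closed intervals that merely touch are treated as overlapping.
        events.append((start, 0, idx))
        events.append((end, 1, idx))
    events.sort()

    active = set()   # intervals covering the current sweep position
    cliques = []
    grew = False     # True if an interval was opened since the last report
    for _, kind, idx in events:
        if kind == 0:
            active.add(idx)
            grew = True
        else:
            # The active set is maximal just before the first close that
            # follows one or more opens.
            if grew:
                cliques.append(frozenset(active))
                grew = False
            active.remove(idx)
    return cliques


if __name__ == "__main__":
    example = [(1, 5), (2, 6), (4, 8), (7, 9), (10, 12)]
    for clique in maximal_cliques(example):
        print(sorted(clique))
```

After sorting the 2n endpoints, the sweep visits each endpoint once and reports at most n cliques, which is what keeps the enumeration tractable for large cohorts.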
We compared RAIG with four existing algorithms, GAIA [2], JISTIC [3], GISTIC [4] and GISTIC2 [5], on three simulated data sets: a simple model of SCNAs with the introduction of spatial noise, a simulated model for examining the power of detecting secondary events, and a simulated model for demonstrating the power of separating two driver SCNAs that overlap to different degrees. The results demonstrate that RAIG outperforms the other approaches on all three simulated data sets. We used RAIG to perform a Pan-Cancer analysis of SCNAs in 4,976 samples from 12 cancer types from The Cancer Genome Atlas (TCGA) [6]. Significantly recurrent SCNAs were observed in 112 regions, including 61 amplified regions and 51 deleted regions. Fifty-eight of these recurrent SCNAs were reported in a recently published pan-cancer analysis [7], including amplifications of KRAS, CCND1, CDK4, MDM2, MDM4, FGFR3, PDGFRA, EGFR and MYC, and deletions of PTEN, RB1, NF1, ARID1A, CDKN2A, PTPRD and MLL3. RAIG also identified additional regions with known cancer genes, e.g. amplifications of ERBB3, SMARC4, MECOM and ESR1, and deletions of MAP2K4 and SLIT2. These results demonstrate that RAIG is a useful algorithm for identifying recurrent SCNAs across large patient cohorts.