An integrated map of structural variation in 2,504 human genomes

Sudmant, Peter H.; Rausch, Tobias; Gardner, Eugene J.; Handsaker, Robert E.; Abyzov, Alexej; Huddleston, John; Zhang, Yan; Ye, Kai; Jun, Goo; Fritz, Markus H.; Konkel, Miriam K.; Malhotra, Ankit; Stütz, Adrian M.; Shi, Xinghua; Casale, Francesco Paolo; Chen, Jieming; Hormozdiari, Fereydoun; Dayama, Gargi; Chen, Ken; Malig, Maika; Chaisson, Mark; Walter, Klaudia; Meiers, Sascha; Kashin, Seva; Garrison, Erik; Auton, Adam; Lam, Hugo Y. K.; Mu, Xinmeng Jasmine; Alkan, Can; Antaki, Danny; Bae, Taejeong; Cerveira, Eliza; Chines, Peter S.; Chong, Zechen; Clarke, Laura; Dal, Elif; Li, Ding; Emery, Sarah B.; Fan, Xian; Gujral, Madhusudan; Kahveci, Fatma; Kidd, Jeffrey M.; Kong, Yu; Lameijer, Eric Wubbo; McCarthy, Shane; Flicek, Paul; Gibbs, Richard A.; Marth, Gábor; Mason, Christopher E.; Menelaou, Androniki; Muzny, Donna M.; Nelson, Bradley J.; Noor, Amina; Parrish, Nicholas F.; Pendleton, Matthew; Quitadamo, Andrew; Raeder, Benjamin; Schadt, Eric E.; Romanovitch, Mallory; Schlattl, Andreas; Sebra, Robert; Shabalin, Andrey A.; Untergasser, Andreas; Walker, Jerilyn A.; Wang, Min; Yu, Fuli; Zhang, Chengsheng; Zhang, Jing; Zheng-Bradley, Xiangqun; Zhou, Wanding; Zichner, Thomas; Sebat, Jonathan; Batzer, Mark A.; McCarroll, Steven A.; Mills, Ryan E.; Gerstein, Mark; Bashir, Ali Kashif; Stegle, Oliver; Devine, Scott E.; Lee, Charles; Eichler, Evan E.; Korbel, Jan O.

doi:10.1038/nature15394

Cited by 2,015 publications

(2,396 citation statements)

References 40 publications

Supporting: 114

Mentioning: 2,264

Contrasting

Unclassified


“…Compared to array-based data, which commonly serve as inputs for copy-number significance analysis, sequencing-based copy-number profiles are more prone to artefact copy-number variations, for example, due to repetitive regions leading to ambiguous alignments. Thus, several filtering steps were used to eliminate false-positive GISTIC peak calls and to discover potentially cancer-relevant copy-number alterations: first, peaks overlapping with common fragile genomic sites were excluded, as these are likely to be consequences of genomic instability rather than cancer-driving events 97; next, peaks overlapping within 1 Mb of chromosomal ends were removed, as here sequencing coverage tends to vary frequently; and last, peaks overlapping with copy-number variable regions 98 (regions ranked 1-100) were excluded. Additionally, some of the resulting peaks were classified as 'passengers' of variable regions that were called as separated peaks from most likely one event, for example, a peak with MYCNOS as passenger peak of MYCN amplification.…”

Section: Discussion

mentioning

confidence: 99%

The landscape of genomic alterations across childhood cancers

Gröbner, Worst, Weischenfeldt et al. 2018

Nature

Self Cite


Pan-cancer analyses that examine commonalities and differences among various cancer types have emerged as a powerful way to obtain novel insights into cancer biology. Here we present a comprehensive analysis of genetic alterations in a pan-cancer cohort including 961 tumours from children, adolescents, and young adults, comprising 24 distinct molecular types of cancer. Using a standardized workflow, we identified marked differences in terms of mutation frequency and significantly mutated genes in comparison to previously analysed adult cancers. Genetic alterations in 149 putative cancer driver genes separate the tumours into two classes: small mutation and structural/copy-number variant (correlating with germline variants). Structural variants, hyperdiploidy, and chromothripsis are linked to TP53 mutation status and mutational signatures. Our data suggest that 7-8% of the children in this cohort carry an unambiguous predisposing germline variant and that nearly 50% of paediatric neoplasms harbour a potentially druggable event, which is highly relevant for the design of future clinical trials.

Cure rates for childhood cancers have increased to about 80% in recent decades, but cancer is still the leading cause of death by disease in the developed world among children over one year of age 1,2. Furthermore, many children who survive cancer suffer from long-term sequelae of surgery, cytotoxic chemotherapy, and radiotherapy, including mental disabilities, organ toxicities, and secondary cancers 3. A crucial step in developing more specific and less damaging therapies is the unravelling of the complete genetic repertoire of paediatric malignancies, which differ from adult malignancies in terms of their histopathological entities and molecular subtypes 4. Over the past few years, many entity-specific sequencing efforts have been launched, but the few paediatric pan-cancer studies thus far have focused only on mutation frequencies, germline predisposition, and alterations in epigenetic regulators 4-6.

We have carried out a broad exploration of cancers in children, adolescents, and young adults, by incorporating small mutations and copy-number or structural variants on somatic and germline levels, and by identifying putative cancer genes and comparing them to those previously reported in adult cancers by The Cancer Genome Atlas (TCGA) 7. We have also examined mutational signatures and potential drug targets. The compendium of genetic alterations presented here is available to the scientific community at http://www.pedpancan.com.

This integrative analysis includes 24 types of cancer and covers all major childhood cancer entities, many of which occur exclusively in children 8 (Fig. 1, Supplementary Table 1). Ninety-five per cent of the patients in this study were diagnosed during childhood or adolescence (aged 18 years or younger) and 5% as young adults (up to 25 years) (Extended Data ...
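The three peak-filtering steps described in the citation statement quoted above (fragile sites, chromosome ends, copy-number variable regions) can be illustrated with a minimal Python sketch. This is not the citing authors' pipeline: the `Region` type, the function names, and the half-open interval convention are hypothetical, chosen only for illustration, and "within 1 Mb of chromosomal ends" is interpreted here as any peak reaching into the first or last 1 Mb of a chromosome.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Region:
    """Hypothetical half-open genomic interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True if the two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def near_chrom_end(peak: Region, chrom_lengths: Dict[str, int], margin: int) -> bool:
    """True if the peak reaches into the first or last `margin` bases of its chromosome."""
    length = chrom_lengths[peak.chrom]
    return peak.start < margin or peak.end > length - margin


def filter_peaks(
    peaks: List[Region],
    fragile_sites: List[Region],
    cnv_regions: List[Region],
    chrom_lengths: Dict[str, int],
    margin: int = 1_000_000,  # 1 Mb, as stated in the quoted text
) -> List[Region]:
    """Apply the three exclusion steps in the order given in the quoted text."""
    kept = []
    for peak in peaks:
        # Step 1: exclude peaks overlapping common fragile sites.
        if any(overlaps(peak, site) for site in fragile_sites):
            continue
        # Step 2: exclude peaks close to chromosome ends.
        if near_chrom_end(peak, chrom_lengths, margin):
            continue
        # Step 3: exclude peaks overlapping copy-number variable regions.
        if any(overlaps(peak, region) for region in cnv_regions):
            continue
        kept.append(peak)
    return kept
```

The fragile-site and copy-number variable region lists would come from external annotations (references 97 and 98 in the quoted text); the sketch takes them as plain lists and does not attempt the subsequent 'passenger' peak classification.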


“…We were able to validate 271 out of 276 SVs with BAC contigs generated by SMRT sequencing (Supplementary Table 12). Compared to previous studies 6,8-11, a total of 11,927 variants were previously unreported, which account for approximately 47% (3,465) and 76% (7,710) of all deletions and insertions, respectively (Fig. 2a and Extended Data Fig.…”

mentioning

confidence: 62%

De novo assembly and phasing of a Korean human genome

Seo, Rhie, Kim et al. 2016

Nature


Although massively parallel sequencing approaches have been widely used to study genomic variation, simple alignment of short reads to a reference genome cannot be used to investigate the full range of structural variation and phased diploid architecture, which are important for precision medicine. By contrast, the single-molecule real-time (SMRT) sequencing platform produces long reads that can resolve repetitive structures effectively. We integrated this technology with several other sequencing approaches to construct a high-quality


“…Studies of L1s in the whole genome sequencing data of phase 1 of the 1000 genomes project have been published (Ewing & Kazazian, 2011; Stewart et al., 2011), and the L1s detected in these publications were annotated as known non‐reference L1s at the time of our L1‐seq analyses. Subsequently, in response to reviewer comments, we cross‐referenced our list of detected L1s with the 2015 publication on structural variants in the 1000 genomes project (Sudmant et al., 2015). Although the SYBU, DAB1, KLHL1, TBCK, PTHR2, and MACROD2 L1s were found in the 1000 genomes project, the TET2, WBSCR17, ATXN1, CTCF, DDX58, and DACH2 L1s confirmed in our study were not among those in the phase 3 data of the 1000 genomes project.…”

Section: Results

mentioning

confidence: 99%

“…Thus, 100% of neurons and glia, heterozygous for this novel L1, are likely affected. The SYBU L1 was not detected in gDNA samples from the blood of 84 individuals of European or African descent (data not shown), but was subsequently found in the phase 3 dataset of structural variants of the 1000 genomes project (Sudmant et al., 2015) at very low minor allele frequencies (≤1%) in 2 African (GDW, MSL) and 1 European (IBS) population(s). In contrast, the TET2 and WBSCR17 L1s may be private mutations because they were found in one individual and were not found among the L1s in the phase 3 data set of the 1000 genomes project (Sudmant et al., 2015).…”

Section: Discussion

mentioning

confidence: 99%

“…The SYBU L1 was not detected in gDNA samples from the blood of 84 individuals of European or African descent (data not shown), but was subsequently found in the phase 3 dataset of structural variants of the 1000 genomes project (Sudmant et al., 2015) at very low minor allele frequencies (≤1%) in 2 African (GDW, MSL) and 1 European (IBS) population(s). In contrast, the TET2 and WBSCR17 L1s may be private mutations because they were found in one individual and were not found among the L1s in the phase 3 data set of the 1000 genomes project (Sudmant et al., 2015). These data, suggesting rare germline L1 variants (SYBU) or potentially private L1s (TET2 and WBSCR17), are consistent with the hypothesis that polymorphic germline or early developmental de novo somatic L1s might be risk factors predisposing an individual to developing CA.…”

Section: Discussion

mentioning

confidence: 99%


Reading LINEs within the cocaine addicted brain

Doyle, Doucet-O’Hare, Hammond et al. 2017

Brain and Behavior


Introduction: Long interspersed element (LINE)‐1 (L1) is a type of retrotransposon capable of mobilizing into new genomic locations. Often studied in Mendelian diseases or cancer, L1s may also cause somatic mutation in the developing central nervous system. Recent reports showed L1 transcription was activated in brains of cocaine‐treated mice, and L1 retrotransposition was increased in cocaine‐treated neuronal cell cultures. We hypothesized that the predisposition to cocaine addiction may result from inherited L1s or somatic L1 mobilization in the brain.

Methods: Postmortem medial prefrontal cortex (mPFC) tissue from 30 CA and 30 control individuals was studied. An Alexafluor488‐labeled NeuN antibody and fluorescence activated nuclei sorting were used to separate neuronal from non‐neuronal cell nuclei. L1s and their 3' flanking sequences were amplified from neuronal and non‐neuronal genomic DNA (gDNA) using L1‐seq. L1 DNA libraries from the neuronal gDNA were sequenced on an Illumina HiSeq2000. Sequences aligned to the hg19 human genome build were analyzed for L1 insertions using custom “L1‐seq” bioinformatics programs.

Results: Previously uncataloged L1 insertions, some validated by PCR, were detected in neurons from both CA and control brain samples. Steady‐state L1 mRNA levels in CA and control mPFC were also assessed. Gene ontology and pathway analyses were used to assess relationships between genes putatively disrupted by novel L1s in CA and control individuals. L1 insertions in CA samples were enriched in gene ontologies and pathways previously associated with CA.

Conclusions: We conclude that neurons in the mPFC harbor L1 insertions that have the potential to influence predisposition to CA.
