Abstract: Here we present GRIDSS2, a general-purpose structural variant caller optimised for tumour/normal somatic calling. Using cell line and patient sample validation, as well as cohort-level comparisons, we show that GRIDSS2 outperforms recent state-of-the-art tools. We demonstrate that GRIDSS2 retains high sensitivity and precision even for small events by identifying a small (32-100bp) duplication signature strongly associated with colorectal cancer using 3,782 metastatic cancers that have been deeply sequenced by the Hartwig …
“…Our results indicate that these missing events typically involve centromeric regions that are not directly accessible by any current sequencing technology. Annotation data provided by the GRIDSS2 SV caller (Cameron et al, 2020).…”
Section: Discussion (mentioning)
confidence: 99%
“…Reads were mapped to GRCh37 with BWA mem (version 0.7.5, (Li, 2013) ), followed by indel realignment with GATK (v3.4-46, (DePristo et al, 2011) ). SVs were called jointly for COLO829 and COLO829BL with GRIDSS (v2.0.1, (Cameron et al, 2020) ). Somatic SVs were filtered with the GRIDSS somatic SV filtering script (https://github.com/PapenfussLab/gridss/blob/master/scripts/gridss_somatic_filter.R).…”
Section: Genomic Analyses Per Technology (mentioning)
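The excerpt delegates somatic filtering to the published GRIDSS R script, whose exact rules are not described here. As a rough illustration only, the sketch below shows the general idea behind tumour/normal somatic filtering after joint calling: keep calls with adequate quality and tumour support but no support in the matched normal. The record layout (`JointCall`), the function name `somatic_calls` and all thresholds are hypothetical and are not taken from gridss_somatic_filter.R.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JointCall:
    """Hypothetical per-SV summary from joint tumour/normal calling."""
    call_id: str
    tumour_support: int  # supporting reads/fragments in the tumour (e.g. COLO829)
    normal_support: int  # supporting reads/fragments in the normal (e.g. COLO829BL)
    qual: float          # caller-assigned quality score


def somatic_calls(
    calls: List[JointCall],
    min_qual: float = 500.0,      # illustrative threshold, not the GRIDSS default
    min_tumour_support: int = 2,  # illustrative threshold
    max_normal_support: int = 0,  # any normal support marks the call as non-somatic
) -> List[JointCall]:
    """Keep calls that look tumour-specific under simple hard thresholds."""
    return [
        c
        for c in calls
        if c.qual >= min_qual
        and c.tumour_support >= min_tumour_support
        and c.normal_support <= max_normal_support
    ]


calls = [
    JointCall("sv1", tumour_support=10, normal_support=0, qual=800.0),
    JointCall("sv2", tumour_support=12, normal_support=3, qual=900.0),  # seen in normal
    JointCall("sv3", tumour_support=1, normal_support=0, qual=1000.0),  # too little tumour support
    JointCall("sv4", tumour_support=20, normal_support=0, qual=100.0),  # low quality
]
print([c.call_id for c in somatic_calls(calls)])  # ['sv1']
```

Hard thresholds keep the idea readable; a production filter would typically weigh allele fractions, tumour purity and artefact annotations rather than raw read counts.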
Accurate detection of somatic structural variation (SV) in cancer genomes remains a challenging problem. This is in part due to the lack of high-quality gold standard datasets that enable the benchmarking of experimental approaches and bioinformatic analysis pipelines for comprehensive somatic SV detection. Here, we approached this challenge by genome-wide somatic SV analysis of the paired melanoma and normal lymphoblastoid COLO829 cell lines using four different technologies: Illumina HiSeq, Oxford Nanopore, Pacific Biosciences and 10x Genomics. Based on the evidence from multiple technologies combined with extensive experimental validation, including Bionano optical mapping data and targeted detection of candidate breakpoint junctions, we compiled a comprehensive set of true somatic SVs, comprising all SV types. We demonstrate the utility of this resource by determining the SV detection performance of each technology as a function of tumor purity and sequence depth, highlighting the importance of assessing these parameters in cancer genomics projects and data analysis tool evaluation. The reference truth somatic SV dataset as well as the underlying raw multi-platform sequencing data are freely available and are an important resource for community somatic benchmarking efforts.
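Measuring a caller against a truth set like this one comes down to matching calls to truth events and counting. The sketch below assumes each SV is reduced to a breakpoint pair with its ends in a consistent (normalised) order, and uses a hypothetical ±100 bp tolerance with greedy one-to-one matching; the names `Breakpoint` and `benchmark` are illustrative, and dedicated benchmarking tools use more elaborate matching rules. Running it on calls from depth-subsampled or purity-diluted data would yield the kind of performance curves described above.

```python
from typing import List, NamedTuple, Tuple


class Breakpoint(NamedTuple):
    """An SV reduced to its two breakpoint ends, in a consistent (normalised) order."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def matches(call: Breakpoint, truth: Breakpoint, tol: int) -> bool:
    """Both ends on the same chromosomes and within `tol` bp of the truth event."""
    return (
        call.chrom1 == truth.chrom1
        and call.chrom2 == truth.chrom2
        and abs(call.pos1 - truth.pos1) <= tol
        and abs(call.pos2 - truth.pos2) <= tol
    )


def benchmark(
    calls: List[Breakpoint], truth_set: List[Breakpoint], tol: int = 100
) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of calls to truth events; returns (precision, recall, F1)."""
    matched_truth = set()
    for call in calls:
        for i, truth in enumerate(truth_set):
            if i not in matched_truth and matches(call, truth, tol):
                matched_truth.add(i)  # each truth event can be claimed only once
                break
    true_positives = len(matched_truth)
    precision = true_positives / len(calls) if calls else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = [Breakpoint("1", 1000, "1", 5000), Breakpoint("2", 200, "7", 900)]
calls = [Breakpoint("1", 1050, "1", 4980), Breakpoint("3", 10, "3", 500)]
print(benchmark(calls, truth))  # (0.5, 0.5, 0.5)
```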
“…Along with the manually-curated reference set, the panel of normals (PON) used for further filtering was generated from a compiled set of high-quality germline calls using 3,782 normal samples freshly sequenced at a median depth of 38x by the Hartwig Medical Foundation. 49,50 Sniphles: The main idea is to phase the identified SVs. We use two approaches; the first is to extract the tagged reads from the bam file and use these reads to phase the SVs if not conflicted.…”
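The panel-of-normals filtering mentioned in this excerpt can be sketched as a lookup: a candidate somatic call is discarded if both of its ends lie within a small tolerance of the same breakpoint recorded in the normal panel. The tuple layout, the 10 bp tolerance and the function names are hypothetical; this is not the Hartwig or GRIDSS PON implementation. Sorting each chromosome pair's PON entries by first position lets `bisect` jump straight to nearby candidates instead of scanning the whole panel.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Call = Tuple[str, int, str, int]  # (chrom1, pos1, chrom2, pos2), hypothetical layout
PonIndex = Dict[Tuple[str, str], List[Tuple[int, int]]]


def build_pon_index(pon: Iterable[Call]) -> PonIndex:
    """Group PON breakpoints by chromosome pair, sorted by first position."""
    index: PonIndex = defaultdict(list)
    for chrom1, pos1, chrom2, pos2 in pon:
        index[(chrom1, chrom2)].append((pos1, pos2))
    for entries in index.values():
        entries.sort()
    return index


def in_pon(index: PonIndex, call: Call, tol: int) -> bool:
    """True if both ends of `call` lie within `tol` bp of the same PON breakpoint."""
    chrom1, pos1, chrom2, pos2 = call
    entries = index.get((chrom1, chrom2), [])
    start = bisect.bisect_left(entries, (pos1 - tol,))  # first entry with p1 >= pos1 - tol
    for p1, p2 in entries[start:]:
        if p1 > pos1 + tol:
            break  # entries are sorted by p1, so no later entry can match
        if abs(p2 - pos2) <= tol:
            return True
    return False


def pon_filter(calls: List[Call], index: PonIndex, tol: int = 10) -> List[Call]:
    """Drop calls that match a breakpoint seen in the normal panel."""
    return [c for c in calls if not in_pon(index, c, tol)]


pon_index = build_pon_index([("1", 1000, "1", 2000), ("2", 500, "5", 900)])
candidates = [("1", 1005, "1", 1995), ("1", 1005, "1", 3000), ("2", 480, "5", 900)]
print(pon_filter(candidates, pon_index))
```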
In October 2020, 62 scientists from nine nations worked together remotely in the Second Baylor College of Medicine & DNAnexus hackathon, focusing on topics related to Structural Variation, Pan-genomes, and SARS-CoV-2 research. The overarching focus was to assess the current status of the field, identify the remaining challenges, and determine how to combine the strengths of the different interests to drive research and method development forward. Over the four days, eight groups each designed and developed new open-source methods to improve the identification and analysis of variations among species, including humans and SARS-CoV-2. These included improvements in SV calling, genotyping, annotation and filtering, together with advancements in benchmarking existing methods. Furthermore, groups focused on the diversity of SARS-CoV-2. Daily discussion summaries and methods are publicly available at https://github.com/collaborativebioinformatics and provide valuable insights for both participants and the research community.
“…SNVs called with bcftools and the viral reference modified to incorporate these SNVs. Viral read pairs are realigned to the updated reference and structural variants called using GRIDSS2 17 and filtered to single breakends. Single breakends are breakpoints in which one side cannot be unambiguously aligned to the (viral) reference.…”
Section: Virusbreakend Overview (mentioning)
confidence: 99%
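Filtering to single breakends is straightforward when calls are written in VCF breakend notation. The VCF specification encodes a single breakend as an ALT allele with a dot on one side of the reference base plus any inserted sequence (for example `G.` or `.TTGCA`), whereas a two-sided breakend uses `[` or `]` brackets. The sketch below assumes that notation and a plain tab-separated VCF, and treats every other ALT form (bracketed breakends, symbolic alleles such as `<DEL>`) as not single breakends; the function names are hypothetical.

```python
import re
from typing import Iterable, Iterator, List

# Single-breakend ALT: bases followed by '.', or '.' followed by bases.
SINGLE_BREAKEND_ALT = re.compile(r"^(?:[ACGTN]+\.|\.[ACGTN]+)$", re.IGNORECASE)


def is_single_breakend(alt: str) -> bool:
    """True if an ALT allele uses single-breakend notation."""
    return SINGLE_BREAKEND_ALT.match(alt) is not None


def single_breakend_records(vcf_lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the fields of data lines with at least one single-breakend ALT allele."""
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if any(is_single_breakend(alt) for alt in fields[4].split(",")):
            yield fields


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\tbe1\tG\tG.\t500\tPASS\tSVTYPE=BND",
    "chr1\t2000\tbe2\tT\tT[chr5:3000[\t600\tPASS\tSVTYPE=BND",
    "chr2\t500\tbe3\tA\t.TTGCA\t450\tPASS\tSVTYPE=BND",
]
print([f[2] for f in single_breakend_records(vcf)])  # ['be1', 'be3']
```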
“…As input, it uses a SAM/BAM/CRAM file of reads aligned to the host reference genome. Viral reads are classified using Kraken2 16, aligned to the most abundant host-infecting virus with bwa 26, and realigned to a modified viral reference that incorporates SNVs called by bcftools; single breakends are then identified with GRIDSS2 17,27, aligned to the host to identify putative integration sites, and annotated with RepeatMasker to identify false positive and multi-mapping integration sites. All read sequences 20bp or longer that are not aligned to the host reference genome are classified using a Kraken2 database containing the human, viral and UniVec_Core sequences. For soft-clipped reads, only the unaligned bases are classified. For split read alignments, only the bases not aligned to either location are classified.…”
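The rule in this excerpt (classify only the unaligned bases of soft-clipped and split reads) can be expressed as interval arithmetic over read coordinates. The sketch below assumes that each alignment's CIGAR string and strand are known, along with the full original read length including hard-clipped bases; it converts every alignment to the read interval it covers and returns the uncovered gaps. Applying the 20 bp minimum to each unaligned segment is an assumption, and the function names are hypothetical, not VIRUSBreakend's code.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
Interval = Tuple[int, int]


def aligned_query_interval(cigar: str, is_reverse: bool, read_length: int) -> Interval:
    """Half-open [start, end) of the original read covered by one alignment."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    lead = 0
    for n, op in ops:  # leading soft/hard clips come before the aligned bases
        if op not in "SH":
            break
        lead += n
    span = sum(n for n, op in ops if op in "MI=X")  # query-consuming, non-clip ops
    start, end = lead, lead + span
    if is_reverse:  # CIGAR runs along the reverse complement; flip to read orientation
        start, end = read_length - end, read_length - start
    return start, end


def unaligned_segments(
    intervals: List[Interval], read_length: int, min_len: int = 20
) -> List[Interval]:
    """Gaps of the read not covered by any alignment, keeping those >= min_len bases."""
    gaps: List[Interval] = []
    cursor = 0
    for start, end in sorted(intervals):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < read_length:
        gaps.append((cursor, read_length))
    return [(s, e) for s, e in gaps if e - s >= min_len]


# Soft-clipped read: 100 aligned bases, last 50 unaligned.
print(unaligned_segments([aligned_query_interval("100M50S", False, 150)], 150))  # [(100, 150)]

# Split read: primary covers 0-60, hard-clipped supplementary covers 70-150.
split = [aligned_query_interval("60M90S", False, 150), aligned_query_interval("70H80M", False, 150)]
print(unaligned_segments(split, 150))             # [] (10-base gap is below 20)
print(unaligned_segments(split, 150, min_len=5))  # [(60, 70)]
```

Slicing the original read sequence with the returned intervals gives the bases that would be passed to the classifier.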
Integration of viruses into infected host cell DNA can cause DNA damage and disrupt genes. Recent cost reductions and the growth of whole genome sequencing have produced a wealth of data in which viral presence and integration can be detected. While key research and clinically relevant insights can be uncovered, existing software has not achieved widespread adoption, limited in part by high computational costs, an inability to detect a wide range of viruses, and insufficient precision and sensitivity. Here, we describe VIRUSBreakend, a high-speed tool that identifies viral DNA presence and genomic integration using single breakend variant calling. Single breakends are breakpoints in which only one side has been unambiguously placed. We show that by using a novel virus-centric single breakend variant calling and assembly approach, viral integrations can be identified with high sensitivity and a near-zero false discovery rate, even when integrated in regions of the host genome with low mappability, such as centromeres and telomeres, that cannot be reliably called by existing tools. Applying VIRUSBreakend to a large metastatic cancer cohort, we demonstrate that it can reliably detect clinically relevant viral presence and integration, including HPV, HBV, MCPyV, EBV, and HHV-8.