Background
To facilitate the clinical implementation of genomic medicine by next-generation sequencing, it will be critically important to obtain accurate and consistent variant calls on personal genomes. Multiple software tools for variant calling are available, but it is unclear how comparable these tools are or what their relative merits in real-world scenarios might be.

Methods
We sequenced 15 exomes from four families using commercial kits (Illumina HiSeq 2000 platform and Agilent SureSelect version 2 capture kit), with approximately 120X mean coverage. We analyzed the raw data using near-default parameters with five different alignment and variant-calling pipelines (SOAP, BWA-GATK, BWA-SNVer, GNUMAP, and BWA-SAMtools). We additionally sequenced a single whole genome using the sequencing and analysis pipeline from Complete Genomics (CG), with 95% of the exome region covered by 20 or more reads per base. Finally, we validated 919 single-nucleotide variations (SNVs) and 841 insertions and deletions (indels), including similar fractions of GATK-only, SOAP-only, and shared calls, by amplicon sequencing on the MiSeq platform with approximately 5000X mean coverage.

Results
SNV concordance among the five Illumina pipelines across all 15 exomes was 57.4%, while 0.5 to 5.1% of variants were called uniquely by each pipeline. Indel concordance among the three indel-calling pipelines was only 26.8%, even after left-normalizing and intervalizing genomic coordinates by 20 base pairs. Eleven percent of the CG variants falling within the exome target regions were not called by any of the Illumina-based exome analysis pipelines. Based on targeted amplicon sequencing on the MiSeq platform, 97.1%, 60.2%, and 99.1% of the GATK-only, SOAP-only, and shared SNVs could be validated, but only 54.0%, 44.6%, and 78.1% of the GATK-only, SOAP-only, and shared indels could be validated.
Additionally, our analysis of two families (one with four individuals and the other with seven) demonstrated the additional accuracy in variant discovery gained by having access to genetic data from a multi-generational family.

Conclusions
Our results suggest that more caution should be exercised in genomic medicine settings when analyzing individual genomes, including interpreting positive and negative findings with scrutiny, especially for indels. We advocate for renewed collection and sequencing of multi-generational families to increase the overall accuracy of whole genomes.
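As an illustration of the indel comparison described in the Results (left-normalizing calls, then matching them within 20 base pairs), the following Python sketch normalizes each indel against a reference sequence and groups calls whose normalized positions fall within a 20 bp window. It is a hypothetical sketch, not the authors' pipeline: the function names `left_normalize` and `indel_concordance`, the single-linkage grouping, and the definition of concordance as "loci called by every pipeline divided by all loci" are assumptions. The normalization loop follows the common trim-and-extend approach to VCF indel normalization, and the genome and calls in the usage example are toy data, not study data.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical call representation: (chromosome, 0-based position, REF, ALT),
# with VCF-style alleles that share an anchor base for indels.
Call = Tuple[str, int, str, str]


def left_normalize(ref_seq: str, pos: int, ref: str, alt: str) -> Tuple[int, str, str]:
    """Shift an indel to its leftmost equivalent position (trim-and-extend)."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base from both alleles.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, prepend the preceding reference base.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def indel_concordance(calls: Dict[str, List[Call]],
                      genome: Dict[str, str],
                      window: int = 20) -> float:
    """Fraction of indel loci (grouped within `window` bp) called by every pipeline."""
    tagged: List[Tuple[str, int, str]] = []
    for pipeline, pipeline_calls in calls.items():
        for chrom, pos, ref, alt in pipeline_calls:
            norm_pos, _, _ = left_normalize(genome[chrom], pos, ref, alt)
            tagged.append((chrom, norm_pos, pipeline))
    tagged.sort()

    # Assumed single-linkage grouping: a call joins the current locus if it is on
    # the same chromosome and within `window` bp of the previous sorted call.
    loci: List[Set[str]] = []
    prev: Optional[Tuple[str, int]] = None
    for chrom, pos, pipeline in tagged:
        if prev is None or chrom != prev[0] or pos - prev[1] > window:
            loci.append(set())
        loci[-1].add(pipeline)
        prev = (chrom, pos)

    if not loci:
        return 0.0
    shared = sum(1 for locus in loci if len(locus) == len(calls))
    return shared / len(loci)


if __name__ == "__main__":
    # Toy data: both pipelines report the same CA-repeat insertion at different
    # positions; pipeline_B also reports a deletion that pipeline_A misses.
    genome = {"chr1": "GCACACAT" + "A" * 52 + "GCT"}
    calls = {
        "pipeline_A": [("chr1", 5, "C", "CAC")],
        "pipeline_B": [("chr1", 3, "C", "CAC"), ("chr1", 60, "GC", "G")],
    }
    print(indel_concordance(calls, genome))  # 0.5
```

Without normalization, the two representations of the repeat insertion would sit at different coordinates and could be counted as discordant. Because single-linkage grouping chains neighbouring calls, a dense run of indels can collapse into one locus; a fixed window around each call is an equally plausible reading of "intervalizing by 20 base pairs".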
Summary
Neuroblastoma is a pediatric malignancy that typically arises in early childhood and is derived from the developing sympathetic nervous system. Clinical phenotypes range from localized tumors with excellent outcomes to widely metastatic disease where long-term survival is approximately 40% despite intensive therapy[1]. A previous genome-wide association study (GWAS) identified common polymorphisms at the LMO1 gene locus that are highly associated with neuroblastoma susceptibility and oncogenic addiction to LMO1 in the tumor cells[2]. Here we sought to discover the causal DNA variant at this locus and the mechanism by which it leads to neuroblastoma tumorigenesis. We first imputed all possible genotypes across the LMO1 locus and then mapped highly associated single nucleotide polymorphisms (SNPs) to areas of chromatin accessibility, evolutionary conservation, and transcription factor binding sites. SNP rs2168101 G>T was the most highly associated variant (combined P=7.47×10⁻²⁹, odds ratio 0.65, 95% CI: 0.60-0.70) and resided in a super-enhancer defined by extensive acetylation of histone H3 lysine 27 within the first intron of LMO1. The ancestral G allele that is associated with tumor formation resides in a conserved GATA transcription factor binding motif. We show that the newly evolved protective TATA allele is associated with decreased total LMO1 expression (P=0.028) in neuroblastoma primary tumors and ablates GATA3 binding (P<0.0001). We demonstrate allelic imbalance favoring the G-containing strand in tumors heterozygous for this SNP by both RNA sequencing (P<0.0001) and reporter assays (P=0.002). These findings show that a recently evolved polymorphism within a super-enhancer element in the first intron of LMO1 influences neuroblastoma susceptibility through differential GATA transcription factor binding and direct modulation of LMO1 expression in cis, and that this leads to an oncogenic dependency in tumor cells.
Infantile myofibromatosis (IM) is a disorder of mesenchymal proliferation characterized by the development of nonmetastasizing tumors in the skin, muscle, bone, and viscera. Occurrence within families across multiple generations is suggestive of an autosomal-dominant (AD) inheritance pattern, but autosomal-recessive (AR) modes of inheritance have also been proposed. We performed whole-exome sequencing (WES) in members of nine unrelated families clinically diagnosed with AD IM to identify the genetic origin of the disorder. In eight of the families, we identified one of two disease-causing mutations in PDGFRB, c.1978C>A (p.Pro660Thr) or c.1681C>T (p.Arg561Cys). Intriguingly, one family did not carry either of these PDGFRB mutations, but all affected individuals had a c.4556T>C (p.Leu1519Pro) mutation in NOTCH3. Our studies suggest that mutations in PDGFRB are a cause of IM and highlight NOTCH3 as a candidate gene. Further studies of the crosstalk between the PDGFRB and NOTCH pathways may offer new opportunities to identify mutations in other genes that result in IM and are a necessary first step toward understanding the mechanisms of both tumor growth and regression, and toward targeted treatment of the disorder.