Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes

Yeo, Zhen Xuan; Wong, Joshua C. L.; Rozen, Steve; Lee, Ann Siew Gek

doi:10.1186/1471-2164-15-516

Cited by 37 publications

(44 citation statements)

References 26 publications


“…Even with the optimization of the bioinformatics parameters used in our pipeline, which improved the quality of mapping and variant calling, our in-house pipeline has a high false positive rate (4.3%), which is due mostly to homopolymers regions. This has been previously reported by other authors [12][13][14] and highlights the need for orthogonal confirmations. Having this scenario, we opted for the confirmation of every pathogenic or variant of unknown significance through Sanger sequencing in our clinical analysis test.…”

Section: Discussion (supporting)

confidence: 85%

Development and validation of a variant detection workflow for BRCA1 and BRCA2 genes and its clinical application based on the Ion Torrent technology

et al. 2017


Background: Breast cancer is the most common among women worldwide, and ovarian cancer is the most difficult gynecological tumor to diagnose and with the lowest chance of cure. Mutations in BRCA1 and BRCA2 genes increase the risk of ovarian cancer by 60% and breast cancer by up to 80% in women. Molecular tests allow a better orientation for patients carrying these mutations, affecting prophylaxis, treatment, and genetic counseling. Results: Here, we evaluated the performance of a panel for BRCA1 and BRCA2, using the Ion Torrent PGM (Life Technologies) platform in a customized workflow and multiplex ligation-dependent probe amplification for detection of mutations, insertions, and deletions in these genes. We validated the panel with 26 samples previously analyzed by Myriad Genetics Laboratory, and our workflow showed 95.6% sensitivity and 100% agreement with Myriad reports, with 85% sensitivity on the positive control sample from NIST. We also screened 68 clinical samples and found 22 distinct mutations. Conclusions: The selection of a robust methodology for sample preparation and sequencing, together with bioinformatics tools optimized for the data analysis, enabled the development of a very sensitive test with high reproducibility. We also highlight the need to explore the limitations of the NGS technique and the strategies to overcome them in a clinically confident manner.


“…Generalizing these models to other sequencing technologies has proven difficult due to the need for manual retuning or extending these statistical models (see e.g. Ion Torrent 8,9 ), a major problem in an area with such rapid technological progress 12,13 , and the life sciences [14][15][16][17] . This toolchain, which we call DeepVariant, (Figure 1) begins by finding candidate SNPs and indels in reads aligned to the reference genome with high-sensitivity but low specificity.…”

Section: Main Text (mentioning)

confidence: 99%


Creating a universal SNP and small indel variant caller with deep neural networks

Poplin, Chang, Alexander, et al. 2016

Preprint


bioRxiv preprint first posted online Dec. 14, 2016; doi: http://dx.doi.org/10.1101/092890

Next-generation sequencing (NGS) is a rapidly evolving set of technologies that can be used to determine the sequence of an individual's genome [1] by calling genetic variants present in an individual using billions of short, errorful sequence reads [2]. Despite more than a decade of effort and thousands of dedicated researchers, the hand-crafted and parameterized statistical models used for variant calling still produce thousands of errors and missed variants in each genome [3,4]. Here we show that a deep convolutional neural network [5] can call genetic variation in aligned next-generation sequencing read data by learning statistical relationships (likelihoods) between images of read pileups around putative variant sites and ground-truth genotype calls. This approach, called DeepVariant, outperforms existing tools, even winning the "highest performance" award for SNPs in a FDA-administered variant calling challenge. The learned model generalizes across genome builds and even to other mammalian species, allowing non-human sequencing projects to benefit from the wealth of human ground truth data. We further show that, unlike existing tools which perform well on only a specific technology, DeepVariant can learn to call variants in a variety of sequencing technologies and experimental designs, from deep whole genomes from 10X Genomics to Ion Ampliseq exomes. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret biological instrumentation data.

Main Text

Calling genetic variants from NGS data has proven challenging because NGS reads are not only errorful (with rates from ~0.1-10%) but result from a complex error process that depends on properties of the instrument, preceding data processing tools, and the genome sequence itself [1,3,4,6]. State-of-the-art variant callers use a variety of statistical techniques to model these error processes and thereby accurately identify differences between the reads and the reference genome caused by real genetic variants and those arising from errors in the reads [3,4,6,7]. For example, the widely-used GATK uses logistic regression to model base errors, hidden Markov models to compute read likelihoods, and naive Bayes classification to identify variants, which are then filtered to remove likely false positives using a Gaussian mixture model with hand-crafted features capturing common error modes [6]. These techniques allow the GATK to achieve high but still imperfect accuracy on the Illumina sequencing platform [3,4]. Generalizing these models to other sequencing technologies has proven difficult due to the need for manual retuning or exte...


“…Indels were called using a modification of the Genome Analysis ToolKit UnifiedGenotyper tool within the Ampliseq Tumor/Normal workflow that identifies candidate indels present at 10× coverage or higher. This algorithm accounts for the homopolymer‐induced indels intrinsic to Ion Torrent sequencing data. Large indels are identified by scanning Binary Alignment/Map (BAM) files for > 10 bp of nonaligned sequences.…”

Section: Methods (mentioning)

confidence: 99%

Altered molecular profile in thyroid cancers from patients affected by the Three Mile Island nuclear accident

et al. 2017
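
The large-indel step in the citation statement above (scanning alignments for > 10 bp of nonaligned sequence) can be illustrated with a minimal sketch. The quoted text does not say how the scan is implemented, so everything here is an assumption: nonaligned sequence is taken to mean soft-clipped (S) segments in the CIGAR string, the input is SAM-format text lines rather than binary BAM, and the names `soft_clip_lengths` and `candidate_reads` are hypothetical.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# A CIGAR string is a series of <length><operation> tokens (SAM specification).
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")

# Threshold taken from the quoted text ("> 10 bp of nonaligned sequences").
MIN_UNALIGNED_BP = 10


def soft_clip_lengths(cigar: str) -> List[int]:
    """Return the lengths of all soft-clipped (S) segments in a CIGAR string."""
    return [int(n) for n, op in CIGAR_TOKEN.findall(cigar) if op == "S"]


def candidate_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, int, int]]:
    """Yield (read name, reference, position, longest clip) for mapped reads
    whose longest soft-clipped segment exceeds MIN_UNALIGNED_BP."""
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # not a valid alignment record
            continue
        qname, flag, rname, pos, cigar = (
            fields[0], int(fields[1]), fields[2], int(fields[3]), fields[5]
        )
        if flag & 0x4 or cigar == "*":  # unmapped read or no CIGAR available
            continue
        longest = max(soft_clip_lengths(cigar), default=0)
        if longest > MIN_UNALIGNED_BP:
            yield qname, rname, pos, longest


# Made-up records for illustration only (not data from any cited study).
example = [
    "@HD\tVN:1.6",
    "read1\t0\tchr17\t43044295\t60\t50M\t*\t0\t0\t*\t*",
    "read2\t0\tchr17\t43044300\t60\t35M15S\t*\t0\t0\t*\t*",
    "read3\t0\tchr13\t32315000\t60\t10S40M\t*\t0\t0\t*\t*",
]
hits = list(candidate_reads(example))
```

In this toy input only `read2` is flagged, because its 15 bp clip exceeds the threshold while the 10 bp clip on `read3` does not. A real workflow would read BAM through an alignment library and would need further evidence (for example, clustering of clipped reads at the same position) before calling a large indel.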

