2015
DOI: 10.1186/s12864-015-1333-7

The distribution and mutagenesis of short coding INDELs from 1,128 whole exomes

Abstract: Background: Identifying insertion/deletion polymorphisms (INDELs) with high confidence has been intrinsically challenging in short-read sequencing data. Here we report our approach for improving INDEL calling accuracy by using a machine learning algorithm to combine call sets generated with three independent methods, and by leveraging the strengths of each individual pipeline. Utilizing this approach, we generated a consensus exome INDEL call set from a large dataset generated by the 1000 Genomes Project (1000G)…
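The abstract does not spell out how the three call sets are lined up before the machine learning step. As a hedged illustration only, the Python sketch below shows one plausible first step: taking the union of INDELs from three callers and recording, for each INDEL, which callers reported it. A simple majority vote is included as a baseline; it is not the authors' method, which the abstract describes as a machine learning algorithm. The caller names (`caller_a`, `caller_b`, `caller_c`), the helper names, and the assumption that all calls are already left-aligned and normalized are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# An INDEL key: (chromosome, 1-based position, REF allele, ALT allele).
# Assumes every call set was already left-aligned and normalized.
Indel = Tuple[str, int, str, str]


def support_matrix(call_sets: Dict[str, Set[Indel]]) -> Dict[Indel, List[int]]:
    """Map each INDEL seen by any caller to a 0/1 vector, one entry per caller."""
    callers = sorted(call_sets)
    union: Set[Indel] = set().union(*call_sets.values())
    return {
        indel: [1 if indel in call_sets[c] else 0 for c in callers]
        for indel in sorted(union)
    }


def majority_consensus(call_sets: Dict[str, Set[Indel]]) -> Set[Indel]:
    """Baseline only: keep INDELs reported by more than half of the callers."""
    n = len(call_sets)
    return {
        indel
        for indel, votes in support_matrix(call_sets).items()
        if sum(votes) * 2 > n
    }


# Hypothetical toy call sets from three callers.
calls = {
    "caller_a": {("1", 1000, "AT", "A"), ("2", 500, "G", "GC")},
    "caller_b": {("1", 1000, "AT", "A")},
    "caller_c": {("1", 1000, "AT", "A"), ("3", 42, "TTA", "T")},
}
print(majority_consensus(calls))  # {('1', 1000, 'AT', 'A')}
```

A per-INDEL support vector like this is the kind of feature a trained model could consume alongside call-level annotations; the vote is there only to show the shape of the data.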

Cited by 11 publications (9 citation statements)
References 26 publications

“…As different genes are susceptible to undergo different kinds of mutations at varied frequency, we aimed to identify mutation types that RBPs enriched for mutations (GEM-RBPs) frequently undergo when compared to RBPs that are not enriched for mutations (NonGEM-RBPs) (See Materials and Methods). In particular, we quantified the mutation frequencies of nine different classes of mutations namely Frameshift mutations – Deletion and Insertion [31], Inframe Deletion, Inframe Insertion [32], Missense [33], Nonsense [34], Nonstop [35], Silent [36] and Splice Site [37] for all the RBPs across cancer samples (Materials and Methods, Table S2). Our analysis clearly revealed that RBPs frequently and significantly undergo Frameshift deletion, Inframe deletion, Missense, Nonsense and Silent mutations (Fig.…”
Section: Results (mentioning)
confidence: 99%
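The quoted passage compares how often nine mutation classes occur in GEM-RBPs versus NonGEM-RBPs. A minimal Python sketch of that tally, assuming each mutation is already labelled with a MAF-style `Variant_Classification` value (an assumption; the quote does not name a file format), computes per-class fractions for each group and their difference. It does not reproduce the citing authors' significance test, which the quote defers to their Materials and Methods, and the toy labels in the example are invented for illustration.

```python
from collections import Counter
from typing import Dict, Iterable

# The nine classes from the quoted passage, spelled as MAF Variant_Classification
# labels (an assumed encoding; the quote does not specify one).
CLASSES = (
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "In_Frame_Del",
    "In_Frame_Ins",
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Nonstop_Mutation",
    "Silent",
    "Splice_Site",
)


def class_frequencies(labels: Iterable[str]) -> Dict[str, float]:
    """Fraction of a group's mutations in each class; other labels are ignored."""
    counts = Counter(label for label in labels if label in CLASSES)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CLASSES}


def frequency_difference(gem: Iterable[str], non_gem: Iterable[str]) -> Dict[str, float]:
    """Per-class fraction in GEM-RBPs minus the fraction in NonGEM-RBPs."""
    f_gem = class_frequencies(gem)
    f_non = class_frequencies(non_gem)
    return {c: f_gem[c] - f_non[c] for c in CLASSES}


# Invented toy labels, for illustration only.
gem_labels = ["Missense_Mutation"] * 6 + ["Frame_Shift_Del"] * 2 + ["Silent"] * 2
non_gem_labels = ["Missense_Mutation"] * 8 + ["Silent"] * 2
diff = frequency_difference(gem_labels, non_gem_labels)
print({c: round(d, 2) for c, d in diff.items() if d != 0})
# {'Frame_Shift_Del': 0.2, 'Missense_Mutation': -0.2}
```

Working with fractions rather than raw counts keeps groups of different sizes comparable; a formal test per class would still be needed to call a difference significant.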
“…Many attempts have been made to address this issue by calling indels from exome sequencing data with additional methods [24–28], and such efforts have recently been implemented in large scale exome research studies [11]. While some programs examine read depth against a reference population, Pindel [29] extracts unmapped reads from BAM files and analyzes soft clipped bases of read pairs for evidence of medium-sized structural variation.…”
Section: Results (mentioning)
confidence: 99%
“…Overall, 98.5% of all TENM4 coding base pairs were covered by >50 reads across samples. The pipeline comprised the following steps: (1) preprocessing in which MIP IDs were assigned to each raw sequencer read based on the arms sequence and then trimmomatic was used for the base quality trimming of raw reads; (2) mapping to the human reference genome (hg19) using the Burrows‐Wheeler Aligner; (3) postprocessing of the Burrows‐Wheeler Aligner using 1000 Genomes Phase I Indel calls; (4) calling using GATK (genome analysis toolkit from Broad Institute) Unified Genotyper as the variant caller; (5) filtering with VCFtools based on genotype quality; and (6) using ANNOVAR (annotation variant software) gene‐based (eg, protein‐coding change) and filter‐based annotations (eg, variants reported in public databases of healthy individuals such as the Exome Aggregation Consortium [ExAC]) were used to unravel candidate functionally relevant variants.…”
Section: Methods (mentioning)
confidence: 99%
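Step (5) of the quoted pipeline filters calls by genotype quality. A hedged Python sketch of that idea masks, on a single VCF data line, any genotype whose `GQ` falls below a threshold, which is roughly what VCFtools does with its `--minGQ` option. The threshold of 20, the treatment of a missing `GQ` as failing, and the diploid missing value `./.` are illustrative assumptions, not parameters from the quoted study.

```python
from typing import List

MISSING_DIPLOID_GT = "./."  # assumes diploid samples


def mask_low_gq(vcf_line: str, min_gq: float = 20.0) -> str:
    """Set genotypes with GQ < min_gq to missing on one VCF data line.

    Assumes a tab-separated VCF body line: 8 fixed columns, FORMAT in
    column 9, then one column per sample, with GT present in FORMAT.
    """
    fields: List[str] = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    if "GQ" not in fmt_keys:
        return "\t".join(fields)  # no genotype quality to filter on
    gq_idx = fmt_keys.index("GQ")
    gt_idx = fmt_keys.index("GT")
    out = fields[:9]
    for sample in fields[9:]:
        values = sample.split(":")
        gq = values[gq_idx] if gq_idx < len(values) else "."
        # Illustrative choice: a missing GQ is treated as failing the filter.
        if gq == "." or float(gq) < min_gq:
            values[gt_idx] = MISSING_DIPLOID_GT
        out.append(":".join(values))
    return "\t".join(out)


# Hypothetical deletion called in two samples, GQ 35 and GQ 12.
line = "1\t1000\t.\tAT\tA\t50\tPASS\t.\tGT:GQ\t0/1:35\t1/1:12"
print(mask_low_gq(line).split("\t")[9:])  # ['0/1:35', './.:12']
```

Masking individual genotypes, rather than dropping the whole site, keeps well-supported samples at the same position available for downstream annotation.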