Kai Wang scite author profile

High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.


Circular RNAs are abundant, conserved, and associated with ALU repeats

Jeck, Sorrentino, Wang et al. 2012, RNA

3,508

115

4,277


Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion with the RNA exonuclease, RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained noncolinear exons (a "backsplice") and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Though HIPK3 circular RNAs contain an AUG translation start, it and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression.


Pembrolizumab versus chemotherapy for previously untreated, PD-L1-expressing, locally advanced or metastatic non-small-cell lung cancer (KEYNOTE-042): a randomised, open-label, controlled, phase 3 trial

Mok, Wu, Kudaba et al. 2019, The Lancet

2,360

1,885


Genome-wide meta-analysis increases to 71 the number of confirmed Crohn's disease susceptibility loci

et al. 2010


We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.


PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data

Wang, Li, Hadley et al. 2007, Genome Res.

1,596

1,784


Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM) based approach, for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and the pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected ∼27 CNVs for each individual with a median size of ∼12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.


The MicroRNA Spectrum in 12 Body Fluids

et al. 2010


BACKGROUND: MicroRNAs (miRNAs) are small, noncoding RNAs that play an important role in regulating various biological processes through their interaction with cellular messenger RNAs. Extracellular miRNAs in serum, plasma, saliva, and urine have recently been shown to be associated with various pathological conditions including cancer. METHODS: With the goal of assessing the distribution of miRNAs and demonstrating the potential use of miRNAs as biomarkers, we examined the presence of miRNAs in 12 human body fluids and urine samples from women in different stages of pregnancy or patients with different urothelial cancers. Using quantitative PCR, we conducted a global survey of the miRNA distribution in these fluids. RESULTS: miRNAs were present in all fluids tested and showed distinct compositions in different fluid types. Several of the highly abundant miRNAs in these fluids were common among multiple fluid types, and some of the miRNAs were enriched in specific fluids. We also observed distinct miRNA patterns in the urine samples obtained from individuals with different physiopathological conditions. CONCLUSIONS: MicroRNAs are ubiquitous in all the body fluid types tested. Fluid type–specific miRNAs may have functional roles associated with the surrounding tissues. In addition, the changes in miRNA spectra observed in the urine samples from patients with different urothelial conditions demonstrate the potential for using concentrations of specific miRNAs in body fluids as biomarkers for detecting and monitoring various physiopathological conditions.


Functional impact of global rare copy number variation in autism spectrum disorders

Pinto, Pagnamenta, Klei et al. 2010, Nature

1,785

1,707


Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing

Frampton, Fichtenholtz, Otto et al. 2013, Nat Biotechnol

1,792

1,697


As more clinically relevant cancer genes are identified, comprehensive diagnostic approaches are needed to match patients to therapies, raising the challenge of optimization and analytical validation of assays that interrogate millions of bases of cancer genomes altered by multiple mechanisms. Here we describe a test based on massively parallel DNA sequencing to characterize base substitutions, short insertions and deletions (indels), copy number alterations and selected fusions across 287 cancer-related genes from routine formalin-fixed and paraffin-embedded (FFPE) clinical specimens. We implemented a practical validation strategy with reference samples of pooled cell lines that model key determinants of accuracy, including mutant allele frequency, indel length and amplitude of copy change. Test sensitivity achieved was 95–99% across alteration types, with high specificity (positive predictive value >99%). We confirmed accuracy using 249 FFPE cancer specimens characterized by established assays. Application of the test to 2,221 clinical cases revealed clinically actionable alterations in 76% of tumors, three times the number of actionable alterations detected by current diagnostic tests.



Kai Wang

ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data
