High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill this unmet need, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions (indels), for example by examining their functional consequences on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a ‘variants reduction’ protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal and identified 20 candidate genes, including the causal gene. On a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
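The stepwise ‘variants reduction’ idea can be pictured as a chain of filters, each of which discards variants unlikely to be causal. The following is a minimal Python sketch built on stated assumptions. It is not ANNOVAR's own code or command-line interface. The `Variant` record, the `DAMAGING_EFFECTS` set, the `known` set (which stands in for dbSNP / 1000 Genomes membership) and the two-variants-per-gene recessive rule are all hypothetical choices made for illustration. The sketch also ignores zygosity, so a gene hit by a single homozygous variant would be missed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Variant:
    """Hypothetical annotated variant record (not ANNOVAR's output format)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gene: str    # gene assigned by a prior gene-based annotation step
    effect: str  # e.g. "nonsynonymous", "synonymous", "splicing"


# Assumed set of effects treated as potentially damaging (illustrative only).
DAMAGING_EFFECTS = {"nonsynonymous", "stopgain", "stoploss", "splicing", "frameshift"}


def reduce_variants(
    variants: Iterable[Variant],
    known: Set[Tuple[str, int, str, str]],
) -> Dict[str, List[Variant]]:
    """Apply the filters in order and return candidate genes with their variants."""
    # Step 1: keep only protein-altering or splice-site variants.
    step1 = [v for v in variants if v.effect in DAMAGING_EFFECTS]
    # Step 2: drop variants already catalogued (stand-in for dbSNP / 1000 Genomes).
    step2 = [v for v in step1 if (v.chrom, v.pos, v.ref, v.alt) not in known]
    # Step 3: simple recessive model, keep genes hit by at least two remaining variants.
    by_gene: Dict[str, List[Variant]] = defaultdict(list)
    for v in step2:
        by_gene[v.gene].append(v)
    return {gene: hits for gene, hits in by_gene.items() if len(hits) >= 2}
```

Running the filters in this order keeps each step cheap. Most variants fall out at the first, purely annotation-based step, before any database lookup is needed.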
Circular RNAs composed of exonic sequence have been described in a small number of genes. Thought to result from splicing errors, circular RNA species possess no known function. To delineate the universe of endogenous circular RNAs, we performed high-throughput sequencing (RNA-seq) of libraries prepared from ribosome-depleted RNA with or without digestion with the RNA exonuclease, RNase R. We identified >25,000 distinct RNA species in human fibroblasts that contained noncolinear exons (a "backsplice") and were reproducibly enriched by exonuclease degradation of linear RNA. These RNAs were validated as circular RNA (ecircRNA), rather than linear RNA, and were more stable than associated linear mRNAs in vivo. In some cases, the abundance of circular molecules exceeded that of associated linear mRNA by >10-fold. By conservative estimate, we identified ecircRNAs from 14.4% of actively transcribed genes in human fibroblasts. Application of this method to murine testis RNA identified 69 ecircRNAs in precisely orthologous locations to human circular RNAs. Of note, paralogous kinases HIPK2 and HIPK3 produce abundant ecircRNA from their second exon in both humans and mice. Although HIPK3 circular RNAs contain an AUG translation start, they and other ecircRNAs were not bound to ribosomes. Circular RNAs could be degraded by siRNAs and, therefore, may act as competing endogenous RNAs. Bioinformatic analysis revealed shared features of circularized exons, including long bordering introns that contained complementary ALU repeats. These data show that ecircRNAs are abundant, stable, conserved and nonrandom products of RNA splicing that could be involved in control of gene expression.
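A "backsplice" is a split-read junction that joins exons out of their genomic order. This minimal Python sketch makes that test concrete and combines it with an RNase R enrichment check. It is an illustration under stated assumptions, not the authors' pipeline. The `Junction` record, the pseudocount and the `min_fold=2.0` cutoff are hypothetical. The read counts are assumed to be already normalized to library depth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical split-read junction: the read leaves the genome at `donor`
    and re-enters it at `acceptor`."""
    chrom: str
    strand: str    # "+" or "-"
    donor: int     # 5' splice site: end of the upstream segment in read order
    acceptor: int  # 3' splice site: start of the downstream segment in read order


def is_backsplice(j: Junction) -> bool:
    """True when the acceptor lies upstream of the donor in transcript order."""
    if j.strand == "+":
        return j.acceptor < j.donor
    # On the minus strand, transcript order runs toward lower coordinates.
    return j.acceptor > j.donor


def rnase_r_fold(treated: float, mock: float, pseudocount: float = 1.0) -> float:
    """Fold change of depth-normalized junction reads, RNase R versus mock."""
    return (treated + pseudocount) / (mock + pseudocount)


def is_circ_candidate(j: Junction, treated: float, mock: float,
                      min_fold: float = 2.0) -> bool:
    """Noncolinear junction that also survives exonuclease digestion of linear RNA."""
    return is_backsplice(j) and rnase_r_fold(treated, mock) >= min_fold
```

Requiring both signals mirrors the logic described above. A noncolinear junction alone could arise from other events, but RNase R degrades linear RNA, so a junction that becomes enriched after digestion is more likely to come from a circular molecule.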
We undertook a meta-analysis of six Crohn's disease genome-wide association studies (GWAS) comprising 6,333 affected individuals (cases) and 15,056 controls and followed up the top association signals in 15,694 cases, 14,026 controls and 414 parent-offspring trios. We identified 30 new susceptibility loci meeting genome-wide significance (P < 5 × 10⁻⁸). A series of in silico analyses highlighted particular genes within these loci and, together with manual curation, implicated functionally interesting candidate genes including SMAD3, ERAP2, IL10, IL2RA, TYK2, FUT2, DNMT3A, DENND1B, BACH2 and TAGAP. Combined with previously confirmed loci, these results identify 71 distinct loci with genome-wide significant evidence for association with Crohn's disease.
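The genome-wide significance threshold of P < 5 × 10⁻⁸ is conventionally motivated as a Bonferroni correction. A family-wise α of 0.05 is divided by roughly one million independent common-variant tests across the genome. This short Python sketch only illustrates that arithmetic and the filtering step. The locus names and P values in the usage check are hypothetical.

```python
from typing import Dict

ALPHA = 0.05                      # family-wise error rate
N_INDEPENDENT_TESTS = 1_000_000   # conventional approximation for common variants genome-wide

# Bonferroni correction: divide the family-wise alpha by the number of independent tests.
GENOME_WIDE_THRESHOLD = ALPHA / N_INDEPENDENT_TESTS  # about 5e-8


def significant_loci(p_by_locus: Dict[str, float]) -> Dict[str, float]:
    """Keep loci whose association P value falls below the genome-wide threshold."""
    return {locus: p for locus, p in p_by_locus.items() if p < GENOME_WIDE_THRESHOLD}
```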
Comprehensive identification and cataloging of copy number variations (CNVs) is required to provide a complete view of human genetic variation. The resolution of CNV detection in previous experimental designs has been limited to tens or hundreds of kilobases. Here we present PennCNV, a hidden Markov model (HMM)-based approach for kilobase-resolution detection of CNVs from Illumina high-density SNP genotyping data. This algorithm incorporates multiple sources of information, including total signal intensity and allelic intensity ratio at each SNP marker, the distance between neighboring SNPs, the allele frequency of SNPs, and pedigree information where available. We applied PennCNV to genotyping data generated for 112 HapMap individuals; on average, we detected ∼27 CNVs per individual, with a median size of ∼12 kb. Excluding common rearrangements in lymphoblastoid cell lines, the fraction of CNVs in offspring not detected in parents (CNV-NDPs) was 3.3%. Our results demonstrate the feasibility of whole-genome fine-mapping of CNVs via high-density SNP genotyping.
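This toy sketch shows how an HMM turns per-SNP intensities into copy-number segments. It decodes a three-state model (copy number 1, 2, 3) with the Viterbi algorithm, and the chance of switching state grows with the distance between neighboring SNPs. It is deliberately far simpler than PennCNV and is not its implementation. It uses only the log R ratio and ignores the allelic intensity ratio, population allele frequency and pedigree information. Every parameter here is an invented assumption: `LRR_MEAN`, `LRR_SD`, `SCALE_BP`, the prior and the exponential form of the distance-dependent switch probability.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy parameters (not PennCNV's trained values).
STATES = (1, 2, 3)                               # copy number: deletion, normal, duplication
LRR_MEAN = {1: -0.45, 2: 0.0, 3: 0.3}            # assumed mean log R ratio per state
LRR_SD = 0.2                                     # assumed shared standard deviation
SCALE_BP = 100_000                               # assumed distance scale for switching
LOG_PRIOR = {1: math.log(0.05), 2: math.log(0.9), 3: math.log(0.05)}


def log_emission(lrr: float, state: int) -> float:
    """Gaussian log-density of an observed log R ratio under a copy-number state."""
    z = (lrr - LRR_MEAN[state]) / LRR_SD
    return -0.5 * z * z - math.log(LRR_SD * math.sqrt(2 * math.pi))


def log_transition(prev: int, cur: int, distance_bp: int) -> float:
    """Switching state becomes more likely as the gap between neighboring SNPs grows."""
    p_switch = 1.0 - math.exp(-distance_bp / SCALE_BP)
    p_switch = min(max(p_switch, 1e-6), 0.5)     # keep probabilities away from 0 and 1
    if prev == cur:
        return math.log(1.0 - p_switch)
    return math.log(p_switch / (len(STATES) - 1))


def viterbi(positions: Sequence[int], lrr: Sequence[float]) -> List[int]:
    """Most likely copy-number path for SNPs sorted by genomic position."""
    score: List[Dict[int, float]] = [
        {s: LOG_PRIOR[s] + log_emission(lrr[0], s) for s in STATES}
    ]
    back: List[Dict[int, int]] = [{}]
    for i in range(1, len(lrr)):
        gap = positions[i] - positions[i - 1]
        score.append({})
        back.append({})
        for s in STATES:
            # Best predecessor state for landing in state s at SNP i.
            best = max(STATES, key=lambda p: score[i - 1][p] + log_transition(p, s, gap))
            score[i][s] = (score[i - 1][best] + log_transition(best, s, gap)
                           + log_emission(lrr[i], s))
            back[i][s] = best
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: score[-1][s])
    path = [state]
    for i in range(len(lrr) - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    return path[::-1]
```

The distance-dependent transition is the key design choice. Closely spaced SNPs share a copy-number state unless their intensities strongly disagree, so a short run of consistently low log R ratios can still be called as a deletion while isolated noisy markers are smoothed over.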
BACKGROUND
MicroRNAs (miRNAs) are small, noncoding RNAs that play an important role in regulating various biological processes through their interaction with cellular messenger RNAs. Extracellular miRNAs in serum, plasma, saliva, and urine have recently been shown to be associated with various pathological conditions including cancer.
METHODS
With the goal of assessing the distribution of miRNAs and demonstrating the potential use of miRNAs as biomarkers, we examined the presence of miRNAs in 12 human body fluids and urine samples from women in different stages of pregnancy or patients with different urothelial cancers. Using quantitative PCR, we conducted a global survey of the miRNA distribution in these fluids.
RESULTS
miRNAs were present in all fluids tested and showed distinct compositions in different fluid types. Several of the highly abundant miRNAs in these fluids were common among multiple fluid types, and some of the miRNAs were enriched in specific fluids. We also observed distinct miRNA patterns in the urine samples obtained from individuals with different physiopathological conditions.
CONCLUSIONS
MicroRNAs are ubiquitous in all the body fluid types tested. Fluid type–specific miRNAs may have functional roles associated with the surrounding tissues. In addition, the changes in miRNA spectra observed in the urine samples from patients with different urothelial conditions demonstrate the potential for using concentrations of specific miRNAs in body fluids as biomarkers for detecting and monitoring various physiopathological conditions.
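Quantitative PCR reports a threshold cycle (Ct) rather than an amount. A survey of miRNA distribution therefore needs some conversion from Ct values to relative abundance before fluids can be compared. This minimal Python sketch shows one common way to do that. It assumes perfect amplification efficiency, so the template doubles each cycle and starting amount scales as 2^(−Ct). It also defines a simple fold-enrichment score for a miRNA in one fluid relative to the others. Neither function is taken from the study. The function names, the miRNA labels and the Ct values in the check are hypothetical.

```python
from typing import Dict, List


def relative_abundance(ct_by_mirna: Dict[str, float]) -> Dict[str, float]:
    """Convert qPCR Ct values into fractions of the summed signal within one sample.

    Assumes 100% amplification efficiency, so starting amount is proportional to 2 ** -Ct.
    """
    raw = {name: 2.0 ** (-ct) for name, ct in ct_by_mirna.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def fold_enrichment(fraction_in_fluid: float, fractions_elsewhere: List[float]) -> float:
    """Fraction of a miRNA in one fluid divided by its mean fraction in the other fluids."""
    return fraction_in_fluid / (sum(fractions_elsewhere) / len(fractions_elsewhere))
```

A one-cycle difference in Ct corresponds to a twofold difference in starting template under this assumption. That is why a miRNA at Ct 20 outweighs one at Ct 21 by 2:1.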
As more clinically relevant cancer genes are identified, comprehensive
diagnostic approaches are needed to match patients to therapies, raising the
challenge of optimization and analytical validation of assays that interrogate
millions of bases of cancer genomes altered by multiple mechanisms. Here we
describe a test based on massively parallel DNA sequencing to characterize base
substitutions, short insertions and deletions (indels), copy number alterations
and selected fusions across 287 cancer-related genes from routine formalin-fixed
and paraffin-embedded (FFPE) clinical specimens. We implemented a practical
validation strategy with reference samples of pooled cell lines that model key
determinants of accuracy, including mutant allele frequency, indel length and
amplitude of copy change. The test achieved 95–99% sensitivity
across alteration types, with high specificity (positive predictive value
>99%). We confirmed accuracy using 249 FFPE cancer specimens
characterized by established assays. Application of the test to 2,221 clinical
cases revealed clinically actionable alterations in 76% of tumors, three
times the number of actionable alterations detected by current diagnostic
tests.
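The validation figures above rest on two standard ratios. Sensitivity is the fraction of true alterations that the assay detects, TP / (TP + FN). Positive predictive value is the fraction of reported alterations that are real, TP / (TP + FP), and this is what the abstract summarizes as high specificity. This small Python sketch computes both. The counts in the check are hypothetical, not results from the validation study. In practice each ratio would be computed separately for each alteration type (base substitutions, indels, copy number alterations, fusions).

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true alterations detected: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp: int, fp: int) -> float:
    """Fraction of reported alterations that are real: TP / (TP + FP)."""
    return tp / (tp + fp)
```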