Long non-coding RNAs (lncRNAs) are emerging as important regulators of tissue physiology and disease processes including cancer. In order to delineate genome-wide lncRNA expression, we curated 7,256 RNA-Seq libraries from tumors, normal tissues, and cell lines comprising over 43 terabases of sequence from 25 independent studies. We applied ab initio assembly methodology to this dataset, yielding a consensus human transcriptome of 91,013 expressed genes. Over 68% (58,648) of genes were classified as lncRNAs, of which 79% (48,952) were previously unannotated. About 1% (597) of the lncRNAs harbored ultraconserved elements and 7% (3,900) overlapped disease-associated single nucleotide polymorphisms (SNPs). To prioritize lineage-specific, disease-associated lncRNA expression, we employed non-parametric differential expression testing and nominated 7,942 lineage- or cancer-associated lncRNA genes. The lncRNA landscape characterized here may shed light on normal biology and cancer pathogenesis, and be valuable for future biomarker development.
Accurate transcript structure and abundance inference from RNA-Seq data is foundational for molecular discovery. Here we present TACO, a computational method to reconstruct a consensus transcriptome from multiple RNA-Seq datasets. TACO employs novel change-point detection to demarcate transcript start and end sites, leading to dramatically improved reconstruction accuracy compared to other tools in its class. The tool is available at http://tacorna.github.io and can be readily incorporated into RNA-Seq analysis workflows.
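To make the idea of change-point detection concrete, the following is a minimal sketch and not TACO's actual algorithm, which the abstract does not describe. It assumes a hypothetical per-base coverage profile and finds the single split position that best separates it into two roughly constant segments by least squares. The function name `best_change_point` and the coverage values are illustrative assumptions.

```python
from typing import List, Tuple


def best_change_point(coverage: List[float]) -> Tuple[int, float]:
    """Return (k, cost) for the split coverage[:k] / coverage[k:] that
    minimizes the total squared deviation from each segment's mean.

    Illustrative only: a generic least-squares change-point search,
    not the method implemented in TACO.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to place a change point")

    # Prefix sums of values and squared values let us evaluate any
    # segment's squared error in constant time.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in coverage:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def segment_sse(i: int, j: int) -> float:
        # Sum of squared deviations from the mean over coverage[i:j].
        total = prefix[j] - prefix[i]
        total_sq = prefix_sq[j] - prefix_sq[i]
        return total_sq - total * total / (j - i)

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = segment_sse(0, k) + segment_sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Hypothetical coverage: low background followed by an expressed region.
example_coverage = [2, 3, 2, 3, 2, 40, 42, 41, 39, 40]
split_index, split_cost = best_change_point(example_coverage)
print(split_index, round(split_cost, 2))
```

In this toy profile the split falls where coverage jumps, which is the kind of boundary a transcript start or end site would produce. A real tool would also need to handle multiple change points, noise, and strand information.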
Introduction: Over the past few years, the demographic profile of lung cancer has changed. However, most reports are limited by small numbers and short follow-up periods, and show an inconsistent pattern. A comprehensive evaluation of changing trends over a long period has not been done. Materials and Methods: Consecutive lung cancer patients were studied over a 10-year period from January 2008 to March 2018 at the All India Institute of Medical Sciences, New Delhi, and relevant clinical information and survival outcomes were analyzed. Results: A total of 1862 patients were evaluated, with a mean (SD) age of 59 (11.1) years; 82.9% were male. The majority were smokers (76.2%), with a median smoking index of 500 (interquartile range [IQR]: 300–800). Adenocarcinoma (ADC) was the most common type (34%), followed by squamous cell carcinoma (SCC, 28.6%) and small cell lung cancer (SCLC, 16.1%). Over the 10-year period, ADC increased from 9.5% to 35.9% and SCC from 25.4% to 30.6%, while non-small cell lung cancer not otherwise specified (NSCLC-NOS) decreased from 49.2% to 21.4%. The proportion of females with lung cancer increased although smoking rates remained similar. The majority of NSCLC cases (95%) continued to be diagnosed at an advanced stage (3 or 4). Epidermal growth factor receptor (EGFR) mutations and anaplastic lymphoma kinase (ALK) rearrangements were present in 25.3% and 11.5% of ADC patients, respectively. The median overall survival was 8.8 months (IQR 3.7–19) for all patients and 12.57 months (IQR 6.2–28.7) among the 1013 patients who were initiated on specific treatment (chemotherapy, targeted therapy, radiotherapy, or surgery). Never-smokers were younger, more likely to be female and educated, had a higher prevalence of ADC and EGFR/ALK mutations, and had better survival. Conclusion: In this large cohort, our center appears to follow the global trend of increasing incidence of ADC. EGFR mutation positivity was similar to existing reports, while higher ALK positivity was detected. A characteristic phenotype of never-smokers with lung cancer, associated with better survival, was identified.
Background: The human genome contains variants ranging in size from small single nucleotide polymorphisms (SNPs) to large structural variants (SVs). High-quality benchmark small variant calls for the pilot National Institute of Standards and Technology (NIST) Reference Material (NA12878) have been developed by the Genome in a Bottle Consortium, but no similar high-quality benchmark SV calls exist for this genome. Since SV callers output highly discordant results, we developed methods to combine multiple forms of evidence from multiple sequencing technologies to classify candidate SVs into likely true or false positives. Our method (svclassify) calculates annotations from one or more aligned bam files from many high-throughput sequencing technologies, and then builds a one-class model using these annotations to classify candidate SVs as likely true or false positives. Results: We first used pedigree analysis to develop a set of high-confidence breakpoint-resolved large deletions. We then used svclassify to cluster and classify these deletions as well as a set of high-confidence deletions from the 1000 Genomes Project and a set of breakpoint-resolved complex insertions from Spiral Genetics. We find that likely SVs cluster separately from likely non-SVs based on our annotations, and that the SVs cluster into different types of deletions. We then developed a supervised one-class classification method that uses a training set of random non-SV regions to determine whether candidate SVs have abnormal annotations different from most of the genome. To test this classification method, we use our pedigree-based breakpoint-resolved SVs, SVs validated by the 1000 Genomes Project, and assembly-based breakpoint-resolved insertions, along with semi-automated visualization using svviz. Conclusions: We find that candidate SVs with high scores from multiple technologies have high concordance with PCR validation and an orthogonal consensus method MetaSV (99.7% concordant), and candidate SVs with low scores are questionable. We distribute a set of 2676 high-confidence deletions and 68 high-confidence insertions with high svclassify scores from these call sets for benchmarking SV callers. We expect these methods to be particularly useful for establishing high-confidence SV calls for benchmark samples that have been characterized by multiple technologies. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2366-2) contains supplementary material, which is available to authorized users.
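The one-class idea described above, training only on random non-SV regions and flagging candidates whose annotations look abnormal, can be sketched as follows. This is a simplified illustration under stated assumptions, not svclassify's implementation: the two annotations (normalized coverage and fraction of discordant read pairs), the per-annotation z-score scoring rule, the function names, and all numeric values are hypothetical.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple

Model = List[Tuple[float, float]]


def fit_one_class(background: Sequence[Sequence[float]]) -> Model:
    """Summarize each annotation over background (non-SV) regions
    by its mean and sample standard deviation."""
    model = []
    for column in zip(*background):
        spread = stdev(column)
        if spread == 0:
            raise ValueError("annotation has no variation in the background set")
        model.append((mean(column), spread))
    return model


def abnormality_score(model: Model, annotations: Sequence[float]) -> float:
    """Largest absolute z-score across annotations; higher means the
    region looks less like the background class."""
    return max(abs(x - m) / s for x, (m, s) in zip(annotations, model))


# Made-up annotations per region: [normalized coverage, discordant-pair fraction].
random_regions = [
    [1.00, 0.02],
    [0.90, 0.01],
    [1.10, 0.03],
    [1.00, 0.02],
    [0.95, 0.02],
    [1.05, 0.02],
]
model = fit_one_class(random_regions)

candidate_deletion = [0.50, 0.40]   # half coverage, many discordant pairs
candidate_ordinary = [1.00, 0.02]   # indistinguishable from background

print(abnormality_score(model, candidate_deletion) > 3.0)
print(abnormality_score(model, candidate_ordinary) < 1.0)
```

Because only the background class is modeled, no labeled true SVs are needed for training, which matches the motivation given in the abstract. The threshold used to call a candidate abnormal is a design choice that would have to be calibrated against validated calls.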
Flexible bronchoscopy (FB) is commonly performed by respiratory physicians for diagnostic as well as therapeutic purposes. However, bronchoscopy practices vary widely across India and worldwide. The three major respiratory organizations of the country supported a national-level expert group that formulated a comprehensive guideline document for FB based on a detailed appraisal of available evidence. These guidelines are an attempt to provide the bronchoscopist with the most scientifically sound as well as practical approach to bronchoscopy. The process involved framing appropriate questions, reviewing and critically appraising the relevant literature, and reaching recommendations within the expert groups. The guidelines cover major areas in basic bronchoscopy including (but not limited to) indications for the procedure, patient preparation, various sampling procedures, bronchoscopy in the ICU setting, equipment care, and training issues. The target audience is respiratory physicians working in India as well as other parts of the world. It is hoped that this document will serve as a complete reference guide for all pulmonary physicians performing or desiring to learn the technique of flexible bronchoscopy.
This paper presents an approach for making inference on the parameters µ and σ of a Gaussian distribution in the presence of resolution errors. The approach is based on the principle of fiducial inference and requires a Monte Carlo method for computing uncertainty intervals. A small simulation study is carried out to evaluate the performance of the proposed procedure and compare it with some of the existing procedures. The results indicate that the fiducial procedure is comparable to the best of the competing procedures for inference on µ. However, unlike some of the competing procedures, the same Monte Carlo calculations also provide inference for σ and many other related quantities of interest.