Motivation: Reprogramming somatic cells into neurons holds great promise to model neuronal development and disease. The efficiency and success rate of neuronal reprogramming, however, may vary between different conversion platforms and cell types, thereby necessitating an unbiased, systematic approach to estimate neuronal identity of converted cells. Recent studies have demonstrated that long genes (>100 kb from transcription start to end) are highly enriched in neurons, which provides an opportunity to identify neurons based on the expression of these long genes. Results: We have developed a versatile R package, LONGO, to analyze gene expression based on gene length. We propose a systematic analysis of long gene expression (LGE) with a metric termed the long gene quotient (LQ) that quantifies LGE in RNA-seq or microarray data to validate neuronal identity at the single-cell and population levels. This unique feature of neurons provides an opportunity to utilize measurements of LGE in transcriptome data to quickly and easily distinguish neurons from non-neuronal cells. By combining this conceptual advancement and statistical tool in a user-friendly and interactive software package, we intend to encourage and simplify further investigation into LGE, particularly as it applies to validating and improving neuronal differentiation and reprogramming methodologies. Availability and implementation: LONGO is freely available for download at https://github.com/biohpc/longo. Supplementary information: Supplementary data are available at Bioinformatics online.
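The idea of scoring a sample by its long gene expression can be illustrated with a minimal sketch. This is not LONGO's LQ metric, whose definition is not given in the abstract; it is a simplified stand-in that reports the share of total expression coming from genes longer than 100 kb. The function name, gene names, lengths, and expression values are all hypothetical.

```python
# Illustrative only: a simplified long-gene expression share, NOT LONGO's LQ.
# The 100 kb cutoff follows the ">100 kb" definition quoted in the abstract.
LONG_GENE_THRESHOLD_BP = 100_000


def long_gene_fraction(expression, gene_lengths, threshold=LONG_GENE_THRESHOLD_BP):
    """Return the share of total expression contributed by genes longer than threshold.

    expression   -- dict mapping gene name to an expression value (e.g. TPM)
    gene_lengths -- dict mapping gene name to gene length in base pairs
    """
    total = 0.0
    long_total = 0.0
    for gene, value in expression.items():
        length = gene_lengths.get(gene)
        if length is None:
            continue  # skip genes whose length is unknown
        total += value
        if length > threshold:
            long_total += value
    if total == 0:
        return 0.0  # no usable expression data
    return long_total / total


# Hypothetical toy data, invented for this sketch.
gene_lengths = {"geneA": 250_000, "geneB": 40_000, "geneC": 120_000, "geneD": 8_000}
neuron_like = {"geneA": 30.0, "geneB": 10.0, "geneC": 50.0, "geneD": 10.0}
fibroblast_like = {"geneA": 5.0, "geneB": 45.0, "geneC": 5.0, "geneD": 45.0}

print(long_gene_fraction(neuron_like, gene_lengths))
print(long_gene_fraction(fibroblast_like, gene_lengths))
```

In this toy data the neuron-like profile draws most of its expression from the two long genes, while the fibroblast-like profile does not, which is the kind of contrast a length-based metric is meant to expose.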
Purpose: Genomic sequencing has become an increasingly powerful and relevant tool to be leveraged for the discovery of genetic aberrations underlying rare, Mendelian conditions. Although the computational tools incorporated into diagnostic workflows for this task are continually evolving and improving, we nevertheless sought to investigate commonalities across sequencing processing workflows to reveal consensus and standard practice tools and highlight exploratory analyses where technical and theoretical method improvements would be most impactful. Methods: We collected details regarding the computational approaches used by a genetic testing laboratory and 11 clinical research sites in the United States participating in the Undiagnosed Diseases Network via meetings with bioinformaticians, online survey forms, and analyses of internal protocols. Results: We found that tools for processing genomic sequencing data can be grouped into four distinct categories. Whereas well-established practices exist for initial variant calling and quality control steps, there is substantial divergence across sites in later stages for variant prioritization and multimodal data integration, demonstrating a diversity of approaches for solving the most mysterious undiagnosed cases. Conclusion: The largest differences across diagnostic workflows suggest that advances in structural variant detection, noncoding variant interpretation, and integration of additional biomedical data may be especially promising for solving chronically undiagnosed cases.
Chromodomain helicase DNA-binding protein 7 (CHD7) pathogenic variants are identified in more than 90% of infants and children with CHARGE (Coloboma of the iris, retina, and/or optic disk; congenital Heart defects, choanal Atresia, Retardation of growth and development, Genital hypoplasia, and characteristic outer and inner Ear anomalies and deafness) syndrome. Approximately 10% of cases have no known genetic cause identified. We report a male child with clinical features of CHARGE syndrome and nondiagnostic genetic testing that included chromosomal microarray, CHD7 sequencing and deletion/duplication analysis, SEMA3E sequencing, and trio exome and whole-genome sequencing (WGS). We used a comprehensive clinical assessment, genome-wide methylation analysis (GMA), reanalysis of WGS data, and CHD7 RNA studies to discover a novel variant that causes CHD7 haploinsufficiency. The 7-year-old Hispanic male proband has typical phenotypic features of CHARGE syndrome. GMA revealed a CHD7-associated epigenetic signature. Reanalysis of the WGS data with focused bioinformatic analysis of CHD7 detected a novel, de novo 15 base pair deletion in intron 4 of CHD7 (c.2239). Using proband RNA, we confirmed that this novel deletion causes CHD7 haploinsufficiency by disrupting the canonical 3′ splice site and introducing a premature stop codon. Integrated genomic, epigenomic, and transcriptome analyses discovered a novel CHD7 variant that causes CHARGE syndrome.
Background: De novo genome assembly is a technique that builds the genome of a specimen from overlaps between genomic fragments, without relying on a reference sequence. Sequence fragments (called reads) are assembled into contigs and scaffolds based on these overlaps. The quality of a de novo assembly depends on the length and continuity of the assembly. To enable faster and more accurate assembly, several sequencing techniques have been proposed, for example, high-throughput next-generation sequencing and long-read third-generation sequencing. However, these techniques require large amounts of computer memory when very large overlap graphs are resolved, and the resolution step is challenging to parallelize. Results: To address these limitations, we propose an algorithmic approach called Scalable Overlap-graph Reduction Algorithms (SORA). SORA is an algorithm package that performs string graph reduction using Apache Spark. SORA's implementations are designed to execute de novo genome assembly on either a single machine or a distributed computing platform. SORA efficiently compacts the number of edges on enormous graph paths by adapting scalable features of the graph processing libraries provided for Apache Spark, GraphX and GraphFrames. Conclusions: We shared the algorithms and the experimental results at our project website, https://github.com/BioHPC/SORA. We evaluated SORA with human genome samples. First, it processed a graph of nearly one billion edges on a distributed cloud cluster. Second, it processed mid-to-small size graphs on a single workstation within a short time frame. Overall, SORA achieved linear scaling as the number of computing instances increased.
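The edge compaction mentioned above can be illustrated with a small single-machine sketch. It is not SORA's Spark, GraphX, or GraphFrames implementation; it shows one common overlap-graph simplification, collapsing unbranched chains of nodes into single composite edges, on a plain Python edge list. The function name and the toy graph are hypothetical.

```python
# Illustrative only: unbranched-path compaction on a small directed graph.
# This is a plain-Python sketch, not SORA's distributed implementation.
from collections import defaultdict


def compact_linear_paths(edges):
    """Collapse chains of nodes with in-degree 1 and out-degree 1.

    edges -- iterable of (source, target) pairs
    Returns a list of tuples, each holding the full node path that one
    compacted edge represents. Isolated cycles made up only of chain
    nodes are not reported by this simplified version.
    """
    succ = defaultdict(list)
    pred = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)
    nodes = set(succ) | set(pred)

    def is_internal(node):
        # A chain node has exactly one incoming and one outgoing edge.
        return len(pred[node]) == 1 and len(succ[node]) == 1

    compacted = []
    for u in sorted(nodes):
        if is_internal(u):
            continue  # paths start only at branch points, sources, or sinks
        for v in succ[u]:
            path = [u, v]
            # Walk forward through the chain until a non-chain node is reached.
            while is_internal(path[-1]):
                path.append(succ[path[-1]][0])
            compacted.append(tuple(path))
    return compacted


# Hypothetical toy graph: a -> b -> c, then c branches to d and to e -> f.
toy_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("e", "f")]
for path in compact_linear_paths(toy_edges):
    print(" -> ".join(path))
```

On the toy graph, the five original edges reduce to three compacted paths. A distributed version would need to express the same chain-following step as message passing or iterative joins rather than a sequential walk.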