Completion of eukaryal genomes can be difficult task with the highly repetitive sequences along the chromosomes and short read lengths of second-generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model organism and a cell factory, was selected for this study to demonstrate the superior capability of very long sequence reads for de novo genome assembly. We generated long reads using two common third-generation sequencing technologies (Oxford Nanopore Technology (ONT) and Pacific Biosciences (PacBio)) and used short reads obtained using Illumina sequencing for error correction. Assembly of the reads derived from all three technologies resulted in complete sequences for all 16 yeast chromosomes, as well as the mitochondrial chromosome, in one step. Further, we identified three types of DNA methylation (5mC, 4mC and 6mA). Comparison between the reference strain S288C and strain CEN.PK113-7D identified chromosomal rearrangements against a background of similar gene content between the two strains. We identified full-length transcripts through ONT direct RNA sequencing technology. This allows for the identification of transcriptional landscapes, including untranslated regions (UTRs) (5′ UTR and 3′ UTR) as well as differential gene expression quantification. About 91% of the predicted transcripts could be consistently detected across biological replicates grown either on glucose or ethanol. Direct RNA sequencing identified many polyadenylated non-coding RNAs, rRNAs, telomere-RNA, long non-coding RNA and antisense RNA. This work demonstrates a strategy to obtain complete genome sequences and transcriptional landscapes that can be applied to other eukaryal organisms.
Cancer is a malignant disease that has caused millions of human deaths. Its study has a long history of well over 100years. There have been an enormous number of publications on cancer research. This integrated but unstructured biomedical text is of great value for cancer diagnostics, treatment, and prevention. The immense body and rapid growth of biomedical text on cancer has led to the appearance of a large number of text mining techniques aimed at extracting novel knowledge from scientific text. Biomedical text mining on cancer research is computationally automatic and high-throughput in nature. However, it is error-prone due to the complexity of natural language processing. In this review, we introduce the basic concepts underlying text mining and examine some frequently used algorithms, tools, and data sets, as well as assessing how much these algorithms have been utilized. We then discuss the current state-of-the-art text mining applications in cancer research and we also provide some resources for cancer text mining. With the development of systems biology, researchers tend to understand complex biomedical systems from a systems biology viewpoint. Thus, the full utilization of text mining to facilitate cancer systems biology research is fast becoming a major concern. To address this issue, we describe the general workflow of text mining in cancer systems biology and each phase of the workflow. We hope that this review can (i) provide a useful overview of the current work of this field; (ii) help researchers to choose text mining tools and datasets; and (iii) highlight how to apply text mining to assist cancer systems biology research.
Pythium insidiosum is a human-pathogenic oomycete. Many patients infected with it lose organs or die. Toward the goal of developing improved treatment options, we want to understand how Py. insidiosum has evolved to become a successful human pathogen. Our approach here involved the use of comparative genomic and other analyses to identify genes with possible functions in the pathogenicity of Py. insidiosum. We generated an Oomycete Gene Table and used it to explore the genome contents and phylogenomic relationships of Py. insidiosum and 19 other oomycetes. Initial sequence analyses showed that Py. insidiosum is closely related to Pythium species that are not pathogenic to humans. Our analyses also indicated that the organism harbours secreted and adhesin-like proteins, which are absent from related species. Putative virulence proteins were identified by comparison to a set of known virulence genes. Among them is the urease Ure1, which is absent from humans and thus a potential diagnostic and therapeutic target. We used mass spectrometric data to successfully validate the expression of 30% of 14,962 predicted proteins and identify 15 body temperature (37 °C)-dependent proteins of Py. insidiosum. This work begins to unravel the determinants of pathogenicity of Py. insidiosum.
Pythium insidiosum is an oomycete that causes a life-threatening infectious disease called pythiosis in humans and animals living in tropical and subtropical countries. Here, we report the first draft genome sequence of P. insidiosum. The genome of P. insidiosum is 53.2 Mb and contains 14,962 open reading frames.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.