Tsetse flies are the sole vectors of human African trypanosomiasis throughout sub-Saharan Africa. Both sexes of adult tsetse feed exclusively on blood and contribute to disease transmission. Notable differences between tsetse and other disease vectors include obligate microbial symbioses, viviparous reproduction, and lactation. Here, we describe the sequence and annotation of the 366-megabase Glossina morsitans morsitans genome. Analysis of the genome and the 12,308 predicted protein–encoding genes led to multiple discoveries, including chromosomal integrations of bacterial (Wolbachia) genome sequences, a family of lactation-specific proteins, reduced complement of host pathogen recognition proteins, and reduced olfaction/chemosensory associated genes. These genome data provide a foundation for research into trypanosomiasis prevention and yield important insights with broad implications for multiple aspects of tsetse biology.
Background Third-generation sequencing technologies have advanced biological research by generating reads that are substantially longer than those of second-generation sequencing technologies. However, their notoriously high error rate impedes straightforward data analysis and limits their application. A handful of error-correction methods for these error-prone long reads have been developed to date. Output data quality is critical for downstream analysis, whereas computing resources can limit the utility of some compute-intensive tools. There is a lack of standardized assessments for these long-read error-correction methods. Results Here, we present a comparative performance assessment of ten state-of-the-art error-correction methods for long reads. We established a common set of benchmarks for performance assessment, including sensitivity, accuracy, output rate, alignment rate, output read length, run time, and memory usage, as well as the effects of error correction on two downstream applications of long reads: de novo assembly and resolving haplotype sequences. Conclusions Taking into account all of these metrics, we provide a suggested guideline for method choice based on available data size, computing resources, and individual research goals. Electronic supplementary material The online version of this article (10.1186/s13059-018-1605-z) contains supplementary material, which is available to authorized users.
Background Transposable elements (TEs) are a significant component of eukaryotic genomes and play essential roles in genome evolution. Mounting evidence indicates that TEs are highly transcribed in early embryonic development and contribute to distinct biological functions and tissue morphology. Results We examine the epigenetic dynamics of mouse TEs during the development of five tissues: intestine, liver, lung, stomach, and kidney. We found that TEs are associated with over 20% of open chromatin regions during development. Close to half of these accessible TEs are activated in only a single tissue and at a specific developmental stage. Most accessible TEs are rodent-specific. Across these five tissues, 453 accessible TEs are found to create the transcription start sites of downstream genes in mouse, including 117 protein-coding genes and 144 lincRNA genes, 93.7% of which are mouse-specific. Species-specific TE-derived transcription start sites are found to drive the expression of tissue-specific genes and change their tissue-specific expression patterns during evolution. Conclusion Our results suggest that TE insertions increase the regulatory potential of the genome, and that some TEs have been domesticated to become crucial components of genes and to regulate tissue-specific expression during mouse tissue development.
Motivation In recent years, long-read (LR) sequencing technologies, such as Pacific Biosciences and Oxford Nanopore Technologies, have been demonstrated to substantially improve the quality of genome assembly and transcriptome characterization. Compared to the high cost of genome assembly by LR sequencing, it is more affordable to generate LRs for transcriptome characterization. That is, when informative transcriptome LR data are available without a high-quality genome, a method for de novo transcriptome assembly and annotation is in high demand. Results Without a reference genome, IDP-denovo performs de novo transcriptome assembly, isoform annotation and quantification by integrating the strengths of LRs and short reads. Using the GM12878 human data as a gold standard, we demonstrated that IDP-denovo had superior sensitivity of transcript assembly and high accuracy of isoform annotation. In addition, IDP-denovo outputs two abundance indices to provide a comprehensive expression profile of genes/isoforms. IDP-denovo represents a robust approach for transcriptome assembly, isoform annotation and quantification for non-model organism studies. Applying IDP-denovo to a non-model organism, Dendrobium officinale, we discovered a number of novel genes and novel isoforms that were not reported by the existing annotation library. These results reveal the high diversity of gene isoforms in D. officinale, which was not reported in the existing annotation library. Availability and implementation The dataset of Dendrobium officinale used/analyzed during the current study has been deposited in SRA, with accession code SRP094520. IDP-denovo is available for download at www.healthcare.uiowa.edu/labs/au/IDP-denovo/. Supplementary information Supplementary data are available at Bioinformatics online.
Using nuclear factor-κB (NF-κB) ChIP-Seq data, we present a framework for iterative learning of regulatory networks. For every possible pairing of a transcription factor-binding site (TFBS) with a putatively regulated gene, the relative distance and orientation are calculated to learn which TFBSs are most likely to regulate a given gene. Weighted TFBS contributions to putative gene regulation are integrated to derive an NF-κB gene network. A de novo motif enrichment analysis uncovers secondary TFBSs (AP1, SP1) at characteristic distances from NF-κB/RelA TFBSs. Comparison with experimental ENCODE ChIP-Seq data indicates that experimental TFBSs highly correlate with predicted sites. We observe that RelA-SP1-enriched promoters have expression profiles distinct from those of RelA-AP1 and are enriched in introns, CpG islands and DNase accessible sites. Sixteen novel NF-κB/RelA-regulated genes and TFBSs were experimentally validated, including TANK, a negative feedback gene whose expression is NF-κB/RelA dependent and requires a functional interaction with the AP1 TFBSs. Our probabilistic method yields more accurate NF-κB/RelA-regulated networks than a traditional, distance-based approach, confirmed by both analysis of gene expression and increased informativity of Genome Ontology annotations. Our analysis provides new insights into how co-occurring TFBSs and local chromatin context orchestrate activation of NF-κB/RelA sub-pathways differing in biological function and temporal expression patterns.
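The distance-and-orientation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the geometric part (strand-aware signed distance from a binding site to a gene's transcription start site), not the probabilistic weighting, and the names `Gene`, `relative_position`, `pair_sites_with_genes`, and the `max_distance` window are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical minimal gene record (not from the paper)."""
    name: str
    tss: int      # transcription start site coordinate
    strand: str   # "+" or "-"


def relative_position(site_pos: int, gene: Gene) -> tuple[int, str]:
    """Return (signed distance, orientation) of a site relative to a gene's TSS.

    The sign follows the direction of transcription: negative values are
    upstream of the TSS, positive values are downstream.
    """
    offset = site_pos - gene.tss
    # On the minus strand, transcription runs toward smaller coordinates,
    # so the genomic offset has to be flipped.
    signed = offset if gene.strand == "+" else -offset
    if signed < 0:
        orientation = "upstream"
    elif signed > 0:
        orientation = "downstream"
    else:
        orientation = "at_tss"
    return signed, orientation


def pair_sites_with_genes(site_positions, genes, max_distance):
    """Enumerate every site-gene pair within an assumed distance window."""
    pairs = []
    for pos in site_positions:
        for gene in genes:
            distance, orientation = relative_position(pos, gene)
            if abs(distance) <= max_distance:
                pairs.append((pos, gene.name, distance, orientation))
    return pairs
```

For example, a site at coordinate 900 is 100 bp upstream of a plus-strand gene whose TSS is at 1000, but 100 bp downstream of a minus-strand gene with the same TSS. A weighting scheme such as the one the abstract describes would then be learned over these (distance, orientation) features.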
ATAC-seq is widely used to measure chromatin accessibility and identify open chromatin regions (OCRs). OCRs usually indicate active regulatory elements in the genome and are directly associated with the gene regulatory network. The identification of differential accessibility regions (DARs) between different biological conditions is critical in determining the differential activity of regulatory elements. Differential analysis of ATAC-seq shares many similarities with differential expression analysis of RNA-seq data. However, the distribution of ATAC-seq signal intensity is different from that of RNA-seq data, and higher sensitivity is required for DAR identification. Many different tools can be used to perform differential analysis of ATAC-seq data, but a comprehensive comparison and benchmarking of these methods is still lacking. Here, we used simulated datasets to systematically measure the sensitivity and specificity of six different methods. We further discussed the statistical and signal density cutoffs in the differential analysis of ATAC-seq by applying them to real data. Batch effects are very common in high-throughput sequencing experiments. We illustrated that batch-effect correction can dramatically improve sensitivity in the differential analysis of ATAC-seq data. Finally, we developed a user-friendly package, BeCorrect, to perform batch effect correction and visualization of corrected ATAC-seq signals in a genome browser.
Gene regulation in the mammalian genome involves different types of regulatory elements, such as promoters, enhancers, and insulators. It was estimated that there are over two million regulatory elements in the human and mouse genomes 1,2, and these regulatory elements recruit different epigenetic modifications to regulate the expression of genes in cell type-specific and developmental stage-specific manners 3-5.
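As a small illustration of the benchmarking idea in the abstract above, sensitivity and specificity of a DAR caller on simulated data can be computed from three sets of region identifiers. This is a sketch under stated assumptions, not the procedure used in the study: the function name `sensitivity_specificity` and the representation of regions as string identifiers are hypothetical.

```python
import math


def sensitivity_specificity(truth, called, tested):
    """Compute (sensitivity, specificity) for one differential-analysis run.

    truth  -- set of region IDs simulated as truly differential
    called -- set of region IDs the method reported as differential
    tested -- set of all region IDs that were evaluated
    Regions outside `tested` are ignored (an assumption of this sketch).
    """
    truth = truth & tested
    called = called & tested

    tp = len(truth & called)            # true DARs that were called
    fn = len(truth - called)            # true DARs that were missed
    fp = len(called - truth)            # non-DARs that were called
    tn = len(tested - truth - called)   # non-DARs correctly left uncalled

    sensitivity = tp / (tp + fn) if (tp + fn) else math.nan
    specificity = tn / (tn + fp) if (tn + fp) else math.nan
    return sensitivity, specificity
```

With ten tested regions, four of them truly differential, and a caller that reports three of those four plus one false positive, the function returns a sensitivity of 0.75 and a specificity of 5/6.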
Active regulatory elements must remain in an accessible state to allow the binding of different transcription factors to activate or silence target genes. ATAC-seq (assay for transposase-accessible chromatin followed by sequencing) is a recently developed technique to measure genome-wide chromatin accessibility (or open chromatin) 6,7. Compared with other techniques, such as DNase-seq, MNase-seq, and FAIRE-seq, ATAC-seq experiments are relatively easy to perform across different tissues and cell types. Furthermore, ATAC-seq experiments allow ultra-low input cell numbers, even down to the single-cell level 8. These advantages propelled ATAC-seq to become the technology most widely used to define open chromatin by many large genomics consortia, including ENCODE 9, TCGA 10, PsychENCODE 11, IHEC 12, and TaRGET II 13. The peak-calling analysis used to identify open chromatin regions (OCRs) from ATAC-seq data is generally adapted from ChIP-seq data analysis. However, there are fundamental differences between ATAC-seq and ChIP-seq, most notably that ATAC-seq is performed without control or input samples. Nonetheless, peak callers, such as macs2 14, can identify OCRs by evalua...
The recent derivation of human trophoblast stem cells (hTSCs) provides a scalable in vitro model system of human placental development, but the molecular regulators of hTSC identity have not been systematically explored thus far. Here, we utilize a genome-wide CRISPR-Cas9 knockout screen to comprehensively identify essential and growth-restricting genes in hTSCs. By cross-referencing our data to those from similar genetic screens performed in other cell types, as well as gene expression data from early human embryos, we define hTSC-specific and -enriched regulators. These include both well-established and previously uncharacterized trophoblast regulators, such as ARID3A, GATA2, and TEAD1 (essential), and GCM1, PTPN14, and TET2 (growth-restricting). Integrated analysis of chromatin accessibility, gene expression, and genome-wide location data reveals that the transcription factor TEAD1 regulates the expression of many trophoblast regulators in hTSCs. In the absence of TEAD1, hTSCs fail to complete faithful differentiation into extravillous trophoblast (EVT) cells and instead show a bias towards syncytiotrophoblast (STB) differentiation, thus indicating that this transcription factor safeguards the bipotent lineage potential of hTSCs. Overall, our study provides a valuable resource for dissecting the molecular regulation of human placental development and diseases.
Dendrobium officinale is an extremely valuable orchid used in traditional Chinese medicine, so sought after that it has a higher market value than gold. Although the expression profiles of some genes involved in polysaccharide synthesis have previously been investigated, little research has been carried out on their alternatively spliced isoforms in D. officinale. In addition, information regarding the translocation of sugars from leaves to stems in D. officinale also remains limited. We analyzed the polysaccharide content of D. officinale leaves and stems, and completed in-depth transcriptome sequencing of these two diverse tissue types using second-generation sequencing (SGS) and single-molecule real-time (SMRT) sequencing technology. The results of this study yielded a digital inventory of gene and mRNA isoform expression. A comparative analysis of both transcriptomes uncovered a total of 1414 differentially expressed genes, including 844 that were up-regulated and 570 that were down-regulated in stems. Of these genes, one Sugars Will Eventually be Exported Transporter (SWEET) and one sucrose transporter (SUT) are expressed to a greater extent in D. officinale stems than in leaves. Two glycosyltransferase (GT) and four cellulose synthase (Ces) genes undergo a distinct degree of alternative splicing. The polysaccharide content of the stems is twice that of the leaves. The differentially expressed GT and transcription factor (TF) genes will be the focus of further study. The genes DoSWEET4 and DoSUT1 are significantly expressed in the stem, and are likely to be involved in sugar loading in the phloem.