Summary: With the rapid development of DNA sequencing technology, the growing volume of bacterial genome data enables biologists to mine the evolutionary and genetic information of prokaryotic species from a pan-genome perspective. Efficient pipelines for pan-genome analysis are therefore much needed. We have developed a new pan-genome analysis pipeline (PGAP), which can perform five analytic functions with a single command: cluster analysis of functional genes, pan-genome profile analysis, genetic variation analysis of functional genes, species evolution analysis and function enrichment analysis of gene clusters. PGAP's performance has been evaluated on 11 Streptococcus pyogenes strains. Availability: PGAP is developed in Perl on the Linux platform and the package is freely available from http://pgap.sf.net. Contact: junyu@big.ac.cn; xiaojingfa@big.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
Summary: Pan-genome analyses have shed light on the dynamics and evolution of bacterial genomes at the population level. The explosive growth of bacterial genome sequences has also made pan-genome profile analysis extremely challenging. We developed a tool, named PanGP, to perform pan-genome profile analysis efficiently for large numbers of strains. PanGP integrates two sampling algorithms, totally random (TR) and distance guide (DG). The DG algorithm draws sample strain combinations on the basis of the genome diversity of the bacterial population. The performance of the two algorithms has been evaluated on four bacterial populations with strain numbers varying from 30 to 200, and the DG algorithm exhibited an overwhelming advantage over the TR algorithm in accuracy and stability. Availability: PanGP was developed with a user-friendly graphical interface and is available at http://PanGP.big.ac.cn. Contact: xiaojingfa@big.ac.cn or wujy@big.ac.cn. Supplementary information: Supplementary data are available at Bioinformatics online.
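As a rough illustration of what a totally random sampling pass over strain combinations might look like, the following Python sketch estimates pan-genome and core-genome sizes for each sample size. It is not PanGP's implementation: the function name `pan_genome_profile`, the `samples_per_size` parameter, and the input layout (a dict mapping each strain to its set of gene clusters) are all assumptions made for this example, and the DG algorithm is not sketched because the abstract does not describe how it measures genome diversity.

```python
import random
from statistics import mean


def pan_genome_profile(strain_genes, samples_per_size=100, seed=0):
    """Estimate mean pan- and core-genome sizes by random sampling.

    strain_genes: dict mapping strain name -> set of gene cluster IDs
    (a hypothetical input layout, not PanGP's file format).
    Returns {n: (mean pan-genome size, mean core-genome size)}.
    """
    rng = random.Random(seed)
    strains = sorted(strain_genes)
    profile = {}
    for n in range(1, len(strains) + 1):
        pan_sizes, core_sizes = [], []
        for _ in range(samples_per_size):
            # Draw n distinct strains uniformly at random; the same
            # combination may be drawn more than once across iterations.
            combo = rng.sample(strains, n)
            gene_sets = [strain_genes[s] for s in combo]
            # Pan-genome: clusters present in at least one sampled strain.
            pan_sizes.append(len(set.union(*gene_sets)))
            # Core genome: clusters present in every sampled strain.
            core_sizes.append(len(set.intersection(*gene_sets)))
        profile[n] = (mean(pan_sizes), mean(core_sizes))
    return profile


# Toy data: three strains sharing only cluster g1.
toy = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g5"},
}
profile = pan_genome_profile(toy, samples_per_size=50)
```

With all three toy strains sampled, the pan-genome holds five clusters and the core genome holds one. For real populations of 30 to 200 strains the number of possible combinations is far too large to enumerate, which is why sampling is used at all.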
Panax ginseng C.A. Meyer (P. ginseng) is an important medicinal plant and is often used in traditional Chinese medicine. Using next-generation sequencing (NGS) technology, we determined the complete chloroplast genome sequences of four Chinese P. ginseng strains: Damaya (DMY), Ermaya (EMY), Gaolishen (GLS), and Yeshanshen (YSS). The total chloroplast genome sequence length for DMY, EMY, and GLS was 156,354 bp, while that for YSS was 156,355 bp. Comparative genomic analysis of the chloroplast genome sequences indicates that gene content, GC content, and gene order in DMY are quite similar to those of related species, and that the nucleotide sequence diversity of the inverted repeat region (IR) is lower than that of its counterparts, the large single copy region (LSC) and the small single copy region (SSC). A comparison among the four P. ginseng strains revealed that the chloroplast genome sequences of DMY, EMY, and GLS were identical and that YSS had a 1-bp insertion at base 5472. To further study heterogeneity in the chloroplast genome during domestication, high-resolution reads were mapped to the genome sequences to investigate differences at the minor allele level; 208 minor allele sites with minor allele frequencies (MAF) of ≥0.05 were identified. The numbers of polymorphic sites per kb of chloroplast genome sequence for DMY, EMY, GLS, and YSS were 0.74, 0.59, 0.97, and 1.23, respectively. All the minor allele sites were located in the LSC and IR regions, and the four strains showed the same variation types (base substitution or indel) at all identified polymorphic sites. Comparison of heterogeneity in the chloroplast genome sequences showed that the minor allele sites on the chloroplast genome were undergoing purifying selection to adapt to a changing environment during the domestication process. A study of the P. ginseng chloroplast genome with particular focus on minor allele sites would aid in investigating the dynamics of the chloroplast genome and in typing different P. ginseng strains.
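The minor allele filter and the per-kb density reported above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the helper names, the per-position allele-count layout (`pileup`), and the choice to define MAF as the depth fraction of the second most frequent allele are all hypothetical.

```python
def minor_allele_frequency(counts):
    """Fraction of reads supporting the second most frequent allele.

    counts: dict mapping allele (base or "-" for a gap) -> read count.
    This MAF definition is an assumption made for the sketch.
    """
    depth = sum(counts.values())
    if depth == 0:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / depth


def minor_allele_sites(pileup, threshold=0.05):
    """Return positions whose MAF is at or above the threshold."""
    return [
        pos
        for pos, counts in sorted(pileup.items())
        if minor_allele_frequency(counts) >= threshold
    ]


def sites_per_kb(n_sites, genome_length_bp):
    """Polymorphic site density per 1,000 bp of sequence."""
    return n_sites / (genome_length_bp / 1000)


# Toy per-position allele counts (hypothetical, not data from the study).
toy_pileup = {
    100: {"A": 95, "G": 5},   # MAF 0.05, kept at the >= 0.05 threshold
    200: {"C": 99, "T": 1},   # MAF 0.01, dropped
    300: {"T": 60, "-": 40},  # gap allele, MAF 0.40, kept
}
sites = minor_allele_sites(toy_pileup)
density = sites_per_kb(len(sites), 2000)
```

On the toy input, positions 100 and 300 pass the filter, giving a density of 1.0 site per kb over a 2,000-bp sequence.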
Although long noncoding RNAs (lncRNAs) lack protein-coding capacity, they are involved in the pathogenesis of many types of cancer, including hepatocellular carcinoma, cervical cancer, and gastric cancer. Notably, lncRNAs play vital roles in nearly every aspect of tumor biology. Long non‐coding small nucleolar RNA host genes (lnc‐SNHGs) are abnormally expressed in multiple cancers, including urologic neoplasms, respiratory tumors, and digestive cancers, and play vital roles in these cancers. These host genes can participate in tumorigenesis by regulating the proliferation, migration, invasion and apoptosis of tumor cells. This review provides an overview of the roles that lnc‐SNHGs play in the formation and progression of digestive cancers.
Mycobacterium abscessus (Ma) is an emerging human pathogen that causes both soft tissue infections and systemic disease. We present the first comparative whole-genome study of Ma strains isolated from patients of wide geographical origin. We found a high proportion of accessory strain-specific genes indicating an open, non-conservative pan-genome structure, and clear evidence of rapid phage-mediated evolution. Although we found fewer virulence factors in Ma compared to M. tuberculosis, our data indicated that Ma evolves rapidly and therefore should be monitored closely for the acquisition of more pathogenic traits. This comparative study provides a better understanding of Ma and forms the basis for future functional work on this important pathogen.
A transcription factor can function differently and/or identically in multiple cell types. However, the mechanism for cell-specific regulation of a transcription factor remains to be elucidated. We address how a single transcription factor, forkhead box protein A1 (FOXA1), forms cell-specific genomic signatures and differentially regulates gene expression in four human cancer cell lines (HepG2, LNCaP, MCF7, and T47D). FOXA1 is a pioneer transcription factor in organogenesis and cancer progression. Genome-wide mapping of FOXA1 by chromatin immunoprecipitation sequencing shows that target genes associated with FOXA1 binding are mostly common to these cancer cells. However, most of the functional FOXA1 target genes are specific to each cancer cell type. Further investigations using CRISPR-Cas9 genome editing technology indicate that cell-specific FOXA1 regulation is attributable to unique FOXA1 binding, genetic variations, and/or potential epigenetic regulation. Thus, FOXA1 controls the specificity of cancer cell types. Based on these observations, we propose a "flower-blooming" hypothesis for cell-specific transcriptional regulation.
MethBank (http://bigd.big.ac.cn/methbank) is a database that integrates high-quality DNA methylomes across a variety of species and provides an interactive browser for visualization of methylation data. Here, we present an updated implementation of MethBank (version 3.0) that incorporates more DNA methylomes from multiple species and is equipped with enhanced functionalities for data annotation and friendlier web interfaces for data presentation, search and visualization. MethBank 3.0 features large-scale integration of high-quality methylomes, involving 34 consensus reference methylomes derived from a large number of human samples, 336 single-base resolution methylomes from different developmental stages and/or tissues of five plants, and 18 single-base resolution methylomes from gametes and early embryos at multiple stages of two animals. Additionally, its improved data annotation functionalities enable systematic identification of methylation sites closely associated with age, sites with constant methylation levels across different ages, differentially methylated promoters, age-specific differentially methylated cytosines/regions, and methylated CpG islands. Moreover, MethBank provides online tools to estimate human methylation age and to identify differentially methylated promoters. Taken together, MethBank has been upgraded with significant improvements and advances over the previous version, and is of great help for deciphering DNA methylation regulatory mechanisms in epigenetic studies.