Summary: The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. Availability: Biopython is freely available, with documentation and source code at www.biopython.org under the Biopython license. Contact: All queries should be directed to the Biopython mailing lists, see www.biopython.org/wiki/_Mailing_lists; peter.cock@scri.ac.uk.
Conservation of gene order in vertebrates is evident after hundreds of millions of years of divergence, but comparisons of the Arabidopsis thaliana sequence to partial gene orders of other angiosperms (flowering plants) sharing common ancestry approximately 170-235 million years ago yield conflicting results. This difference may be largely due to the propensity of angiosperms to undergo chromosomal duplication ('polyploidization') and subsequent gene loss ('diploidization'); these evolutionary mechanisms have profound consequences for comparative biology. Here we integrate a phylogenetic approach (relating chromosomal duplications to the tree of life) with a genomic approach (mitigating information lost to diploidization) to show that a genome-wide duplication post-dates the divergence of Arabidopsis from most dicots. We also show that an inferred ancestral gene order for Arabidopsis reveals more synteny with other dicots (exemplified by cotton), and that additional, more ancient duplication events affect more distant taxonomic comparisons. By using partial sequence data for many diverse taxa to better relate the evolutionary history of completely sequenced genomes to the tree of life, we foster comparative approaches to the study of genome organization, consequences of polyploidy, and the molecular basis of quantitative traits.
Integration of structural genomic data from a largely assembled rice genome sequence, with phylogenetic analysis of sequence samples for many other taxa, suggests that a polyploidization event occurred ≈70 million years ago, before the divergence of the major cereals from one another but after the divergence of the Poales from the Liliales and Zingiberales. Ancient polyploidization and subsequent "diploidization" (loss) of many duplicated gene copies has thus shaped the genomes of all Poaceae cereal, forage, and biomass crops. The Poaceae appear to have evolved as separate lineages for ≈50 million years, or two-thirds of the time since the duplication event. Chromosomes that are predicted to be homoeologs resulting from this ancient duplication event account for a disproportionate share of incongruent loci found by comparison of the rice sequence to a detailed sorghum sequence-tagged site-based genetic map. Differential gene loss during diploidization may have contributed many of these incongruities. Such predicted homoeologs also account for a disproportionate share of duplicated sorghum loci, further supporting the hypothesis that the polyploidization event was common to sorghum and rice. Comparative gene orders along paleo-homoeologous chromosomal segments provide a means to make phylogenetic inferences about chromosome structural rearrangements that differentiate among the grasses. Superimposition of the timing of major duplication events on taxonomic relationships leads to improved understanding of comparative gene orders, enhancing the value of data from botanical models for crop improvement and for further exploration of genomic biodiversity. Additional ancient duplication events probably remain to be discovered in other angiosperm lineages.
colinearity | chromosome structural rearrangement | gene order | genome duplication | rice
It is currently thought that life-long blood cell production is driven by the action of a small number of multipotent haematopoietic stem cells. Evidence supporting this view has been largely acquired through the use of functional assays involving transplantation. However, whether these mechanisms also govern native non-transplant haematopoiesis is entirely unclear. Here we have established a novel experimental model in mice where cells can be uniquely and genetically labelled in situ to address this question. Using this approach, we have performed longitudinal analyses of clonal dynamics in adult mice that reveal unprecedented features of native haematopoiesis. In contrast to what occurs following transplantation, steady-state blood production is maintained by the successive recruitment of thousands of clones, each with a minimal contribution to mature progeny. Our results demonstrate that a large number of long-lived progenitors, rather than classically defined haematopoietic stem cells, are the main drivers of steady-state haematopoiesis during most of adulthood. Our results also have implications for understanding the cellular origin of haematopoietic disease.
Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.
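As a minimal illustration of the "allele frequency" quantity mentioned above, the sketch below computes the fraction of reads at a site that support a variant. It is not VarDict's implementation, which the abstract does not describe; the function name and the read counts are hypothetical and chosen only to show why deep coverage matters for low-frequency variants.

```python
def variant_allele_frequency(alt_reads: int, total_depth: int) -> float:
    """Return the fraction of reads at a site that support the variant allele.

    Illustrative helper only (hypothetical name, not part of VarDict).
    """
    if total_depth <= 0:
        raise ValueError("total_depth must be positive")
    if not 0 <= alt_reads <= total_depth:
        raise ValueError("alt_reads must lie between 0 and total_depth")
    return alt_reads / total_depth


# Hypothetical counts: 12 variant-supporting reads out of 4000 at one site.
vaf = variant_allele_frequency(12, 4000)
print(f"{vaf:.4f}")
```

At a depth of 4000 reads, a variant present in well under 1% of reads is still supported by several reads, which is the kind of signal ultra-deep sequencing is meant to expose.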
We present Bioconda (https://bioconda.github.io), a distribution of bioinformatics software for the lightweight, multiplatform and language-agnostic package manager Conda. Currently, Bioconda offers a collection of over 3000 software packages, which is continuously maintained, updated, and extended by a growing global community of more than 200 contributors. Bioconda improves analysis reproducibility by allowing users to define isolated environments with defined software versions, all of which are easily installed and managed without administrative privileges.
Long noncoding RNAs (lncRNAs) have important regulatory roles and can function at the level of chromatin. To determine where lncRNAs bind to chromatin, we developed capture hybridization analysis of RNA targets (CHART), a hybridization-based technique that specifically enriches endogenous RNAs along with their targets from reversibly cross-linked chromatin extracts. CHART was used to enrich the DNA and protein targets of endogenous lncRNAs from flies and humans. This analysis was extended to genomewide mapping of roX2, a well-studied ncRNA involved in dosage compensation in Drosophila. CHART revealed that roX2 binds at specific genomic sites that coincide with the binding sites of proteins from the male-specific lethal complex that affects dosage compensation. These results reveal the genomic targets of roX2 and demonstrate how CHART can be used to study RNAs in a manner analogous to chromatin immunoprecipitation for proteins.
chromatin-associated RNAs | chromatin-modifying complexes | RNase H mapping
Generating cellular diversity from genetic information requires the regulatory interplay between cis-acting elements encoded at specific loci in chromatin and trans-acting factors that bind them (1). Although the importance of trans-acting proteins (e.g., transcription factors) has long been appreciated, there is growing interest in the role of long noncoding RNAs (lncRNAs) (2) as factors that can regulate specific chromatin loci. This interest is enhanced by the recent discovery that the majority of eukaryotic genomes are transcribed (3) and that many of the resulting transcripts are developmentally regulated (4) but do not encode proteins. Although the functional scope of these RNAs remains unknown (5-7), several lncRNAs play important regulatory roles at the level of chromatin (8).
Determining where these ncRNAs bind on the genome is central to determining their function.
Examples of lncRNAs that influence chromatin include the roX ncRNAs in flies and Xist in mammals, both having well-established roles in dosage compensation (8, 9); Kcnq1ot1 and Air ncRNAs, which are expressed from genomically imprinted loci and affect chromatin silencing (10-13); Evf2, HSR1, and other ncRNAs that positively regulate transcription (14-16); lncRNAs that target the dihydrofolate reductase promoter and the rDNA promoters through triplex formation (17, 18); and the human HOTAIR and HOTTIP lncRNAs, which regulate polycomb-repressed and trithorax-activated chromatin, respectively (19, 20). Dysregulation of several of these lncRNAs has been associated with disease (21, 22). Our understanding of the biochemical roles of these RNAs comes largely from their interactions with specific proteins, insights gained from classical biochemical techniques developed for studying translation and RNA-processing complexes and also more recent technological advances using RNA immunoprecipitation (23) and cross-linking and immunoprecipitation (24-26). These experiments suggest that several lncRNAs specifically interact with chromatin-modifying machine...