As next-generation sequencing projects generate massive genome-wide sequence variation data, bioinformatics tools are being developed to provide computational predictions on the functional effects of sequence variations and narrow the search for causal variants for disease phenotypes. Different classes of sequence variations at the nucleotide level are involved in human diseases, including substitutions, insertions, deletions, frameshifts, and non-sense mutations. Frameshifts and non-sense mutations are likely to cause a negative effect on protein function. Existing prediction tools primarily focus on studying the deleterious effects of single amino acid substitutions through examining amino acid conservation at the position of interest among related sequences, an approach that is not directly applicable to insertions or deletions. Here, we introduce a versatile alignment-based score as a new metric to predict the damaging effects of variations, covering not only single amino acid substitutions but also in-frame insertions, deletions, and multiple amino acid substitutions. This alignment-based score measures the change in sequence similarity of a query sequence to a protein sequence homolog before and after the introduction of an amino acid variation to the query sequence. Our results showed that the scoring scheme performs well in separating disease-associated variants (n = 21,662) from common polymorphisms (n = 37,022) for UniProt human protein variations, and also in separating deleterious variants (n = 15,179) from neutral variants (n = 17,891) for UniProt non-human protein variations. In our approach, the area under the receiver operating characteristic curve (AUC) for the human and non-human protein variation datasets is ∼0.85. We also observed that the alignment-based score correlates with the deleteriousness of a sequence variation.
In summary, we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), which provides a generalized approach to predict the functional effects of protein sequence variations including single or multiple amino acid substitutions, and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
Supplementary data are available at Bioinformatics online.
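The alignment-based score described above (the change in similarity to a homolog before and after a variation is introduced) can be illustrated with a minimal sketch. The scoring values (match 2, mismatch -1, linear gap -2), the use of plain global alignment, and the example sequences are all assumptions made for illustration; they are not the substitution matrix, gap model, or homolog handling used by PROVEAN itself.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score with a toy scoring scheme (an assumption)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                          prev[j] + gap,       # a[i-1] against a gap
                          curr[j - 1] + gap)   # b[j-1] against a gap
        prev = curr
    return prev[-1]


def delta_score(query, variant, homolog):
    """Alignment score of the variant minus that of the unmodified query."""
    return (global_alignment_score(variant, homolog)
            - global_alignment_score(query, homolog))


# Hypothetical sequences, chosen only to make the arithmetic easy to follow.
query = "MKTAYIAKQR"
homolog = "MKTAYIAKQR"
substituted = "MKTAYWAKQR"  # single substitution I -> W
deleted = "MKTAIAKQR"       # in-frame deletion of Y

print(delta_score(query, substituted, homolog))  # -3
print(delta_score(query, deleted, homolog))      # -4
```

In this toy setting a more negative delta means the variant has moved further from the homolog. The same function handles substitutions, insertions, and deletions because it only compares two alignment scores.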
Carbon dioxide uptake and water vapour release in plants occur through stomata, which are formed by guard cells. These cells respond to light intensity, CO2 and water availability, and plant hormones. The predicted increase in the atmospheric concentration of CO2 is expected to have a profound effect on our ecosystem. However, many aspects of CO2-dependent stomatal movements are still not understood. Here we show that the ABC transporter AtABCB14 modulates stomatal closure on transition to elevated CO2. Stomatal closure induced by high CO2 levels was accelerated in plants lacking AtABCB14. Apoplastic malate has been suggested to be one of the factors mediating the stomatal response to CO2 (Refs 4,5) and indeed, exogenously applied malate induced an AtABCB14-dependent response similar to that induced by high CO2 levels. In isolated epidermal strips that contained only guard cells, malate-dependent stomatal closure was faster in plants lacking AtABCB14, and slower in AtABCB14-overexpressing plants, than in wild-type plants, indicating that AtABCB14 catalyses the transport of malate from the apoplast into guard cells. Indeed, when AtABCB14 was heterologously expressed in Escherichia coli and HeLa cells, increases in malate transport activity were observed. We therefore suggest that AtABCB14 modulates stomatal movement by transporting malate from the apoplast into guard cells, thereby increasing their osmotic pressure.
University of Science and Technology, Pohang, Korea
[10][11][12] and had a strongly reduced sensitivity to glibenclamide, ABA, calcium and auxin, which are well known to control stomatal movement. We were therefore interested in whether AtABCB14 also exhibits a regulatory function in guard cell physiology. AtABCB14 expression, as visualized by the activity of an AtABCB14 promoter::GUS fusion construct, is not restricted to guard cells of leaves, but is also found in guard cells of stems, flowers and siliques (Fig. 1a-f).
In leaves, GUS activity was also detected in epidermal cells and, at very low levels, in mesophyll cells (Fig. 1c). These promoter::GUS expression patterns corresponded to the transcript levels detected in mesophyll and guard cell protoplasts (Fig. 1g). Transient expression of a 35S::AtABCB14:GFP construct in Arabidopsis protoplasts revealed that AtABCB14 is targeted to the plasma membrane (Fig. 1h, i). AtABCB14:sGFP expressed under the control of the AtABCB14 native promoter was targeted to the plasma membrane of guard cells (Fig. 1n). Coexpression of AtABCB14 with AtAHA2:RFP, a fusion of a plasma membrane-localized proton pump with a red fluorescent protein [13], resulted in a perfect co-localization (Fig. 1j-l). Fractionation of microsomes on a sucrose density gradient further confirmed that AtABCB14:HA protein was targeted to the plasma membrane: the distribution pattern of the protein cross-reacting with the HA antibody corresponded to that of AtPDR8, a plasma membrane protein [14], and differed from the patterns of the ER (Bip) and vacuolar (γ-TIP) markers (Fig. 1m). These results indica...
Information theory traditionally deals with "conventional data," be it textual data, image, or video data. However, databases of various sorts have come into existence in recent years for storing "unconventional data" including biological data, social data, web data, topographical maps, and medical data. In compressing such data, one must consider two types of information: the information conveyed by the structure itself, and the information conveyed by the data labels implanted in the structure. In this paper, we attempt to address the former problem by studying information of graphical structures (i.e., unlabeled graphs). As the first step, we consider the Erdős-Rényi graphs G(n, p) over n vertices in which edges are added randomly with probability p. We prove that the structural entropy of G(n, p) is $\binom{n}{2} h(p) - \log n! + o(1) = \binom{n}{2} h(p) - n \log n + O(n)$, where $h(p) = -p \log p - (1 - p) \log(1 - p)$ is the entropy rate of a conventional memoryless binary source. Then, we propose a two-stage compression algorithm that asymptotically achieves the structural entropy up to the first two leading terms. Our algorithm runs in O(n+e) time on average where e is the number of edges. To the best of our knowledge, this is the first provable (asymptotically) optimal graph compressor. We use combinatorial and analytic techniques such as generating functions, Mellin transform, and poissonization to establish these findings. Our experiments confirm theoretical results and show the usefulness of our algorithm for real-world graphs such as the Internet, biological networks, and social networks.
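To make the notion of structural entropy concrete, the sketch below computes it by brute force for a very small G(n, p): every labeled graph is enumerated, graphs are grouped into isomorphism classes (unlabeled structures), and the entropy of the resulting distribution is taken. It also evaluates the leading terms, taken here in the commonly cited form C(n, 2) h(p) - log n!, which is stated as an assumption of this sketch. Logarithms are base 2 (bits), also an assumption, and all function names are hypothetical. The brute-force search is exponential in n and is only meant for n up to about 5; it is not the compression algorithm from the paper.

```python
from itertools import combinations, permutations
from math import comb, factorial, log2


def binary_entropy(p):
    """h(p) in bits for 0 < p < 1."""
    return -p * log2(p) - (1 - p) * log2(1 - p)


def canonical_form(edges, n):
    """Smallest relabelled edge tuple over all n! vertex permutations."""
    best = None
    for perm in permutations(range(n)):
        relabelled = tuple(sorted(tuple(sorted((perm[u], perm[v])))
                                  for u, v in edges))
        if best is None or relabelled < best:
            best = relabelled
    return best


def structure_distribution(n, p):
    """Probability of each unlabeled structure under G(n, p)."""
    pairs = list(combinations(range(n), 2))
    dist = {}
    for mask in range(2 ** len(pairs)):
        edges = [pairs[k] for k in range(len(pairs)) if (mask >> k) & 1]
        prob = p ** len(edges) * (1 - p) ** (len(pairs) - len(edges))
        key = canonical_form(edges, n)
        dist[key] = dist.get(key, 0.0) + prob
    return dist


def structural_entropy(n, p):
    """Entropy (bits) of the unlabeled-structure distribution."""
    return -sum(q * log2(q) for q in structure_distribution(n, p).values())


def leading_terms(n, p):
    """C(n, 2) h(p) - log2 n!, the assumed leading terms."""
    return comb(n, 2) * binary_entropy(p) - log2(factorial(n))


n, p = 4, 0.5
print(structural_entropy(n, p), comb(n, 2) * binary_entropy(p), leading_terms(n, p))
```

For such a small n the exact value is not expected to sit close to the leading terms, because small graphs have many symmetries. What does hold for any n is that the structural entropy is at most the labeled entropy C(n, 2) h(p), and that the gap between the two is at most log n!.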
Humans are a diploid species that inherit one set of chromosomes paternally and one homologous set of chromosomes maternally. Unfortunately, most human sequencing initiatives ignore this fact in that they do not directly delineate the nucleotide content of the maternal and paternal copies of the 23 chromosomes individuals possess (i.e., they do not ‘phase’ the genome), often because of the costs and complexities of doing so. We compared 11 different widely used approaches to phasing human genomes using the publicly available ‘Genome-In-A-Bottle’ (GIAB) phased version of the NA12878 genome as a gold standard. The phasing strategies we compared included laboratory-based assays that prepare DNA in unique ways to facilitate phasing as well as purely computational approaches that seek to reconstruct phase information from general sequencing reads and constructs or population-level haplotype frequency information obtained through a reference panel of haplotypes. To assess the performance of the 11 approaches, we used metrics that included, among others, switch error rates, haplotype block lengths, the proportion of fully phase-resolved genes, phasing accuracy and yield between pairs of SNVs. Our comparisons suggest that a hybrid or combined approach that leverages (1) population-based phasing using the SHAPEIT software suite, (2) either genome-wide sequencing read data or parental genotypes, and (3) a large reference panel of variant and haplotype frequencies provides a fast and efficient way to produce highly accurate phase-resolved individual human genomes. We found that for population-based approaches, phasing performance is enhanced with the addition of genome-wide read data, e.g., whole genome shotgun and/or RNA sequencing reads. Further, we found that the inclusion of parental genotype data within a population-based phasing strategy can provide as much as a ten-fold reduction in phasing errors.
We also considered a majority voting scheme for the construction of a consensus haplotype combining multiple predictions for enhanced performance and site coverage. Finally, we also identified DNA sequence signatures associated with the genomic regions harboring phasing switch errors, which included regions of low polymorphism or SNV density.
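One of the metrics named above, the switch error rate, can be sketched as follows. The representation is an assumption made for illustration: each haplotype is a string of 0/1 alleles at the heterozygous sites of one phased block, with the second haplotype implied as its complement. A switch error is counted wherever the predicted phase flips relative to the truth between two consecutive sites. The function names and example strings are hypothetical, and the evaluation pipeline used in the study may define and normalize the metric differently.

```python
def switch_errors(truth, predicted):
    """Count phase flips between consecutive heterozygous sites."""
    if len(truth) != len(predicted):
        raise ValueError("haplotypes must cover the same sites")
    # True where the predicted allele matches the truth haplotype,
    # False where it matches the complementary haplotype.
    agree = [t == p for t, p in zip(truth, predicted)]
    return sum(1 for a, b in zip(agree, agree[1:]) if a != b)


def switch_error_rate(truth, predicted):
    """Switch errors divided by the number of adjacent site pairs."""
    opportunities = len(truth) - 1
    if opportunities < 1:
        return 0.0
    return switch_errors(truth, predicted) / opportunities


truth = "0101100110"
predicted = "0101011001"  # phase flips once, after the fourth site

print(switch_errors(truth, predicted))      # 1
print(switch_error_rate(truth, predicted))  # 1 / 9
```

Because only changes in agreement are counted, reporting the complementary haplotype for the whole block gives the same result, which matches the fact that the labels "haplotype 1" and "haplotype 2" are arbitrary.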
Background: Infections by pan-drug resistant Acinetobacter baumannii plague military and civilian healthcare systems. Previous A. baumannii pan-genomic studies used modest sample sizes of low diversity and comparisons to a single reference genome, limiting our understanding of gene order and content. A consensus representation of multiple genomes will provide a better framework for comparison. A large-scale comparative study will identify genomic determinants associated with their diversity and adaptation as a successful pathogen.
Results: We determine draft-level genomic sequence of 50 diverse military isolates and conduct the largest bacterial pan-genome analysis of 249 genomes. The pan-genome of A. baumannii is open when the input genomes are normalized for diversity with 1867 core proteins and a paralog-collapsed pan-genome size of 11,694 proteins. We developed a novel graph-based algorithm and use it to assemble the first consensus pan-chromosome, identifying both the order and orientation of core genes and flexible genomic regions. Comparative genome analyses demonstrate the existence of novel resistance islands and isolates with increased numbers of resistance island insertions over time, from single insertions in the 1950s to triple insertions in 2011. Gene clusters responsible for carbon utilization, siderophore production, and pilus assembly demonstrate frequent gain or loss among isolates.
Conclusions: The highly variable and dynamic nature of the A. baumannii genome may be the result of its success in rapidly adapting to both abiotic and biotic environments through the gain and loss of gene clusters controlling fitness. Importantly, some archaic adaptation mechanisms appear to have reemerged among recent isolates.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0701-6) contains supplementary material, which is available to authorized users.
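The terms core, pan-genome, and "open" used above can be illustrated with a small set-based sketch. It assumes each genome has already been reduced to a set of protein-family identifiers (the clustering step is not shown), and the isolate names and family labels are hypothetical. The core is the intersection across genomes, the pan-genome is the union, and a pan-genome is described as open when its size keeps growing as genomes are added.

```python
def pan_genome_summary(genomes):
    """Return (pan, core, accessory) sets from {genome name: set of families}."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set(), set()
    pan = set().union(*gene_sets)        # families seen in any genome
    core = set.intersection(*gene_sets)  # families shared by every genome
    return pan, core, pan - core


def pan_genome_growth(genomes, order):
    """Cumulative pan-genome size as genomes are added in the given order."""
    seen, sizes = set(), []
    for name in order:
        seen |= genomes[name]
        sizes.append(len(seen))
    return sizes


# Hypothetical isolates and family labels, for illustration only.
isolates = {
    "isolate_A": {"fam1", "fam2", "fam3"},
    "isolate_B": {"fam1", "fam2", "fam4"},
    "isolate_C": {"fam1", "fam2", "fam3", "fam5"},
}

pan, core, accessory = pan_genome_summary(isolates)
print(len(pan), len(core), len(accessory))                               # 5 2 3
print(pan_genome_growth(isolates, ["isolate_A", "isolate_B", "isolate_C"]))  # [3, 4, 5]
```

In practice the growth curve depends on the order in which genomes are added, so it is usually averaged over many random orderings before judging whether the pan-genome is open.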
Recently we have developed a new algorithm, PROVEAN (Protein Variation Effect Analyzer), for predicting the functional effect of protein sequence variations, including single amino acid substitutions and small insertions and deletions [2]. The prediction is based on the change, caused by a given variation, in the similarity of the query sequence to a set of its related protein sequences. For this prediction, the algorithm is required to compute a semi-global pairwise sequence alignment score between the query sequence and each of the related sequences. Using dynamic programming, it takes O(n · m) time to compute the alignment score between the query sequence Q of length n and a related sequence S of length m. Thus, given ℓ different variations in Q, a naive approach would take O(ℓ · n · m) time to compute the alignment scores between each of the variant query sequences and S. In this paper, we present a new approach to efficiently compute the pairwise alignment scores for ℓ variations, which takes O((n + ℓ) · m) time when the length of variations is bounded by a constant. In this approach, we further utilize the solutions of overlapping subproblems, which are already used by the dynamic programming approach. Our algorithm has been used to build a new database of precomputed prediction scores for all possible single amino acid substitutions, single amino acid insertions, and deletions of up to 10 amino acids in about 91K human proteins (including isoforms), where ℓ becomes very large, that is, ℓ = O(n). The PROVEAN source code and web server are available at
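The idea of reusing dynamic programming subproblems to score many variations in O((n + ℓ) · m) time can be sketched with a forward table and a backward table that are computed once and then combined per variation. This is a minimal sketch under simplifying assumptions: global rather than semi-global alignment, a linear gap penalty, and a toy match/mismatch score. All names are hypothetical, and this should not be read as the paper's actual algorithm.

```python
GAP = -2  # linear gap penalty (toy value, an assumption)


def sub_score(a, b):
    """Toy substitution score (an assumption, not a real matrix)."""
    return 2 if a == b else -1


def forward_table(q, s):
    """F[i][j] = best global alignment score of q[:i] against s[:j]."""
    n, m = len(q), len(s)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        F[0][j] = j * GAP
    for i in range(1, n + 1):
        F[i][0] = i * GAP
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub_score(q[i - 1], s[j - 1]),
                          F[i - 1][j] + GAP,
                          F[i][j - 1] + GAP)
    return F


def backward_table(q, s):
    """B[i][j] = best global alignment score of q[i:] against s[j:]."""
    n, m = len(q), len(s)
    R = forward_table(q[::-1], s[::-1])
    return [[R[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


def substitution_score(F, B, s, k, residue):
    """Score of q with q[k] replaced by residue, in O(m) from cached tables."""
    m = len(s)
    # Recompute only the one DP row that depends on the new residue.
    row = [F[k][0] + GAP] + [0] * m
    for j in range(1, m + 1):
        row[j] = max(F[k][j - 1] + sub_score(residue, s[j - 1]),
                     F[k][j] + GAP,
                     row[j - 1] + GAP)
    # Join the new prefix row with the unchanged suffix scores.
    return max(row[j] + B[k + 1][j] for j in range(m + 1))


def deletion_score(F, B, k, length):
    """Score of q with q[k:k+length] deleted, in O(m) from cached tables."""
    return max(f + b for f, b in zip(F[k], B[k + length]))


# Hypothetical sequences, for illustration only.
query, homolog = "MKTAYIAKQR", "MKSAYLAKR"
F, B = forward_table(query, homolog), backward_table(query, homolog)
print(substitution_score(F, B, homolog, 5, "L"))
print(deletion_score(F, B, 4, 2))
```

Building F and B costs O(n · m) once, and each substitution or deletion then costs O(m), which gives the O((n + ℓ) · m) total for ℓ variations. The join step relies on alignment scores being additive across a split point, which holds for linear gap penalties; an affine gap model would need extra state at the boundary.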
A previous report showed that the increase in cytosolic Ca2+ induced by hepatitis B virus X protein (HBx) promotes HBV replication. In this study, in vitro experiments showed that (i) Ca2+ promoted HBV core assembly, as assessed by sucrose density gradient and analytical ultracentrifugation analyses, and (ii) the assembled HBV core particles were capsids, as demonstrated by transmission electron microscopy. Ex vivo experiments showed that treatment with BAPTA-AM and cyclosporine A (CsA) reduced HBV capsids in transfected HepG2 cells, whereas treatment with thapsigargin (TG) increased HBV capsids in transfected HepG2 cells. Furthermore, we investigated the increase in HBV core assembly caused by HBx. The results show that the HBx-induced increase in cytosolic calcium ions promotes HBV core assembly.