Multiple sequence alignment (MSA) is ubiquitous in evolution and bioinformatics. MSAs are usually taken to be a known and fixed quantity on which to perform downstream analysis, despite extensive evidence that MSA accuracy and uncertainty affect results. Alignment errors are known to cause a wide range of problems for downstream evolutionary inference, ranging from false inference of positive selection to long branch attraction artifacts. The most popular approach to dealing with this problem is to remove (filter) specific columns in the MSA that are thought to be prone to error. Although popular, this approach has had mixed success, and several studies have even suggested that filtering might be detrimental to phylogenetic studies. We present a graph-based clustering method to address MSA uncertainty and error in the software Divvier (available at https://github.com/simonwhelan/Divvier), which uses a probabilistic model to identify clusters of characters that have strong statistical evidence of shared homology. These clusters can then be used either to filter characters from the MSA (partial filtering) or to represent each of the clusters in a new column (divvying). We validate Divvier through its performance on real and simulated benchmarks, finding that Divvier substantially outperforms existing filtering software by retaining more true pairwise homology calls and removing more false positive pairwise homologies. We also find that Divvier, in contrast to other filtering tools, can alleviate long branch attraction artifacts induced by MSA and reduces the variation in tree estimates caused by MSA uncertainty.
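The divvying idea described above can be illustrated with a minimal sketch. It is not Divvier's implementation: the abstract does not specify the clustering procedure, so this example assumes that pairwise homology support values are already available and simply groups characters by connected components above a cutoff. The function name `divvy_column`, the `pair_support` dictionary, and the `threshold` value are all hypothetical.

```python
from itertools import combinations


def divvy_column(column, pair_support, threshold=0.8):
    """Split one alignment column into sub-columns of mutually supported characters.

    column: list of characters, one per sequence ('-' marks a gap).
    pair_support: dict mapping (i, j) with i < j to an assumed support value in [0, 1].
    threshold: hypothetical cutoff; pairs at or above it are linked.
    """
    n = len(column)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    residues = [i for i, ch in enumerate(column) if ch != "-"]

    # Link every pair of characters whose support reaches the cutoff.
    for i, j in combinations(residues, 2):
        if pair_support.get((i, j), 0.0) >= threshold:
            parent[find(j)] = find(i)

    # Collect the resulting clusters (connected components).
    clusters = {}
    for i in residues:
        clusters.setdefault(find(i), []).append(i)

    # Divvying: each cluster gets its own column, gaps everywhere else.
    new_columns = []
    for members in sorted(clusters.values()):
        new_col = ["-"] * n
        for i in members:
            new_col[i] = column[i]
        new_columns.append("".join(new_col))
    return new_columns


if __name__ == "__main__":
    support = {(0, 1): 0.95, (2, 3): 0.90, (0, 2): 0.10}
    print(divvy_column(list("AAGG"), support))
```

Under the same assumptions, partial filtering would amount to keeping only the largest cluster in place and replacing the remaining characters with gaps, rather than emitting one column per cluster.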
Background: Clustering sequences into families has long been an important step in the characterization of genes and proteins. Many algorithms have been developed for this purpose, most of which are based either on direct similarity between gene pairs or on some form of network structure in which edge weights in the constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology, and it has not been utilized to its fullest potential. Results: Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data. Conclusions: The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.
The S18 family of mitochondrial ribosomal proteins (MRPS18, S18) consists of three members, S18-1 to −3. Earlier, we found that overexpression of the S18-2 protein resulted in immortalization and eventual transformation of primary rat fibroblasts. S18-1 and −3 have not exhibited such abilities. To understand the differences in protein properties, the evolutionary history of the S18 family was analyzed. S18-3, followed by S18-1 and S18-2, emerged as a result of an ancient gene duplication at the root of the eukaryotic species tree, followed by two metazoan-specific gene duplications. However, the most conserved metazoan S18 homolog is S18-1; it shares the most sequence similarity with S18 proteins of bacteria and of other eukaryotic clades. Evolutionarily conserved residues of S18 proteins were analyzed in various cancers. S18-2 is mutated at a higher rate than the S18-1 and −3 proteins. Moreover, the evolutionarily conserved residue Gly132 of S18-2 shows genetic polymorphism in colon adenocarcinomas, which was confirmed by direct DNA sequencing. In conclusion, the S18 family represents important, as yet unexplored mitochondrial ribosomal proteins.
Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence assessment, mixing assessment and interactive exploration of both continuous and tree parameters. Results: We have written a software package called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines. Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1505-3) contains supplementary material, which is available to authorized users.
Protein kinase B (AKT) phosphorylates numerous substrates on the consensus motif RXRXXpS/T, a docking site for 14-3-3 interactions. To identify novel AKT-induced phosphorylation events following B cell receptor (BCR) activation, we performed proteomics, biochemical and bioinformatics analyses. Phosphorylated consensus motif-specific antibody enrichment, followed by tandem mass spectrometry, identified 446 proteins containing 186 novel phosphorylation events. Moreover, we found 85 proteins in which phosphorylation was upregulated and 277 in which it was downregulated following stimulation. Upregulation was mainly in proteins involved in ribosomal and translational regulation, DNA binding and transcription regulation. Conversely, downregulation was preferentially in RNA binding, mRNA splicing and mRNP export proteins. Immunoblotting of two identified RNA regulatory proteins, RBM25 and MEF-2D, confirmed the proteomics data. Consistent with these findings, the AKT inhibitor MK-2206 dramatically reduced, while the mTORC inhibitor PP242 totally blocked, phosphorylation on the RXRXXpS/T motif. This demonstrates that this motif, previously suggested as an AKT target sequence, is also a substrate for mTORC1/2. Proteins with PDZ, PH and/or SH3 domains contained the consensus motif, whereas in those with HMG-box, H15 domains and/or NF-X1 zinc fingers the motif was absent. Proteins carrying the consensus motif were found in all eukaryotic clades, indicating that they regulate a phylogenetically conserved set of proteins.
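As an illustration of the consensus motif mentioned above, the following minimal sketch scans a protein sequence for candidate RXRXX[S/T] sites, where X is any residue. It only finds sequence-level candidates; whether a site is actually phosphorylated is an experimental question, as in the study. The function name `find_motif_sites` and the example sequence are hypothetical.

```python
import re

# RXRXX[S/T]: Arg, any residue, Arg, any two residues, then Ser or Thr.
# The lookahead allows overlapping occurrences to be reported.
AKT_MOTIF = re.compile(r"(?=(R.R..[ST]))")


def find_motif_sites(sequence):
    """Return (1-based position of the S/T, matched 6-mer) for each candidate site."""
    return [
        (match.start() + 6, match.group(1))
        for match in AKT_MOTIF.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    # Made-up peptide used purely for demonstration.
    for position, site in find_motif_sites("MARTRSSSAA"):
        print(position, site)
```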
Presenilin proteins are type II transmembrane proteins. They make up the catalytic component of Gamma secretase, a multiportion transmembrane protease. Amyloid protein, Notch and beta catenin are among more than 90 substrates of Presenilins. Mutations in Presenilins lead to defects in proteolytic cleavage of their substrates, resulting in some of the most devastating pathological conditions, including Alzheimer disease (AD), developmental disorders and cancer. In addition to catalytic roles, Presenilin proteins have also been shown to be involved in many noncatalytic roles, such as calcium homeostasis, regulation of autophagy and protein trafficking. These proteolytic proteins are highly conserved and present in almost all the major eukaryotic groups. Studies on a wide variety of organisms, ranging from human to the unicellular Dictyostelium, have shown the important catalytic and noncatalytic roles of Presenilins. In the current research project, we aimed to elucidate the phylogenetic history of Presenilins. We showed that Presenilins are the most ancient of the Gamma secretase proteins and might have their origin in the last common eukaryotic ancestor (LCEA). We also demonstrated that these proteins have been evolving under strong purifying selection. Through evolutionary trace analysis, we showed that Presenilin protein sites that undergo mutations in Familial Alzheimer Disease are highly conserved in metazoans. Finally, we discussed the evolutionary, physiological and pathological implications of our findings and proposed that the evolutionary profile of Presenilins supports the loss-of-function hypothesis of AD pathogenesis.
Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity. Results: In this study, we validate GenFamClust by comparing it to well known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs. Conclusions: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods. Electronic supplementary material: The online version of this article (doi:10.1186/s12862-016-0684-2) contains supplementary material, which is available to authorized users.