The hepatitis C virus (HCV) genome shows remarkable sequence variability, leading to the classification of at least six major genotypes, numerous subtypes and a myriad of quasispecies within a given host. A database allowing researchers to investigate the genetic and structural variability of all available HCV sequences is an essential tool for studies on the molecular virology and pathogenesis of hepatitis C as well as drug design and vaccine development. We describe here the European Hepatitis C Virus Database (euHCVdb), a collection of computer-annotated sequences based on reference genomes. The annotations include genome mapping of sequences, use of recommended nomenclature, subtyping as well as three-dimensional (3D) molecular models of proteins. A WWW interface has been developed to facilitate database searches and the export of data for sequence and structure analyses. As part of an international collaborative effort with the US and Japanese databases, the European HCV Database (euHCVdb) is mainly dedicated to HCV protein sequences, 3D structures and functional analyses.
Toxoplasma gondii is an obligate intracellular parasite that contains a relic plastid, called the apicoplast, deriving from a secondary endosymbiosis with an ancestral alga. Metabolic labelling experiments using [14C]acetate led to a substantial production of numerous glycero- and sphingo-lipid classes in extracellular tachyzoites. Syntheses of all these lipids were affected by the herbicide haloxyfop, demonstrating that their de novo syntheses necessarily required a functional apicoplast fatty acid synthase II. The complex metabolic profiles obtained and a census of glycerolipid metabolism gene candidates indicate that synthesis is probably scattered in the apicoplast membranes [possibly for PA (phosphatidic acid), DGDG (digalactosyldiacylglycerol) and PG (phosphatidylglycerol)], the endoplasmic reticulum (for major phospholipid classes and ceramides) and mitochondria (for PA, PG and cardiolipin). Based on a bioinformatic analysis, it is proposed that apicoplast-produced acyl-ACP (where ACP is acyl-carrier protein) is transferred to glycerol-3-phosphate for apicoplast glycerolipid synthesis. Acyl-ACP is also probably transported outside the apicoplast stroma and irreversibly converted into acyl-CoA. In the endoplasmic reticulum, acyl-CoA may not be transferred to a three-carbon backbone by an enzyme similar to the cytosolic plant glycerol-3-phosphate acyltransferase, but rather by a dual glycerol-3-phosphate/dihydroxyacetone-3-phosphate acyltransferase, as in animal and yeast cells. We further showed that intracellular parasites could also synthesize most of their lipids from scavenged host cell precursors. The observed appearance of glycerolipids specific to either the de novo pathway in extracellular parasites (unknown glycerolipid 1 and the plant-like DGDG), or the intracellular stages (unknown glycerolipid 8), may explain the necessary coexistence of both de novo parasitic acyl-lipid synthesis and recycling of host cell compounds.
Part of the effort to develop hepatitis C-specific drugs and vaccines is the study of genetic variability of all publicly available HCV sequences. Three HCV databases are currently available to aid this effort and to provide additional insight into the basic biology, immunology, and evolution of the virus. The Japanese HCV database (http://s2as02.genes.nig.ac.jp) gives access to a genomic mapping of sequences as well as their phylogenetic relationships. The European HCV database (http://euhcvdb.ibcp.fr) offers access to a computer-annotated set of sequences and molecular models of HCV proteins and focuses on protein sequence, structure and function analysis. The hepatitis C virus (HCV) has infected approximately 170 million people worldwide. HCV infection is cleared in about 25% of cases, 1,2 and in the rest results in chronic infection. Chronic HCV infection can lead to cirrhosis and liver cancer, and is the leading cause of liver transplantation in the United States. A recent Canadian study 3 estimated that lifetime HCV-associated mortality is around 1 in 8; a much larger number (an estimated 1 in 4) will develop cirrhosis of the liver. Most likely this number will be higher in less developed countries. With 170 million people infected worldwide, this means 20 million HCV-related deaths in the next few decades. HCV is a positive-sense RNA virus with a genome of ≈10 kb, which encodes a single polyprotein of ≈3000 amino acids (aa) that is cleaved into three structural proteins (core, envelope E1 and E2), the p7 protein whose function has not been determined, and six non-structural proteins (NS2, NS3, NS4A, NS4B, NS5A and NS5B). It has been classified as a hepacivirus, in the Flaviviridae family, which also includes flaviviruses (West Nile, Japanese encephalitis and yellow fever viruses) and pestiviruses (bovine viral diarrhea and hog cholera virus). HCV shares some structural features with these viruses.
However, the genetic distance between HCV and other flaviviruses is large enough that HCV cannot be meaningfully aligned to its flavivirus "relatives" over its entire genome 4 (also see http://hcv.lanl.gov/content/hcv-db/GET_ALIGNMENTS/flavi-align.html), and it also shares structural features with the pestivirus family. HCV is subdivided into six genotypes and about 80 subtypes on the basis of nucleotide sequence identity. 5 In addition to genotypes, HCV exists within its hosts as a pool of genetically distinct but closely related variants referred to as quasispecies. 6 While there is limited knowledge about the immunogenicity of HCV, it is widely expected that both the generation of escape and resistance mutations and the high variability itself will create formidable problems for drug and vaccine design. 7 This paper discusses three HCV databases available worldwide, in order of seniority: the Japanese HCV map and phylogeny database, the European HCV sequence and molecular models database and the Los Alamos HCV sequence and immunology databases. We will first describe the three databases, highlighting comm...
The organization and mining of malaria genomic and post-genomic data is important to significantly increase the knowledge of the biology of its causative agents, and is motivated, in the longer term, by the need to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space built from the genomic data of Plasmodium falciparum, but also using the millions of genomic data available from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should, therefore, be as reliable and versatile as possible. In this context, five aspects of the organization and mining of malaria genomic and post-genomic data were examined: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from Xomic experiments after drug treatments and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Recent progress towards a grid-enabled chemogenomic knowledge space is discussed.
During the course of evolution, variation of a protein sequence is an ongoing phenomenon, limited however by the need to maintain the protein's structural and functional integrity. Deciphering the evolutionary path of a protein is thus of fundamental interest. With the development of new methods to visualize high-dimensional spaces and the improvement of phylogenetic analysis tools, it is possible to study the evolutionary trajectories of proteins in the sequence space. Using the Data-Driven High-Dimensional Scaling method, we show that it is possible to predict and represent potential evolutionary trajectories by mapping phylogenetic trees onto a 3D projection of the sequence space. With the case of the aminodeoxychorismate synthase, an enzyme involved in folate synthesis, we show that this representation raises interesting questions about the complexity of the evolution of a given biological function, in particular concerning its capacity to explore the sequence space.
Whatever the phylogenetic method, genetic sequences are often described as strings of characters; molecular sequences can thus be viewed as elements of a multi-dimensional space. As a consequence, studying motion in this space (i.e., the evolutionary process) must deal with the surprising features of high-dimensional spaces, such as the concentration of measure phenomenon. To study how these features might influence phylogeny reconstructions, we examined a particularly popular method: the Fitch-Margoliash algorithm, which belongs to the Least Squares methods. We show that the Least Squares methods are closely related to Multi Dimensional Scaling. Indeed, the criteria for Fitch-Margoliash and Sammon’s mapping are somewhat similar. However, the prolific research in Multi Dimensional Scaling has since produced methods that clearly outperform Sammon’s mapping. Least Squares methods for tree reconstruction can now take advantage of these improvements. However, “false neighborhoods” and “tears” are the two main risks in the dimensionality reduction field: a “false neighborhood” corresponds to data that are widely separated in the original space but are found close together in the representation space, whereas neighboring data that are displayed in remote positions constitute a “tear”. To address this problem, we brought the concepts of “continuity” and “trustworthiness” into the tree reconstruction field, which limit the risk of “false neighborhoods” and “tears”. We also point out the concentration of measure phenomenon as a source of error and introduce here new criteria to build phylogenies with improved preservation of distances and robustness. The authors and the Evolutionary Bioinformatics Journal dedicate this article to the memory of Professor W.M. Fitch (1929–2011).
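The similarity between the Fitch-Margoliash and Sammon criteria mentioned in this abstract can be illustrated with a small sketch. The abstract does not state the formulas, so the commonly cited textbook forms are assumed here: Fitch-Margoliash weights each squared error by 1/D², while Sammon's stress weights it by 1/D and normalizes by the sum of the original distances. The function names and the two toy matrices are hypothetical.

```python
from itertools import combinations

def fitch_margoliash(D, d):
    """Weighted least squares: sum over pairs of (D_ij - d_ij)^2 / D_ij^2.
    D holds observed distances, d holds fitted (tree or map) distances."""
    pairs = combinations(range(len(D)), 2)
    return sum((D[i][j] - d[i][j]) ** 2 / D[i][j] ** 2 for i, j in pairs)

def sammon_stress(D, d):
    """Sammon's stress: same squared errors, weighted by 1/D_ij and
    normalized by the sum of all observed distances."""
    pairs = list(combinations(range(len(D)), 2))
    total = sum(D[i][j] for i, j in pairs)
    return sum((D[i][j] - d[i][j]) ** 2 / D[i][j] for i, j in pairs) / total

# Hypothetical observed distances between three sequences.
D = [[0.0, 2.0, 4.0],
     [2.0, 0.0, 4.0],
     [4.0, 4.0, 0.0]]
# Hypothetical fitted distances (e.g. path lengths in a candidate tree).
d = [[0.0, 2.0, 3.0],
     [2.0, 0.0, 5.0],
     [3.0, 5.0, 0.0]]

fm = fitch_margoliash(D, d)
sm = sammon_stress(D, d)
print(f"Fitch-Margoliash criterion: {fm:.4f}")
print(f"Sammon stress:              {sm:.4f}")
```

Both criteria penalize the same squared discrepancies and differ only in how strongly they favor short distances, which is one way to read the abstract's statement that the two are "somewhat similar".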
A configuration space of homologous protein sequences (or CSHP) has been recently constructed based on pairwise comparisons, with probabilities deduced from Z-value statistics (Monte Carlo methods applied to pairwise comparisons) and following evolutionary assumptions. A Z-value cut-off is applied so that proteins are placed in the CSHP only when the similarity of pairs of sequences is significant following the Theorem of the Upper Limit of a score Probability (TULIP theorem). Based on the positions of similar protein sequences in the CSHP, a classification can be deduced, which can be visualized as trees, called TULIP trees. In previous case studies, TULIP trees were shown to be consistent with phylogenetic trees. To date, no tool has been made available to allow the computation of TULIP trees following this model. The availability of methods to cluster proteins based on pairwise comparisons and following evolutionary assumptions should be useful for evaluation and for the future improvements they might inspire. We developed a web server allowing the local or online computation of TULIP trees based on the CSHP probabilities. The input is a set of homologous protein sequences in multi-FASTA format. Pairwise comparisons are conducted using the Smith-Waterman method, with 100-1,000 sequence shufflings to estimate pairwise Z-values. The resulting Z-value matrix is used to infer a tree, which is then written to a file. The output therefore consists of a Z-value matrix, a distance matrix, a TULIP treefile in NEWICK format, and a TULIP tree visualisation. The TULIP server provides an easy-to-use interface to the TULIP software, and allows a classification of protein sequences based on pairwise alignments and following evolutionary assumptions. TULIP trees are consistent with phylogenies in numerous cases, but they can be inconsistent for multi-domain proteins in which some domains have been conserved in all branches.
Thus TULIP trees cannot be considered as conventional phylogenetic trees, following the MIAPA (Minimum Information About a Phylogenetic Analysis) recommendations. A major strength of the TULIP classification is its statistical validity when analysing samples including compositionally unbiased and biased sequences (i.e. with biased amino acid distributions), like sequences from Plasmodium falciparum. The TULIP web server is a service of the Malaria Portal of the University of Pretoria, South Africa, and is available at
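The Z-value estimation described in this abstract (Smith-Waterman scores compared against scores for shuffled sequences) can be sketched as follows. This is a minimal illustration and not the TULIP implementation: the match/mismatch/gap scores, the choice to shuffle only the second sequence, the use of the population standard deviation, and all function names are assumptions made for the example. A real analysis would normally use a substitution matrix rather than a flat match score.

```python
import random
import statistics

def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty
    (assumed toy scoring, not a substitution matrix)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best

def z_value(a, b, n_shuffles=100, seed=0):
    """Z = (observed score - mean shuffled score) / sd of shuffled scores.
    Shuffling b keeps its length and composition but destroys its order."""
    rng = random.Random(seed)
    observed = smith_waterman_score(a, b)
    letters = list(b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(smith_waterman_score(a, "".join(letters)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.pstdev(shuffled_scores)
    if sd == 0:
        return 0.0  # no spread: Z-value undefined, reported as 0.0 here
    return (observed - mean) / sd

# Hypothetical peptide compared with itself: the Z-value should be large.
seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
print(smith_waterman_score(seq, seq))
print(z_value(seq, seq))
```

Because shuffling preserves amino acid composition, the Z-value measures how far the observed score stands out from what composition alone would produce, which is consistent with the abstract's point about compositionally biased sequences such as those of Plasmodium falciparum.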