A protein is usually classified into one of the following five structural classes: alpha, beta, alpha + beta, alpha/beta, and zeta (irregular). The structural class of a protein is correlated with its amino acid composition. However, given the amino acid composition of a protein, how may one predict its structural class? Various efforts have been made to address this problem. This review covers the progress in this field, focusing on the state of the art, which features a novel prediction algorithm and a recently developed database. The novel algorithm is characterized by a covariance matrix that takes into account the coupling effect among different amino acid components of a protein. The new database was established based on the requirement that the classes should have (1) as many nonhomologous structures as possible, (2) good-quality structures, and (3) typical or distinguishable features for each of the structural classes concerned. The very high success rate for both the training-set proteins and the testing-set proteins, which has been further validated by a simulated analysis and a jackknife analysis, indicates that it is possible to predict the structural class of a protein according to its amino acid composition if an ideal and complete database can be established. It also suggests that the overall fold of a protein is basically determined by its amino acid composition.
Background: The h-index has already been used by major citation databases to evaluate the academic performance of individual scientists. Although effective and simple, the h-index suffers from some drawbacks that limit its use in accurately and fairly comparing the scientific output of different researchers. These drawbacks include information loss and low resolution: the former refers to the fact that, beyond the h² citations counted for papers in the h-core, excess citations are completely ignored, whereas the latter means that it is common for a group of researchers to have an identical h-index. Methodology/Principal Findings: To solve these problems, I here propose the e-index, where e² represents the ignored excess citations, in addition to the h² citations for h-core papers. Citation information can be completely depicted by using the h-index together with the e-index, which are independent of each other. Some other h-type indices, such as a and R, are h-dependent, have information redundancy with h, and therefore, when used together with h, mask the real differences in excess citations of different researchers. Conclusions/Significance: Although simple, the e-index is a necessary h-index complement, especially for evaluating highly cited scientists or for precisely comparing the scientific output of a group of scientists having an identical h-index.
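The relation between the two indices can be illustrated with a short Python sketch. It assumes the reading given in the abstract: the h-core is the top h papers, the h-index accounts for h² of their citations, and e² is whatever the h-core received beyond that. The function name `h_and_e_index` and the two citation lists are hypothetical, made up purely for illustration.

```python
import math


def h_and_e_index(citations):
    """Return (h, e) for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)

    # h is the largest rank such that the paper at that rank
    # has at least that many citations.
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break

    # Excess citations: citations received by the h-core papers
    # beyond the h^2 already accounted for by the h-index.
    excess = sum(ranked[:h]) - h * h
    return h, math.sqrt(excess)


# Two made-up researchers with the same h-index but different excess citations.
author_a = [10, 8, 5, 4, 3]
author_b = [50, 30, 5, 4, 3]

for name, cites in (("A", author_a), ("B", author_b)):
    h, e = h_and_e_index(cites)
    print(f"author {name}: h = {h}, e = {e:.2f}")
```

Both toy researchers share the same h, so the h-index alone cannot separate them; the e value differs because only one of them has heavily cited h-core papers.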
Background: Chromosomal replication is the central event in the bacterial cell cycle. Identification of replication origins (oriCs) is necessary for almost all newly sequenced bacterial genomes. Given the increasing pace of genome sequencing, the currently available software for predicting oriCs, however, still leaves much to be desired. Therefore, the increasing availability of genome sequences calls for improved software to identify oriCs in newly sequenced and unannotated bacterial genomes. Results: We have developed Ori-Finder, an online system for finding oriCs in bacterial genomes based on an integrated method comprising the analysis of base composition asymmetry using the Z-curve method, distribution of DnaA boxes, and the occurrence of genes frequently close to oriCs. The program can also deal with unannotated genome sequences by integrating the gene-finding program ZCURVE 1.02. The predicted results are exported to an HTML report, which offers convenient views of the results in both graphical and tabular formats. Conclusion: A web-based system to predict replication origins of bacterial genomes has been presented here. Based on this system, oriC regions have been predicted for the bacterial genomes currently available in GenBank. It is hoped that Ori-Finder will become a useful tool for the identification and analysis of oriCs in both bacterial and archaeal genomes.
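One ingredient of the integrated method, the distribution of DnaA boxes, can be sketched as a simple motif scan. This is a minimal illustration and not Ori-Finder's implementation: the function names are hypothetical, the default motif `TTATCCACA` is the E. coli consensus DnaA box and is an assumption here (the abstract does not name a motif), and the mismatch tolerance is likewise an illustrative parameter.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_dnaa_boxes(genome, motif="TTATCCACA", max_mismatches=1):
    """Return (position, strand, matched_text) for every motif-like window.

    The motif default and the mismatch tolerance are assumptions made
    for this sketch, not parameters taken from the source.
    """
    genome = genome.upper()
    rc_motif = reverse_complement(motif)
    k = len(motif)
    hits = []
    for i in range(len(genome) - k + 1):
        window = genome[i:i + k]
        if hamming(window, motif) <= max_mismatches:
            hits.append((i, "+", window))
        elif hamming(window, rc_motif) <= max_mismatches:
            hits.append((i, "-", window))
    return hits


# Made-up toy sequence containing one box on each strand.
toy = "GGG" + "TTATCCACA" + "CCC" + "TGTGGATAA" + "GG"
print(find_dnaa_boxes(toy, max_mismatches=0))
```

In a fuller pipeline, clusters of such hits would be weighed together with the composition-asymmetry signal and nearby indicator genes, as the abstract describes.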
A new system, ZCURVE 1.0, for finding protein-coding genes in bacterial and archaeal genomes has been proposed. The current algorithm, which is based on the Z curve representation of the DNA sequences, lays stress on the global statistical features of protein-coding genes by taking the frequencies of bases at three codon positions into account. In ZCURVE 1.0, since only 33 parameters are used to characterize the coding sequences, it gives better consideration to both typical and atypical cases, whereas in Markov-model-based methods, e.g. Glimmer 2.02, thousands of parameters are trained, which may result in less adaptability. To compare the performance of the new system with that of Glimmer 2.02, both systems were run on 18 genomes not annotated by the Glimmer system. Both systems were also compared on the prediction of genes with known function. The average accuracy of the two systems is well matched; however, ZCURVE 1.0 has more accurate gene start prediction, a lower additional prediction rate, and higher accuracy for the prediction of horizontally transferred genes. It is shown that the joint application of both systems greatly improves gene-finding results. For a typical genome, e.g. Escherichia coli, the system ZCURVE 1.0 takes approximately 2 min on a Pentium III 866 PC without any human intervention. The system ZCURVE 1.0 is freely available at: http://tubic.tju.edu.cn/Zcurve_B/.
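The idea of describing a candidate gene by base frequencies at the three codon positions can be sketched as follows. This is only a partial illustration: it computes nine numbers (three Z-curve components per codon position), not the full 33-parameter set, whose remaining components are not described in the abstract. The function name is hypothetical, and the component formulas are the standard Z-curve transform applied to frequencies, taken as an assumption here.

```python
def codon_position_zcurve(orf):
    """Return 9 features: (x, y, z) for each of the three codon positions.

    For the base frequencies a, c, g, t observed at one codon position:
        x = (a + g) - (c + t)   purine vs. pyrimidine
        y = (a + c) - (g + t)   amino vs. keto
        z = (a + t) - (c + g)   weak vs. strong hydrogen bonding
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence must contain at least one full codon")

    features = []
    for pos in range(3):
        bases = orf[pos:usable:3]  # every base at this codon position
        n = len(bases)
        a, c, g, t = (bases.count(b) / n for b in "ACGT")
        features.extend([(a + g) - (c + t),
                         (a + c) - (g + t),
                         (a + t) - (c + g)])
    return features


# Made-up toy open reading frame: ATG AAA TAA
print(codon_position_zcurve("ATGAAATAA"))
```

A classifier would then be trained on such feature vectors for known coding and non-coding sequences; with so few parameters, the model has little room to overfit to typical genes.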
INTRODUCTION The Saccharomyces cerevisiae 2.0 project (Sc2.0) aims to modify the yeast genome with a series of densely spaced designer changes. Both a synthetic yeast chromosome arm (synIXR) and the entirely synthetic chromosome (synIII) function with high fitness in yeast. For designer genome synthesis projects, precise engineering of the physical sequence to match the specified design is important for the systematic evaluation of underlying design principles. Yeast can maintain nuclear chromosomes as rings, occurring by chance at repeated sequences, although the cyclized format is unfavorable in meiosis given the possibility of dicentric chromosome formation from meiotic recombination. Here, we describe the de novo synthesis of synthetic yeast chromosome V (synV) in the “Build-A-Genome China” course, perfectly matching the designer sequence and bearing loxPsym sites, distinguishable watermarks, and all the other features of the synthetic genome. We generated a ring synV derivative with user-specified cyclization coordinates and characterized its performance in mitosis and meiosis.
RATIONALE Systematic evaluation of underlying Sc2.0 design principles requires that the final assembled synthetic genome perfectly match the designed sequence. Given the size of yeast chromosomes, synthetic chromosome construction is performed iteratively, and new mutations and unpredictable events may occur during synthesis; even a very small number of unintentional nucleotide changes across the genome could have substantial effects on phenotype. Therefore, precisely matching the physical sequence to the designed sequence is crucial for verification of the design principles in genome synthesis. Ring chromosomes can extend those design principles to provide a model for genomic rearrangement, ring chromosome evolution, and human ring chromosome disorders.
RESULTS We chemically synthesized, assembled, and incorporated designer chromosome synV (536,024 base pairs) of S. cerevisiae according to Sc2.0 principles, based on the complete nucleotide sequence of native yeast chromosome V (576,874 base pairs). This work was performed as part of the “Build-A-Genome China” course at Tianjin University. We corrected all mutations found in the initial synV strain (including duplications, substitutions, and indels) by using integrative cotransformation of the precise desired changes and by means of a clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated protein 9 (Cas9)-based method. Altogether, 3331 corrected base pairs were required to match the designed sequence. We generated a strain that exactly matches all designer sequence changes and displays high fitness under a variety of culture conditions. All corrections were verified with whole-genome sequencing; RNA sequencing revealed only minor changes in gene expression, most notably decreases in expression of genes relocated near synthetic telomeres as a result of design. We constructed a functional circular synV (ring_synV) derivative in yeast by precisely joining both chromosome ends (telomeres) at specified coordinates. The ring chromosome showed restoration of subtelomeric gene expression levels. The ring_synV strain exhibited fitness comparable with that of the linear synV strain and showed no change in sporulation frequency, but had notably reduced spore viability. In meiosis, heterozygous or homozygous diploid ring_wtV and ring_synV chromosomes behaved similarly, exhibiting a substantially higher frequency of zero-spore tetrads, a type that was not seen in the rod chromosome diploids. Rod synV chromosomes went through meiosis with high spore viability, despite no effort having been made to preserve meiotic competency in the design of synV.
CONCLUSION The perfect designer-matched synthetic chromosome V provides strategies to edit sequence variants and correct unpredictable events, such as off-target integration of extra copies of synthetic DNA elsewhere in the genome. We also constructed a ring synthetic chromosome derivative and evaluated its fitness and stability in yeast. Both synV and synVI can be circularized and can power yeast cell growth without affecting fitness when gene content is maintained. These fitness and stability phenotypes of the ring synthetic chromosome in yeast provide a model system with which to probe the mechanism of human ring chromosome disorders.
Figure: Synthesis, cyclization, and characterization of synV. (A) Synthetic chromosome V (synV, 536,024 base pairs) was designed in silico from native chromosome V (wtV, 576,874 base pairs), with extensive genotype modification designed to be phenotypically neutral. (B) CRISPR/Cas9 strategy for multiplex repair. (C) Colonies of wtV, synV, and ring_synV strains.
A vector projection method is proposed to predict the cleavability of oligopeptides by extended-specificity site proteases. For an enzyme with eight specificity subsites the substrate octapeptide can be uniquely expressed as a vector in an 8-dimensional space, whose eight bases correspond to the amino acids at the eight subsites, P4, P3, P2, P1, P1', P2', P3', and P4', respectively. The component of such a characteristic vector on each of the eight bases is defined as the frequency of an amino acid occurring at a given site. These frequencies were derived from a set of octapeptides known to be cleaved by HIV protease. The cleavability of an octapeptide can then be estimated from the projection of its characteristic vector on an idealized, optimally cleavable vector. The high ratio of correct prediction vs. total prediction for the data in both the training and the testing sets indicates that the new method is self-consistent and efficient. It provides a rapid and accurate algorithm for analyzing the specificity of any multi-subsite enzyme for which there is no coupling between subsites. In particular, it is useful for predicting the cleavability of an oligopeptide by either HIV-1 or HIV-2 protease, and hence offers a supplementary means for finding effective inhibitors of HIV protease as potential drugs against AIDS.
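The projection idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published procedure: the function names are hypothetical, the "ideal" vector is taken to be the highest observed frequency at each subsite, the score is normalized so that the ideal peptide scores 1, and the training peptides are made-up strings rather than real cleavage sites.

```python
def site_frequencies(cleaved_peptides):
    """Per-subsite amino acid frequencies from a set of cleaved peptides."""
    length = len(cleaved_peptides[0])
    if any(len(p) != length for p in cleaved_peptides):
        raise ValueError("all peptides must have the same length")
    counts = [{} for _ in range(length)]
    for peptide in cleaved_peptides:
        for i, aa in enumerate(peptide):
            counts[i][aa] = counts[i].get(aa, 0) + 1
    n = len(cleaved_peptides)
    return [{aa: c / n for aa, c in site.items()} for site in counts]


def characteristic_vector(peptide, freqs):
    """Component i is the training frequency of the peptide's residue at subsite i."""
    if len(peptide) != len(freqs):
        raise ValueError("peptide length does not match the number of subsites")
    return [freqs[i].get(aa, 0.0) for i, aa in enumerate(peptide)]


def cleavability_score(peptide, freqs):
    """Projection onto the ideal vector, scaled so the ideal peptide scores 1.

    The choice of ideal vector and the scaling are assumptions of this sketch.
    """
    v = characteristic_vector(peptide, freqs)
    ideal = [max(site.values()) for site in freqs]
    dot = sum(a * b for a, b in zip(v, ideal))
    return dot / sum(b * b for b in ideal)


# Made-up octapeptides standing in for a training set of cleaved substrates.
toy_training = ["AAAAAAAA", "AAAALAAA", "AAGALAAA", "AAAALAAV"]
freqs = site_frequencies(toy_training)

for candidate in ("AAAALAAA", "AAGAAAAV", "WWWWWWWW"):
    print(candidate, round(cleavability_score(candidate, freqs), 3))
```

Because each subsite contributes independently to the dot product, a score of this kind is only meaningful when there is no coupling between subsites, which is the condition the abstract states.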
A novel method for mapping a DNA or RNA sequence onto a folding curve in three-dimensional space, the Z curve, has been proposed based on the symmetry of the regular tetrahedron. There exists a unique Z curve for a given DNA sequence; conversely, the DNA sequence can be uniquely determined from the given Z curve. The properties of the Z curves have been studied in great detail. The symmetry, the periodicity, the local motifs, and the global features of the distribution of bases of the DNA sequences are reflected by the rich folding structures of the Z curves. The Z curves may be smoothed by B-spline functions of different orders. Therefore, the Z curves may have any resolution by choosing suitable spline functions. The higher the order of the B-spline function chosen, the lower the resolution of the Z curve. The Z curves are thus suitable for visualizing and analyzing DNA sequences of any length. The study of the Z curves opens a new area of visualizing and analyzing DNA sequences by a geometrical approach. The Z curve method may be strengthened by using the mature mathematical tools of geometry on the one hand, and the powerful techniques of computer graphics on the other.
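The one-to-one correspondence between a sequence and its Z curve can be illustrated with a short Python sketch. It assumes the usual Z-curve coordinates built from cumulative base counts, which the abstract does not spell out: x = (A + G) - (C + T), y = (A + C) - (G + T), z = (A + T) - (C + G). The function names are hypothetical, and RNA input is handled by treating U as T.

```python
# Each base moves the curve by one of four steps, which point to the
# vertices of a regular tetrahedron centered at the origin.
STEP = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
BASE_OF_STEP = {step: base for base, step in STEP.items()}


def z_curve(seq):
    """Return the list of (x, y, z) points of the Z curve, one per base."""
    x = y = z = 0
    points = []
    for base in seq.upper().replace("U", "T"):
        dx, dy, dz = STEP[base]
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def sequence_from_z_curve(points):
    """Recover the DNA sequence from consecutive differences of the curve."""
    bases = []
    prev = (0, 0, 0)
    for point in points:
        step = tuple(p - q for p, q in zip(point, prev))
        bases.append(BASE_OF_STEP[step])
        prev = point
    return "".join(bases)


curve = z_curve("ATGC")
print(curve)
print(sequence_from_z_curve(curve))
```

Because the four step vectors are distinct, every point-to-point difference identifies exactly one base, which is why the curve determines the sequence and vice versa.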
Essential genes are genes that are required by an organism to survive under specific conditions. Studies of the minimal gene set for bacteria have elucidated fundamental cellular processes that sustain life. The past five years have seen significant progress in identifying human essential genes, primarily due to the successful use of CRISPR/Cas9 in various types of human cells. DEG 15, a new release of the Database of Essential Genes (www.essentialgene.org), provides major advancements compared to DEG 10. Specifically, the number of eukaryotic essential genes has increased by more than fourfold, and that of prokaryotic ones has more than doubled. Of note, the number of human essential genes has increased by more than tenfold. Moreover, we have developed built-in analysis modules with which users can perform various analyses, such as essential-gene distributions between bacterial leading and lagging strands, sub-cellular localization distribution, enrichment analysis of gene ontology and KEGG pathways, and generation of Venn diagrams to compare and contrast gene sets between experiments. Additionally, the database offers customizable BLAST tools for performing species- and experiment-specific BLAST searches. Therefore, DEG comprehensively harbors updated human-curated essential-gene records among prokaryotes and eukaryotes with built-in tools to enhance essential-gene analysis.