Proteins play a critical role in complex biological systems, yet about half of the proteins in publicly available databases are annotated as functionally unknown. Proteome-wide functional classification using bioinformatics approaches is thus becoming an important method for revealing unknown protein functions. Using the hyperthermophilic archaeon Pyrococcus furiosus as a model species, we applied the support vector machine (SVM) method to discriminate DNA/RNA-binding proteins from proteins with other functions, using amino acid composition and periodicities as feature vectors. We defined the values obtained from these two features as the composition score (CO score) and the periodicity score (PD score), respectively. The P. furiosus proteins were classified into three classes (I–III) on the basis of a two-dimensional correlation analysis of the CO score and the PD score. As a result, approximately 87% of the functionally known proteins categorized as class I proteins (CO score + PD score > 0.6) were found to be DNA/RNA-binding proteins. When the two-dimensional correlation analysis was applied to the 994 hypothetical proteins in P. furiosus, a total of 151 proteins were predicted to be novel DNA/RNA-binding protein candidates. The DNA/RNA-binding activities of randomly chosen hypothetical proteins were then experimentally verified: six out of seven candidate proteins in class I possessed DNA/RNA-binding activities, supporting the efficacy of our method.
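As a rough illustration of the feature and decision rule described above, the following sketch computes an amino acid composition vector for a protein sequence and applies the class I cutoff (CO score + PD score > 0.6). The function names are hypothetical, the two scores are assumed to come from SVMs that have already been trained, and the periodicity feature and the class II/III boundaries are left out because the abstract does not define them.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (fixed order for the vector).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Cutoff quoted in the abstract for class I: CO score + PD score > 0.6.
CLASS_I_THRESHOLD = 0.6


def composition_vector(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Hypothetical helper: non-standard characters (e.g. 'X') are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def is_class_one(co_score, pd_score):
    """Apply the class I rule from the abstract to two precomputed scores."""
    return co_score + pd_score > CLASS_I_THRESHOLD


if __name__ == "__main__":
    vec = composition_vector("MKKRAAKK")
    print(dict(zip(AMINO_ACIDS, vec))["K"])  # fraction of lysine
    print(is_class_one(0.4, 0.3))
```

In practice the composition vector would be one input to an SVM, and the class I test would be applied to the scores that the trained models return for each protein.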
We have developed a screening system for artificial small RNAs (sRNAs) that inhibit the growth of Escherichia coli. In this system, we used a plasmid library to express artificial sRNAs (approximately 200 bases long) containing 60 bases of random nucleotide sequence. The induced expression of the known rydB sRNA in the system reduced the amount of its possible target mRNA, rpoS, supporting the reliability of the method. To isolate clones of sRNAs that inhibited the growth of E. coli, we used two successive screening steps: (i) colony size selection on plates and (ii) monitoring E. coli growth in a 96-well plate format. As a result, 83 artificial sRNAs were identified that showed a range of inhibitory effects on bacterial growth. We also introduced nucleotide replacements into one of the highly inhibitory sRNA clones, H12, which partially abolished the inhibition of bacterial growth, suggesting that bacterial growth was inhibited in a sequence-specific manner.
Proteins are a major regulatory component in complex biological systems. Among them, DNA/RNA-binding proteins, the key components of the central dogma of molecular biology, and membrane proteins, which are necessary for both signal transduction and metabolite transport, are suggested to be the most important protein families that arose in the early stage of life. In this study, we computationally analyzed the whole proteome data of six model species to survey protein diversity in the three domains of life (Bacteria, Archaea and Eukaryota), focusing especially on these two protein families. To compare the protein distribution among the six model species, we calculated several profiles for each protein: hydropathy, molecular weight, amino acid composition and periodicity. We found a domain-specific distribution of the proteome based on 2D correlation analysis of hydropathy and molecular weight. Further, the merged protein distribution of Archaea and other domains revealed many membrane proteins localized in Bacteria-specific regions with a high ratio of hydropathy and many DNA/RNA-binding proteins localized in Eukaryota-specific regions with a low ratio of hydropathy. Since about half of the proteins encoded in the genome are still functionally unknown, we further conducted Support Vector Machine (SVM)-based functional prediction using amino acid composition (CO score) and periodicity (PD score) as feature vectors to predict the overall number of DNA/RNA-binding proteins and membrane proteins in the proteome. Our estimation indicated that these two functional categories occupy approximately 60% to 80% of the proteome, and further, that the proportion of the two categories varied among the three domains of life, suggesting that the proteome has gone through different selective pressures during evolution.
§1. Introduction
The main flow of biological information is transferred from DNA to RNA to protein, the so-called central dogma of molecular biology, and all living organisms share this universal system. We are now in the middle of the post-genomic era, with an enormous number of DNA sequences from over 240,000 named organisms, 1) and the numbers are rapidly increasing day by day. Accompanying the accumulation of genomic data, RNAs in over 400 complete genomes (Rfam database) 2) and protein sequences from 9,318 families (Pfam database) 3) are currently available. Protein molecules are the major regulatory component of the biological system. The model species Escherichia coli is known to possess approximately 4,400 proteins, and the nematode Caenorhabditis elegans possesses approximately 20,000 proteins. These estimated proteome sizes are deduced from predicted genes in the genome sequence, and yet about half of the proteins registered in protein databases are still
Downloaded from https://academic.oup.com/ptps/article-abstract