2013
DOI: 10.5936/csbj.201302010

A Frequency-Based Linguistic Approach to Protein Decoding and Design: Simple Concepts, Diverse Applications, and the SCS Package

Abstract: Protein structure and function information is coded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemistry problem is based on the frequencies of short constituent sequences (SCSs) or words. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. Availability scores, which are defined as real SCS frequencies in the non-re…
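The counting step behind the "words" analogy in the abstract can be sketched as below. This is a minimal illustration, not the SCS Package's method: it only tallies overlapping windows of a fixed length, and it does not compute availability scores, whose definition is cut off in the abstract above. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def count_scs(sequence: str, k: int) -> Counter:
    """Count every overlapping window of length k in an amino acid sequence."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    # A sequence shorter than k has no windows, so the result is empty.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def relative_frequencies(counts: Counter) -> dict:
    """Convert raw window counts into fractions of all windows counted."""
    total = sum(counts.values())
    return {scs: n / total for scs, n in counts.items()} if total else {}


# Hypothetical toy sequence in one-letter amino acid codes (not real data).
toy_sequence = "MKVLAAGMKVL"
triplets = count_scs(toy_sequence, 3)
triplet_freqs = relative_frequencies(triplets)
```

Changing `k` switches between word lengths, for example 2 for doublets or 5 for 5-aa SCSs. Comparing such counts against a background expectation would be a separate step that this sketch leaves out.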

Citations: cited by 14 publications (20 citation statements)
References: 45 publications (71 reference statements)
“…There are some noteworthy recent studies that encourage this line of approach: for example, nonrandom distributions of 5-aa SCS are demonstrated in the current proteome databases [38], confirming the previous finding that biological bias occurs in protein coding [28,29]. Among these existing studies, our approach is operationally one of the simplest, and it emphasizes analogies between languages and protein sequences [32,33]. Encouragingly, linguistic aspects of proteins have been noted in other studies [48,49].…”
Section: Introduction (supporting)
confidence: 82%
“…The advantage of the alignment-free approach is that any collections of proteins can be compared quantitatively. Although various types of alignment-free approaches have been developed [24,25], including our previous attempts to use membrane topology [26] and a self-organizing map [27], the alignment-free approach in the present study is based on the "availability" (frequency bias) of short constituent sequences (SCSs) of amino acids (aa) in proteins [28][29][30][31][32][33]. The length of SCSs can be 2 aa (doublet), 3 aa (triplet), 4 aa (quartet), 5 aa (pentat), and more in a given protein.…”
Section: Introduction (mentioning)
confidence: 99%
“…The source data for the package have been obtained from the non-redundant amino acid database [5] of the NCBI (National Center for Biotechnology Information). The SCS package provides five applications which utilize the availability scores of SCSs and idioms [3]. This paper defines a network structure of SCS idioms which is called idiom network of SCSs.…”
Section: Idioms of SCSs (mentioning)
confidence: 99%
“…Bioinformatics is a powerful methodology to utilize such data and to prompt biological researches drastically [1]. In our previous works, we developed a web application called the SCS Package based on Short Constituent Sequence (SCS) of amino acid sequences [2], [3], [4]. The SCS package provides various analysis services using the availability score of SCSs.…”
Section: Introduction (mentioning)
confidence: 99%
“…They call the short consequent sequences (SCS) present in protein sequences as words and use availability scores to assess the biological usage bias of SCS. Our approach of using MDL for segmentation is interesting in that it does not require prior fixing of word length as in (Motomura et al, 2012), (Motomura et al, 2013).…”
Section: Related Work (mentioning)
confidence: 99%