2005
DOI: 10.1110/ps.041092605

Availability of short amino acid sequences in proteins

Abstract: Much attention is being paid to protein databases as an important information source for proteome research. Although used extensively for similarity searches, protein databases themselves have not fully been characterized. In a systematic attempt to reveal protein-database characters that could contribute to revealing how protein chains are constructed, frequency distributions of all possible combinatorial sets of three, four, and five amino acids ("triplets," "quartets," and "pentats"; collectively called con…
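
The kind of count the abstract describes can be illustrated with a minimal sketch. It is not the authors' method: the function names `count_kmers` and `missing_kmers`, the toy sequences, and the decision to skip windows containing non-standard symbols are all assumptions made for illustration. It tallies every overlapping window of length k and then lists the k-mers over the 20 standard amino acids that never occur.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def count_kmers(sequences, k):
    """Count every overlapping window of length k across the sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            window = seq[i:i + k]
            # Assumption: windows with non-standard symbols (e.g. X) are skipped.
            if all(residue in AMINO_ACIDS for residue in window):
                counts[window] += 1
    return counts


def missing_kmers(counts, k):
    """Yield each k-mer over the 20-letter alphabet that was never counted."""
    for letters in product(AMINO_ACIDS, repeat=k):
        kmer = "".join(letters)
        if kmer not in counts:
            yield kmer


# Hypothetical toy input; a real analysis would read a protein database.
toy_sequences = ["MKTAYIAKQR", "MKTAXIAK"]
triplet_counts = count_kmers(toy_sequences, 3)
absent_triplets = list(missing_kmers(triplet_counts, 3))
```

The search space grows as 20^k: 8,000 triplets, 160,000 quartets, and 3,200,000 pentats, so enumerating all candidates this way stays feasible for k up to 5.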

Cited by 32 publications (44 citation statements)
References 36 publications
“…1 All penta-peptides that were reported as missing in the work of Otaki and colleagues do appear in our database, which is more up-to-date (and more than four times larger than the data set Otaki and colleagues used) (Otaki et al 2005). 2 Hampikian and Andersen mainly describe a method for fast calculation of k-mers that do not appear in a set of proteins.…”
Section: Preliminaries and Definitions
mentioning
confidence: 99%
“…Many bioinformatic investigations have explored the sequences of amino acids in proteins (see, for example, Gavel and Heijne 1992; Echols et al 2002; Qi et al 2004; White and Heijne 2004; Otaki et al 2005; White and Heijne 2005; Ulitsky et al 2006; Hampikian and Andersen 2007), or have attempted to model proteins by various probability models (Krogh et al 1994; Abe and Mamitsuka 1997; Durbin et al 1998; Bystroff et al 2000; Eddy 2004). Otaki and colleagues 1 (Otaki et al 2005) have examined the space of "missing" AA sequences and have discovered the missing penta-peptides.…”
mentioning
confidence: 99%
“…The most basic empirical question that has been investigated is that of missing DNA k-mers. Earlier works have studied non-existent short amino acid k-mers [3,4], and have attributed them mainly to chemical constraints (such as hydrophobic and hydrophilic amino acids). DNA does not have the complex three-dimensional structure and chemical constraints of proteins, although the nucleotide composition has been reported by El Antri et al [5] to weakly affect the structure of double-stranded DNA.…”
Section: Introduction
mentioning
confidence: 99%
“…Short-sequence use has been analyzed previously (17–20), and different reasons for a lack of some sequences have been suggested. Our bioinformatics results identify triplet and quadruplet sequences that slow translation and lead to stalling almost immediately upon entry into the exit tunnel.…”
Section: Discussion
mentioning
confidence: 99%