2006
DOI: 10.1186/1471-2105-7-277

EVEREST: automatic identification and classification of protein domains in all protein sequences

Abstract: Background: Proteins are comprised of one or several building blocks, known as domains. Such domains can be classified into families according to their evolutionary origin. Whereas sequencing technologies have advanced immensely in recent years, there are no matching computational methodologies for large-scale determination of protein domains and their boundaries. We provide and rigorously evaluate a novel set of domain families that is automatically generated from sequence data. Our domain family identificati…

Citations: cited by 32 publications (28 citation statements)
References: 30 publications
“…InterProScan). Domain and family-based resources provide an excellent coverage of the 'known space' using HMMs (12,000 in Pfam [24], 37,000 in EVEREST [25]). Iterative search using PSSM and HMM Profiles are often used for a comprehensive functional inference.…”
Section: Discussion (mentioning)
confidence: 99%
“…The correct way to solve this problem is to parse all of the sequences and split sequence A into two chains A1 and A2. Such parsing is not trivial (20)(21)(22), and here we deal with the problem in a different way.…”
Section: Discussion (mentioning)
confidence: 99%
“…Preliminary tests of automatic splitting with a 40% sequence identity threshold give a total of 51,765 chains versus the original number of 44,220 chains and has a reassuringly small effect on the results presented here. Correct parsing of chains into domains is difficult (20)(21)(22).…”
Section: Discussion (mentioning)
confidence: 99%
“…To this end we have developed a scoring scheme that enables scoring an evaluated domain family with respect to a reference domain family in the context of a reference system of domain families. A detailed description of the scoring scheme and the results of applying it to EVEREST is given in (10). Briefly, for an evaluated family e , let π( e ) be a collection of reference domains given by allowing each domain in the evaluated family to collect those reference domains that significantly intersect with it.…”
Section: Technical Details (mentioning)
confidence: 99%
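
The collection π(e) described in the last statement can be illustrated with a minimal Python sketch. The excerpt does not say what counts as a "significant" intersection, so the criterion used here (shared residues covering at least 80% of the shorter domain) is purely an assumption, as are the names `Domain`, `overlap_fraction`, `collect_reference_domains`, and `min_overlap`. This is not the scoring scheme from reference (10), only a sketch of the collection step.

```python
from typing import Iterable, NamedTuple, Set


class Domain(NamedTuple):
    """A domain as a residue range on one protein (hypothetical representation)."""
    protein: str
    start: int  # first residue, inclusive
    end: int    # last residue, inclusive


def overlap_fraction(a: Domain, b: Domain) -> float:
    """Shared residues divided by the length of the shorter domain.

    Assumption: the source does not define "significantly intersect";
    this ratio is one plausible stand-in.
    """
    if a.protein != b.protein:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    if shared <= 0:
        return 0.0
    shorter = min(a.end - a.start + 1, b.end - b.start + 1)
    return shared / shorter


def collect_reference_domains(
    evaluated_family: Iterable[Domain],
    reference_domains: Iterable[Domain],
    min_overlap: float = 0.8,  # assumed threshold, not taken from the source
) -> Set[Domain]:
    """Build pi(e): every reference domain that some domain of the
    evaluated family intersects at or above the assumed threshold."""
    references = list(reference_domains)
    collected: Set[Domain] = set()
    for dom in evaluated_family:
        for ref in references:
            if overlap_fraction(dom, ref) >= min_overlap:
                collected.add(ref)
    return collected


# Toy data: two evaluated domains and four reference domains.
evaluated = [Domain("P1", 10, 100), Domain("P2", 5, 60)]
references = [
    Domain("P1", 20, 110),   # overlaps the first evaluated domain
    Domain("P1", 150, 200),  # same protein, no overlap
    Domain("P2", 1, 50),     # overlaps the second evaluated domain
    Domain("P3", 10, 100),   # different protein
]
pi_e = collect_reference_domains(evaluated, references)
```

With this toy data, `pi_e` holds the two reference domains that overlap an evaluated domain on the same protein. How π(e) is then turned into a score is not covered by the excerpt and is left out here.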