2021
DOI: 10.1186/s12859-021-04013-x

Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation

Abstract: Background: The identification of protein families is of outstanding practical importance for in silico protein annotation and underlies several bioinformatic resources. Pfam is possibly the best-known protein family database, built over many years of work by domain experts with extensive use of manual curation. This approach is generally very accurate, but it is quite time-consuming and may suffer from a bias generated by the hand-curation itself, which is often guided by the a…

Cited by 3 publications (12 citation statements)
References 35 publications (40 reference statements)

“…The output of the method is a set of metaclusters. In [16] and in Results, we provide evidence that sequences belonging to the same metacluster are likely to share a core evolutionary module and thus metaclusters can often be assimilated to protein families. DPCfam output can be used ‘as is’ or, alternatively, metaclusters can be the basis from which to build profile-HMMs for more sensitive searches of the sequence space [19].…”
Section: Methods (mentioning)
confidence: 91%
“…Clustering of local alignments means that DPCfam can, in principle, identify families representing individual evolutionary modules or domains. The same algorithm was already described in [16], where it was applied to two relatively small Pfam clans (4,083 and 2,022 sequences, respectively). In this work, we utilize DPCfam to cluster the entire UniRef50 database (v. 2017_07), containing about 23 million sequences that share no more than 50% sequence identity between each other.…”
Section: Methods (mentioning)
confidence: 99%