2017
DOI: 10.1093/nar/gkx248

GibbsCluster: unsupervised clustering and alignment of peptide sequences

Abstract: Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align pe…

Cited by 193 publications (219 citation statements)
References: 23 publications
“…The GibbsCluster analysis detected the correct number of specificities for the TC‐1 set and A18‐Tpm, while it underestimated the number of alleles in the Jurkat data set (Figure S1, Supporting Information). As has been previously reported, unsupervised clustering tends to underestimate the number of specificities when multiple MHCs have redundant motifs and/or low expression levels. In the case of the Jurkat cell line, the expressed HLA‐B alleles B*07:02 and B*35:03 belong to the B07 supertype and are known to have nearly identical peptide‐binding preferences; unsupervised clustering, therefore, fails to separate them into two separate specificities.…”
Section: Results (mentioning)
confidence: 95%
“…As has been previously reported, unsupervised clustering tends to underestimate the number of specificities when multiple MHCs have redundant motifs and/or low expression levels. [16,22] In the case of the Jurkat cell line, the expressed HLA-B alleles B*07:02 and B*35:03 belong to the B07 supertype and are known to have nearly identical peptide-binding preferences [23,24]; unsupervised clustering, therefore, fails to separate them into two separate specificities. Finally, the fraction of peptides captured by the "trash cluster" varied between 2.6% (A18-Tpm) and 7.9% (TC-1), suggesting a level of variability in the quality of the different data sets.…”
Section: Rescoring Peptidomes From Human, Mouse and Cattle (mentioning)
confidence: 99%
“…The only critical limitation for such data integrations is the criteria that each data point must be associated with a specific MHC element. This information is not always readily available, but can in most cases be inferred by unsupervised clustering of the available data (using GibbsCluster (29), position weight matrix mixture models (16), or similar approaches), and association of each cluster to an MHC molecule of the given host.…”
Section: Discussion (mentioning)
confidence: 99%
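
The idea in the statement above, grouping unlabeled peptides by motif so that each group can later be matched to an MHC molecule, can be illustrated with a toy example. The sketch below is not the GibbsCluster algorithm (which uses Gibbs sampling) and not a position weight matrix mixture model; it is a simplified, deterministic stand-in that alternates between building a per-cluster log-odds matrix and reassigning each peptide to its best-scoring cluster. The peptides, the seed choice, the uniform background, and all function names are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(peptides, length, pseudocount=1.0):
    """Per-position log-odds scores against a uniform background (an assumption)."""
    pwm = []
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Uniform background frequency is 1/20, so the odds are freq * 20.
            column[aa] = math.log(freq * len(AMINO_ACIDS))
        pwm.append(column)
    return pwm


def score(peptide, pwm):
    """Sum of the per-position log-odds for one peptide."""
    return sum(column[aa] for aa, column in zip(peptide, pwm))


def cluster_peptides(peptides, seeds, max_iter=20):
    """Hard-assignment clustering of equal-length peptides.

    Starts from one seed peptide per cluster, then repeats:
    assign every peptide to its best-scoring matrix, rebuild the matrices.
    Stops when the assignment no longer changes.
    """
    length = len(peptides[0])
    pwms = [build_pwm([seed], length) for seed in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(len(pwms)), key=lambda k: score(p, pwms[k]))
            for p in peptides
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for k in range(len(pwms)):
            members = [p for p, a in zip(peptides, assignment) if a == k]
            if members:  # keep the old matrix if a cluster becomes empty
                pwms[k] = build_pwm(members, length)
    return assignment


# Hypothetical 9-mers: one group shares P at position 2 and L at position 9,
# the other shares L at position 2 and V at position 9.
peptides = [
    "APRGTVSAL", "ALDKTESGV",
    "SPKLDNQTL", "KLQNRWHYV",
    "RPMNHWYCL", "GLFMCIPEV",
]
labels = cluster_peptides(peptides, seeds=[peptides[0], peptides[1]])
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In a real analysis, each resulting cluster would still have to be associated with an MHC molecule of the host, for example by comparing its motif with known binding preferences; that step is outside this sketch.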
“…For example, Hammock 23 was developed to identify consensus motifs in large data sets, and PepServe 24 to cluster peptides based on their physiochemical properties. Another application, UCLUST, 25 performs rapid clustering and implements several previously identified clustering algorithms, and Andreatta et al 26,27 developed a tool based on Gibbs sampling. Overall, most of these tools cluster based on the physiochemical properties of peptide sequences or by defining and focusing on shared motifs.…”
Section: Introduction (mentioning)
confidence: 99%