2021
DOI: 10.1101/2021.06.24.449764
Preprint

Searching sequence databases for functional homologs using profile HMMs: how to set bit score thresholds?

Abstract: UniProt and BFD databases together have 2.5 billion protein sequences. A large majority of these proteins have been electronically annotated. Automated annotation pipelines, vis-à-vis manual curation, have the advantage of scale and speed but are fraught with relatively higher error rates. This is because sequence homology does not necessarily translate to functional homology, molecular function specification is hierarchic and not all functional families have the same amount of experimental data that one can e…

Citations: cited by 2 publications (4 citation statements)
References: 53 publications
“…Bit score thresholds for NSAT HMM and NSD HMM, profile HMMs of aminotransferases and dehydratases, respectively, were set based on Receiver Operator Characteristic (ROC) curves generated using hits from TrEMBL under the assumption that annotations provided in TrEMBL are correct. The procedure used to generate ROC curves is described in detail elsewhere 6,44. Briefly, ROC curves were generated by calculating the number of true positives, false positives and false negatives for various values of bit score threshold (Figure S3).…”
Section: Methods (mentioning)
confidence: 99%
“…The procedure used to generate ROC curves is described in detail elsewhere. 6,44 profiles was obtained and on the basis of this plot, thresholds for both C3_NSAT HMM and C4_NSAT HMM were set to 400 bits (Figure 4A).…”
Section: Datasets and Profile HMMs (mentioning)
confidence: 99%
“…TrEMBL under the assumption that annotations provided in TrEMBL are correct. The procedure used to generate ROC curves is described in detail elsewhere [6], [35]. Briefly, ROC curves were generated by calculating the number of true positives, false positives and false negatives for various values of bit score threshold (Figure S3).…”
Section: Methods (mentioning)
confidence: 99%
“…Bit score thresholds for NSAT HMM and NSD HMM, profile HMMs of aminotransferases and dehydratases, respectively, were set based on Receiver Operator Characteristic (ROC) curves generated using hits from TrEMBL under the assumption that annotations provided in TrEMBL are correct. The procedure used to generate ROC curves is described in detail elsewhere 6,34. Briefly, ROC curves were generated by calculating the number of true positives, false positives and false negatives for various values of bit score threshold (Figure S3).…”
Section: Methods (mentioning)
confidence: 99%
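
The citation statements above describe generating ROC curves by counting true positives, false positives and false negatives at various bit score thresholds. A minimal Python sketch of that counting step follows. It is an illustration only, not the cited authors' procedure: the function names, the toy scores and the labels are all hypothetical, and each hit's label is simply taken as correct (mirroring the stated assumption that TrEMBL annotations are correct).

```python
from typing import List, NamedTuple, Sequence, Tuple


class ThresholdCounts(NamedTuple):
    threshold: float
    tp: int  # hits at or above the threshold that are labelled family members
    fp: int  # hits at or above the threshold that are not labelled members
    fn: int  # labelled members that score below the threshold


def counts_per_threshold(
    hits: Sequence[Tuple[float, bool]], thresholds: Sequence[float]
) -> List[ThresholdCounts]:
    """Count TP, FP and FN for each candidate bit score threshold.

    `hits` holds (bit_score, is_member) pairs; `is_member` is an assumed
    ground-truth label, e.g. derived from existing annotations.
    """
    total_pos = sum(1 for _, is_member in hits if is_member)
    rows = []
    for t in thresholds:
        tp = sum(1 for score, is_member in hits if score >= t and is_member)
        fp = sum(1 for score, is_member in hits if score >= t and not is_member)
        rows.append(ThresholdCounts(t, tp, fp, total_pos - tp))
    return rows


def roc_points(
    rows: Sequence[ThresholdCounts], total_neg: int
) -> List[Tuple[float, float]]:
    """Convert counts to (false positive rate, true positive rate) pairs."""
    points = []
    for r in rows:
        n_pos = r.tp + r.fn
        tpr = r.tp / n_pos if n_pos else 0.0
        fpr = r.fp / total_neg if total_neg else 0.0
        points.append((fpr, tpr))
    return points


# Toy data (invented for illustration): four labelled members, two non-members.
TOY_HITS = [
    (520.0, True),
    (450.0, True),
    (410.0, True),
    (390.0, False),
    (300.0, True),
    (150.0, False),
]

rows = counts_per_threshold(TOY_HITS, [100.0, 400.0, 500.0])
points = roc_points(rows, total_neg=2)

for row, (fpr, tpr) in zip(rows, points):
    print(f"T={row.threshold:6.1f}  TP={row.tp} FP={row.fp} FN={row.fn}  "
          f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

In this toy set, raising the threshold from 100 to 400 bits removes both false positives at the cost of one false negative, which is the trade-off a ROC plot makes visible when choosing a cut-off.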