2013
DOI: 10.1021/ci400099q

Estimating Error Rates in Bioactivity Databases

Abstract: Bioactivity databases are routinely used in drug discovery to look up and, using prediction tools, to predict potential targets for small molecules. These databases are typically manually curated from patents and scientific articles. Apart from errors in the source document, the human factor can cause errors during the extraction process. These errors can lead to wrong decisions in the early drug discovery process. In the current work, we have compared bioactivity data from three large databases (ChEMBL, Licep…

Cited by 59 publications (56 citation statements)
References 13 publications (19 reference statements)
“…In parallel to the concern about the evaluation of individual bioactivity predictions, recent publications have aimed at establishing the level of uncertainty in public bioactivity databases [22-25]. In this vein, Brown et al [26] highlighted the importance of including the uncertainty of bioactivity data into the evaluation of models quality.…”
Section: Introduction (mentioning)
confidence: 99%
“…The aforementioned values are the result of machine-based harmonization and consolidation of multiple data objects in chemical, bioactivity and CCP space. An independent study by Tiikkainen and Franke (19), comparing ChEMBL (release 14) and WOMBAT 2012.01, showed >394 000 unique bioactivities in WOMBAT, compared with nearly 3.3 million bioactivities in ChEMBL; and 2755 unique targets in ChEMBL, compared with 1486 unique targets in WOMBAT. The harmonization trends suggest that a consolidated database is preferable to a federated collection, at least in this case, when seeking to evaluate global bioactivity trends.…”
Section: Discussion (mentioning)
confidence: 99%
“…The harmonization trends suggest that a consolidated database is preferable to a federated collection, at least in this case, when seeking to evaluate global bioactivity trends. This solution was, for example, implemented in the ‘Merz Virtual Bioactivity Database’, which integrates ChEMBL and WOMBAT, among other data sources (8, 19). …”
Section: Discussion (mentioning)
confidence: 99%
“…Several studies notified that the bioactivity databases should be used with caution since errors in the data they provided could arise 62,86 (Table 2). A manual curation of all data is then required before inclusion to limit the integration of errors.…”
Section: Selection Of Active Compounds (mentioning)
confidence: 99%