2009
DOI: 10.1093/bioinformatics/btp589

A novel method for mining highly imbalanced high-throughput screening data in PubChem

Abstract: Motivation: The comprehensive information of small molecules and their biological activities in PubChem brings great opportunities for academic researchers. However, mining high-throughput screening (HTS) assay data remains a great challenge given the very large data volume and the highly imbalanced nature with only small number of active compounds compared to inactive compounds. Therefore, there is currently a need for better strategies to work with HTS assay data. Moreover, as luciferase-based HTS technology…

citations
Cited by 56 publications
(65 citation statements)
references
References 32 publications
Supporting: 2, mentioning: 63, contrasting: 0
“…Class imbalance occurs frequently in QSAR and drug discovery datasets 14,[65][66][67]. This could be for a number of reasons; however in this context it is due to lack of publically available data for the minority class, poorly-moderately absorbed compounds, in the literature.…”
Section: Results
Citation type: mentioning
confidence: 99%
“…Experiments done on different methods conclude with ambiguous results: while Anand et al [63], certified by Li et al [15] opt for sampling methods as optimal solution, we observe on the other front McCarthy et al [65] in agreement with Liu et al [88] on the superiority of the cost sensitive learning; while Quinlan [94] and Thomas [100] are approving ensemble learning methods; on the other hand Cieslak [38] and Marcellin [90] defend the algorithm modification approaches.…”
Section: Discussion
Citation type: mentioning
confidence: 99%
“…Additionally, one can use the Pubchem Bioassay search interface to query Pubchem database for relevant biological assay data via PUG REST and SOAP interfaces. 47,49,50,51,52 A simple search box is presented, the result of a successful search is a list of assays with AID identifiers, short descriptions, and the number of compounds tested for each assay. The AID identifier is a clickable hyperlink which leads to the corresponding assay web page at Pubchem where a user can find more complete information about the assay.…”
Section: Framework and Discussion
Citation type: mentioning
confidence: 99%