2006
DOI: 10.1007/s11030-006-9041-5

Cheminformatics analysis and learning in a data pipelining environment

Abstract: Workflow technology is being increasingly applied in discovery information to organize and analyze data. SciTegic's Pipeline Pilot is a chemically intelligent implementation of a workflow technology known as data pipelining. It allows scientists to construct and execute workflows using components that encapsulate many cheminformatics-based algorithms. In this paper we review SciTegic's methodology for molecular fingerprints, molecular similarity, molecular clustering, maximal common subgraph search and Bayesia…

citations
Cited by 179 publications
(192 citation statements)
references
References 40 publications
“…Experiments were also conducted using the NBC in the Pipeline Pilot software [19][20][21]; however the results obtained were comparable to those with R4 and hence only the latter sets of results are discussed here.…”
Section: Experimental Details and Results (mentioning)
confidence: 99%
“…An operational example of the use of substructural analysis is the PASS (for Prediction of Activity Spectra for Substances) system developed by the Poroikov group [11,16,17]. Some of the weighting schemes that have been used in substructural analysis are closely related to those obtained using a naive Bayesian classifier [18] (hereafter NBC), a well-established approach to machine learning that has become popular in chemoinformatics with the availability of the Bayesian modelling routine in the Pipeline Pilot software system [19][20][21].…”
Section: Introduction (mentioning)
confidence: 99%
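
The link this statement draws between substructural-analysis weights and a naive Bayesian classifier can be illustrated with a minimal sketch. It is a generic, simplified naive Bayes over substructure presence with add-one smoothing, not Pipeline Pilot's Bayesian modelling routine; the function names `train_nbc` and `score` and the toy data are assumptions made for illustration only.

```python
from math import log

def train_nbc(samples):
    """Learn one log-likelihood-ratio weight per substructure.

    samples: list of (set_of_substructures, is_active) pairs (hypothetical input).
    """
    n_act = sum(1 for _, active in samples if active)
    n_inact = len(samples) - n_act
    features = set().union(*(fp for fp, _ in samples))
    weights = {}
    for f in features:
        in_act = sum(1 for fp, active in samples if active and f in fp)
        in_inact = sum(1 for fp, active in samples if not active and f in fp)
        # Add-one smoothing keeps the ratio finite when a count is zero.
        p_act = (in_act + 1) / (n_act + 2)
        p_inact = (in_inact + 1) / (n_inact + 2)
        weights[f] = log(p_act / p_inact)
    return weights

def score(weights, fingerprint):
    """Sum the weights of the substructures present; unseen ones contribute 0."""
    return sum(weights.get(f, 0.0) for f in fingerprint)

# Toy training set: substructure labels are placeholders, not real fragments.
samples = [({"x", "y"}, True), ({"x"}, True), ({"y", "z"}, False), ({"z"}, False)]
weights = train_nbc(samples)
```

Substructures seen mostly in actives receive positive weights and those seen mostly in inactives receive negative ones, which is the sense in which such weights resemble substructural-analysis schemes. This sketch ignores class priors and absent-feature terms.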
“…With MiniBatch-Kmeans and RDKit fingerprint, the clustering step for ChEMBL (the largest dataset) took <5 h to run on a machine with 16 Gb memory (i3-2100 CPU @ 3.10 GHz). In comparison, the maximum dissimilarity method (Hassan et al, 1996) implemented in PipelinePilot (Hassan et al, 2006) takes more than a week to cluster ChEMBL, and an algorithm with average runtime complexity O(N^3 log N) such as DBScan still takes more than 3 days.…”
Section: Clustering Of Molecules For Very Large Datasets (mentioning)
confidence: 99%
“…These fingerprints can be generated using software such as Scitegic's Pipeline Pilot to produce ECFP4 and ECFC4 fingerprints that correspond to each of the variant respectively [9]. The value 4 denotes the number of bonds taken into consideration by the software to generate the fingerprint by encoding the bond radius encircling an atom [9], where in this case there are four bonds. The third variant of the representation i.e SRECFC4 is weighted by the square root of the fragment occurrence fingerprint.…”
Section: Basic Components Of Ss and Tss (mentioning)
confidence: 99%
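
The three representations named in this statement can be sketched as follows, assuming that the ECFP-style variant records fragment presence, the ECFC-style variant records fragment occurrence counts, and the SRECFC-style variant takes the square root of each count. The function `fingerprint_variants` and the string fragment identifiers are hypothetical; a real toolkit would derive the fragments from atom environments.

```python
from collections import Counter
from math import sqrt

def fingerprint_variants(fragment_ids):
    """Return (binary, count, sqrt_count) representations of one molecule.

    fragment_ids: iterable of hashable fragment identifiers, one entry per
    occurrence in the molecule (hypothetical input for illustration).
    """
    counts = Counter(fragment_ids)
    binary = set(counts)                                # presence only (ECFP-like)
    count_fp = dict(counts)                             # occurrences (ECFC-like)
    sqrt_fp = {f: sqrt(n) for f, n in counts.items()}   # square-root weighted
    return binary, count_fp, sqrt_fp

# Fragment "a" occurs four times; "b" and "c" occur once each.
binary, count_fp, sqrt_fp = fingerprint_variants(["a", "b", "a", "c", "a", "a"])
```

The square-root weighting damps the influence of fragments that occur many times while still distinguishing them from fragments that occur once.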