2017
DOI: 10.1016/j.future.2016.04.023

Scaling machine learning for target prediction in drug discovery using Apache Spark

Abstract: We have used Spark to automatically distribute C++ predictors over a cluster. Our Spark application allows near-linear speedup and optimal cluster utilization. The core of the algorithm is easily changed to allow for experimentation. In the context of drug discovery, a key problem is the identification of candidate molecules that affect proteins associated with diseases. Inside Janssen Pharmaceutica, the Chemogenomics project aims to derive new candidates from existing experiments through a set of mach…

Cited by 43 publications
(10 citation statements)
References 19 publications
“…These models are generated by learning from already tested chemical substances; the results showed the effectiveness of Spark to create pipelines based on machine learning techniques with a good scaling behavior in a distributed environment. Dries Harnie and his teamwork [18] re-implemented the Chemogenomics pipeline using Apache Spark (S-CHEMO). The Chemogenomics project attempts to derive new candidate drugs from existing experiments through a set of machine learning predictor programs.…”
Section: Related Work (mentioning)
confidence: 99%
“…If the service provider is unaware of a customer who is about to churn, no action can be taken for that customer. Business must consider risk, the level, and cost of intervention and plausible customer segmentation [6]. In telecommunication industry, subscribers frequently switch over from one industry to another which is a prime concern.…”
Section: Causes Of Customer Churn (mentioning)
confidence: 99%
“…During this pneumonia epidemic, studies have utilized the favored approaches that target SARS-CoV-2 with high-throughput screening of large-scale molecular databases and obtaining potential antiviral drugs [20][21]. With the advancement of computer technology, the combination of computer-aided drug design (CADD) and artificial intelligence (AI) research has become a valuable tool to accelerate the slow process of drug discovery and restraint the expansion of R&D costs, expand the applicable system and improve the level of automation, followed by the development of CADD-based multithreaded in silico screening technology [22][23][24][25]. Within the framework of above idea, we proposed a multimodule integrated approach aimed at improving the lead compound screening accuracy and greatly reducing the time cost by fully maximizing the advantages of each module to achieve a semiautomatic pipeline.…”
Section: Introduction (mentioning)
confidence: 99%