2019
DOI: 10.1093/bioinformatics/btz160

Interoperable and scalable data analysis with microservices: applications in metabolomics

Abstract: Motivation: Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed using the Kubernetes container orchestrator. Results: …

Cited by 26 publications (27 citation statements)
References 53 publications
“…Apart from the size, each cluster had the same topology: one master node (configured to act as edge), and a 5-to-3 ratio between service nodes and storage nodes. This service-to-storage ratio was shown to provide good performance, in terms of distributed data processing, in our previous study 59. Hence, we started with a cluster setup that included 1 master node, 5 service nodes and 3 storage nodes (8 nodes in total, excluding master) and, by doubling size on each run, we scaled up to 1 master node, 40 service nodes and 24 storage nodes (64 nodes in total, excluding master).…”
Section: Deployment Automation Scalability (mentioning)
confidence: 78%
“…Khoonsari et al 59 used the PhenoMeNal VRE to scale the preprocessing pipeline of MTBLS233, one of the largest metabolomics studies available on the Metabolights repository 68. This is substantially different from the previous benchmarks, as the analysis was composed by several tools chained into a single pipeline, and because the scalability was evaluated over the full workflow.…”
Section: Full Analysis Scaling (mentioning)
confidence: 99%
“…We implemented a computational workflow to process LC-MS data, illustrated in Figure 3, and evaluated how well it can scale on a Kubernetes infrastructure. The workflow has been described thoroughly elsewhere by Khoonsari et al [12]. Briefly, the open source mzML files were first centroided and calibrated using OpenMS [23].…”
Section: Results (mentioning)
confidence: 99%
“…Thanks to containerisation, scientists can package pipelines in an isolated and self-contained manner, to be distributed and run across a wide variety of computing platforms. Examples of projects in which microservices are a cornerstone include the PhenoMeNal project [12] and the EXTraS project [13].…”
Section: Introduction (mentioning)
confidence: 99%
“…In PhenoMeNal, we have extended Galaxy, Jupyter, Luigi and Pachyderm in such a way that they can be orchestrated throughout the cloud infrastructure together with the data analysis tools themselves [69]. Six important metabolomics workflows have been fully integrated into PhenoMeNal (Table 2) and more (mzQuality, NMR-BATMAN) are available for testing (Fig.…”
Section: Scientific Workflows (mentioning)
confidence: 99%