2014
DOI: 10.1002/cpe.3435

Parallelizing the execution of native data mining algorithms for computational biology

Abstract: Data mining is being increasingly used in biology. Biologists are adopting prototyping languages, like R and Matlab, to facilitate the application of data mining algorithms to their data. As a result, their scripts are becoming increasingly complex and also require frequent updates. Application to large datasets becomes impractical and the time-to-paper increases. Furthermore, even if there are various systems that can be used to efficiently process large datasets, for example, using Cloud and High Perf…

Cited by 36 publications (22 citation statements)
References 28 publications
“…On the other hand, also other taxa matching systems could benefit from the same facilities by following integration guidelines (Coro and Italiano, 2012). We give further details about the advantages and the modalities of integrating parallelisable algorithms in iMarine in the paper by Coro et al (2014). Another advantage of using the BiOnym instance on iMarine, is that access policies to authority files are managed by the eInfrastructure agreements with the data providers.…”
Section: The Bionym Approach (mentioning)
confidence: 98%
“…For example, several systems stress on low maintenance, low deployment costs, and high sustainability, granting reliability, easiness of installation, and usability at the same time. On the other hand, aspects like managing different programming languages and importing community‐provided processes are often neglected, although they are crucial to meet the programming habits of the served communities . Overall, there is no system satisfying all the requirements reported in Section 1; nevertheless, examples can be given of flexible systems that may potentially meet them.…”
Section: Overview (mentioning)
confidence: 99%
“…It offers a set of off-the-shelf algorithms including clustering algorithms such as DBScan. Moreover, it enables a simple integration and execution of user-defined algorithms expressed in a number of programming and scripting languages including R [32]. It currently embeds more than 100 different algorithms ranging from Anomalies Detection, Classification, Clustering, Simulation, Training, Bayesian Methods, Trends, and many more [33].…”
Section: Statscube (mentioning)
confidence: 99%
“…Virtual Research Environments by Hybrid Data Infrastructures the set of data sources integrated in the D4Science infrastructure; and (iii) the development of new algorithms and approaches aiming at enlarging the offering of the Statistical Manager service. Thanks to the openness of the gCube system, some of these developments can be performed by the community in the large, e.g., every scientists owning an algorithm worth to share can decide to integrate it into the Statistical Manager and benefit from a boost in performances [32].…”
Section: Pos(isgc2014)022 (mentioning)
confidence: 99%