2013
DOI: 10.48550/arxiv.1303.5367
Preprint

Taming the zoo - about algorithms implementation in the ecosystem of Apache Hadoop

Abstract: Content Analysis System (CoAnSys) is a research framework for mining scientific publications using Apache Hadoop. This article describes the algorithms currently implemented in CoAnSys including classification, categorization and citation matching of scientific publications. The size of the input data classifies these algorithms in the range of big data problems, which can be efficiently solved on Hadoop clusters.

citations
Cited by 2 publications
(2 citation statements)
references
References 13 publications
“…To do that accurately, though, we need to describe some technicalities first. Even more technical details not covered in this article are discussed in [21].…”
Section: Hadoopisation (mentioning)
confidence: 99%
“…Workflows implemented in CoAnSys algorithms [17,18] are quite straightforward - no traps, no mazes, very transparent ideas. Crafting them against guidelines mentioned in Section 1.2 had impact not only on practices used, but also on the choice of tools.…”
Section: Development Aspects (mentioning)
confidence: 99%