2010 14th Panhellenic Conference on Informatics
DOI: 10.1109/pci.2010.47

Parallel Collection of Live Data Using Hadoop

Abstract: Hadoop is a fault-tolerant Java framework that supports data distribution and process parallelization on commodity hardware. Exploiting the scalability it provides and the independence of task execution, we combined Hadoop with crawling techniques to implement various applications that deal with large amounts of data. Our experiments show that Hadoop is a very useful and trustworthy tool for creating distributed programs that perform better in terms of computational efficiency.
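The abstract describes combining Hadoop's MapReduce-style parallelization with crawling. As a rough illustration of the pattern involved (not the paper's actual implementation, which runs on a Hadoop cluster), the sketch below simulates the two phases in-process over hypothetical crawled pages: the map phase emits (word, 1) pairs from each page's text, and the reduce phase aggregates counts across all map outputs.

```python
from collections import defaultdict

# Minimal in-process simulation of the MapReduce pattern Hadoop provides.
# In a real deployment each map task runs on a separate node against a
# split of the crawled input; Hadoop handles distribution and failures.

def map_phase(pages):
    """Map: emit (word, 1) pairs from each crawled page's text."""
    for url, text in pages:
        for word in text.lower().split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce: sum the counts per word across all map outputs."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Hypothetical crawled input (URL, extracted text) pairs.
pages = [
    ("http://example.com/a", "hadoop scales with commodity hardware"),
    ("http://example.com/b", "hadoop tolerates hardware failures"),
]
result = reduce_phase(map_phase(pages))
print(result["hadoop"])  # -> 2, the word appears once on each page
```

Because map tasks are independent, crawling and parsing can proceed in parallel across the cluster, which is the scalability property the abstract relies on.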

Cited by 8 publications (6 citation statements)
References 6 publications
“…The authors in a previous study used Hadoop for collecting large amounts of live data. They explained how combining Hadoop with crawling programs could improve the efficiency of Big Data computations.…”
Section: Related Work
confidence: 99%
“…The authors in a previous study [11] … with the number of machines in the cluster. The peculiarity of Hadoop is that it can handle different types of data stored in any kind of infrastructure.…”
Section: Related Work
confidence: 99%
“…Users can make full use of the cluster's high-speed computing and powerful storage capacity without needing to know the underlying details of the distributed framework. It is obvious that Hadoop provides a solution to the problem of massive data storage and processing [3]. Figure 2.…”
Section: Data Analysis Based On Micro Service Framework
confidence: 99%
“…It hides the "messy" details of parallelization, allowing even inexperienced programmers to easily utilize the resources of a large distributed system. Although Hadoop is written in Java, Hadoop Streaming allows jobs to be implemented in any programming language [10]. HDFS is a highly fault-tolerant distributed file system designed to run on low-cost commodity hardware.…”
Section: Hadoop Distributed File System (HDFS)
confidence: 99%
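The statement above notes that Hadoop Streaming accepts jobs in any language, because Streaming only requires a mapper and reducer that read lines from stdin and write tab-separated key/value records to stdout. The following is a minimal word-count sketch of that contract in Python; the input strings are illustrative, and the in-process chaining at the end stands in for Hadoop's own sort-and-shuffle between the two stages.

```python
def mapper(lines):
    """Streaming mapper: read raw input lines, emit 'word<TAB>1' records."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"

def reducer(lines):
    """Streaming reducer: Hadoop sorts mapper output by key, so counts
    for the same word arrive contiguously and can be summed in one pass."""
    current, total = None, 0
    for line in lines:
        word, n = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                yield f"{current}\t{total}"
            current, total = word, 0
        total += int(n)
    if current is not None:
        yield f"{current}\t{total}"

# In a real job Hadoop pipes HDFS input splits into the mapper over stdin
# and the sorted mapper output into the reducer; here we chain them directly.
mapped = list(mapper(["hadoop runs anywhere", "hadoop scales"]))
reduced = list(reducer(sorted(mapped)))
print(reduced)
```

Because the protocol is just line-oriented stdin/stdout, the same two stages could equally be shell, C, or Perl executables passed to the streaming jar via its `-mapper` and `-reducer` options.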