2020
DOI: 10.2991/ijndc.k.200515.007

High Performance Hadoop Distributed File System

Abstract: Although by the end of 2020 most companies will be running 1000-node Hadoop clusters, Hadoop implementations still face many challenges, such as security, fault tolerance, and flexibility. Hadoop is a software paradigm for handling big data, and it includes a distributed file system called the Hadoop Distributed File System (HDFS). HDFS handles fault tolerance using a data replication technique: it replicates data across multiple DataNodes, which means the reliability and ava…
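The replication technique mentioned in the abstract can be illustrated with a minimal sketch. This is not HDFS's actual placement code: real HDFS uses a rack-aware policy (first replica on the writer's node, remaining replicas spread across racks), while the function below simply picks the default three distinct DataNodes deterministically; all node names are hypothetical.

```python
def place_replicas(block_id, datanodes, replication_factor=3):
    """Choose distinct DataNodes to hold copies of one block.

    Simplified stand-in for HDFS placement: the default HDFS
    replication factor is 3, and each replica must land on a
    different DataNode.
    """
    if replication_factor > len(datanodes):
        raise ValueError("not enough DataNodes for the requested factor")
    # Deterministic round-robin keyed on the block id, so different
    # blocks spread their replicas over the cluster.
    start = sum(block_id.encode()) % len(datanodes)
    return [datanodes[(start + i) % len(datanodes)]
            for i in range(replication_factor)]

nodes = ["dn1", "dn2", "dn3", "dn4"]
print(place_replicas("blk_0001", nodes))  # three distinct DataNodes
```

If one DataNode fails, the NameNode can re-replicate the affected blocks from the surviving copies, which is what gives HDFS its fault tolerance.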

Cited by 15 publications (7 citation statements)
References 10 publications
“…These processes are distributed throughout the cluster of computers in parallel [ 50 ]. For further information addressing the fundamentals of Hadoop MapReduce, see [ 51 , 52 ].…”
Section: Proposed Malicious URL Detection Model
confidence: 99%
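The MapReduce flow this citing paper refers to can be sketched in-process. This is not the Hadoop Java API, just a single-machine illustration of the map, shuffle, and reduce phases that Hadoop distributes across the cluster in parallel.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit (word, 1) pairs, as a Hadoop mapper would.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Shuffle: group all values by key across mapper outputs.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently; in Hadoop
    # these reducers run in parallel on different cluster nodes.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["hadoop stores data", "hadoop processes data"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```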
“…Apache Hadoop is an HPDC that supports HTTP REST APIs to interact with the HDFS [22], [23], so after the system determines the new clusters of the files, it will call the HDFS API endpoint to apply the new replication policy for each file.…”
Section: E. Applying New Replication Policies
confidence: 99%
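The HDFS REST API call mentioned in this citing paper is the WebHDFS `SETREPLICATION` operation, exposed as `PUT /webhdfs/v1/<PATH>?op=SETREPLICATION&replication=<SHORT>`. The sketch below only builds the request URL (no network call); the NameNode host and file path are placeholders.

```python
from urllib.parse import urlencode

def setreplication_url(namenode, path, replication):
    """Build the WebHDFS URL that changes a file's replication factor.

    WebHDFS exposes this operation as:
        PUT /webhdfs/v1/<PATH>?op=SETREPLICATION&replication=<SHORT>
    The namenode host:port and path here are illustrative placeholders.
    """
    query = urlencode({"op": "SETREPLICATION", "replication": replication})
    return f"http://{namenode}/webhdfs/v1{path}?{query}"

url = setreplication_url("namenode.example:9870", "/data/file1.csv", 2)
print(url)
# http://namenode.example:9870/webhdfs/v1/data/file1.csv?op=SETREPLICATION&replication=2
```

A clustering system like the one described could iterate over its file groups and issue one such `PUT` per file to apply the new per-cluster replication policy.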
“…Following the classification, the credit cards with valid records are parsed to recognize the number of credit cards. It is then saved to the distributed file system, which is a Hadoop distributed file system (HDFS) in this implementation [25,26]. One point to be noted concerning this implementation is the feature of user notifications.…”
Section: Fog Computing Performance Evaluation
confidence: 99%
“…There are ideal tools for batch processing and maintenance layers, such as Hadoop and Impala. Hadoop is relevant because it can process and store petabytes of data, while Impala, in turn, interactively requests such data [25]. The real-time requirements for the batch and service layers are not met, since MapReduce itself has a significant delay, so hours of time can be required before the presentation data are propagated to the serving layer.…”
confidence: 99%