On using MapReduce to scale algorithms for Big Data analytics: a case study

Kijsanayothin, Phongphun; Chalumporn, Gantaphon; Hewett, Rattikorn

doi:10.1186/s40537-019-0269-1

Cited by 9 publications

(3 citation statements)

References 35 publications

(50 reference statements)

“…As discussed in [58], PrefixSpan outperforms other Apriori-like algorithms and can be extended to mining sequential patterns with user-specified constraints for various domain applications. In [59], Kijsanayothin et al. stated that MapReduce is a programming paradigm that enables parallel and distributed execution of massive data processing on large clusters of machines, and thus researchers can focus on building efficient algorithms to enhance performance.…”

Section: B Algorithms

mentioning

confidence: 99%

A Scalable Analytical Framework for Complex Event Episode Mining With Various Domains Applications

2022

With the ubiquity of sensor networks and smart devices that continuously collect data, we face the challenge of analyzing the growing stream of data in real time. In recent years, there has been a huge need to gain useful knowledge by incrementally analyzing event sequence data. Although episode pattern mining techniques have existed for years, people have recently become more aware of their practical value in solving real-life domain problems such as manufacturing records, stock markets, and weather forecasts. The effective and efficient application of episode pattern mining techniques to analyze complex event data is becoming increasingly important for solving real-life problems in wide domains. However, few studies have focused on developing a scalable framework based on episode pattern mining of complex event sequences for applications in various domains. In this work, we propose a novel framework named SAAF (Scalable Analytical Application Framework) based on complex event episode mining techniques, including batch episode mining, delta episode mining, incremental episode mining, and pattern merging, to consider both efficiency and accuracy. Moreover, to enhance scalability, we adopt the lambda architecture with Apache Spark and Apache Spark Streaming as the system development framework. Finally, the experimental results on three real datasets of different domains and two benchmark datasets showed that the proposed SAAF framework exhibits excellent performance in terms of efficiency, accuracy, and scalability.

“…The MapReduce [6] programming paradigm is one of the most representative BDA. MapReduce provides horizontal scaling to petabytes of data on thousands of compute nodes, a simplified programming model, and a high degree of reliability when failed nodes occur [7]. In MapReduce, the input data are divided into many parts.…”

Section: Introduction

mentioning

confidence: 99%

QoSComm: A Data Flow Allocation Strategy among SDN-Based Data Centers for IoT Big Data Analytics

et al. 2020

When Internet of Things (IoT) big data analytics (BDA) applications need to transfer data streams among software-defined network (SDN)-based distributed data centers, data flow forwarding in the communication network is typically done by an SDN controller using a traditional shortest-path algorithm or considering only the applications' bandwidth requirements. In BDA, this scheme can hurt performance and lengthen job completion time because additional metrics, such as end-to-end delay, jitter, and packet loss rate in the data transfer path, are not considered. These metrics are quality of service (QoS) parameters in the communication network. This research proposes a solution called QoSComm, an SDN strategy to allocate QoS-based data flows for BDA running across distributed data centers to minimize their job completion time. QoSComm operates in two phases: (i) based on the current communication network conditions, it calculates the feasible paths for each data center using a multi-objective optimization method; (ii) it distributes the resultant paths among data centers by dynamically configuring their OpenFlow switches (OFS). Simulation results show that QoSComm can improve BDA job completion time by an average of 18%.

“…Google (Kijsanayothin et al., 2019). It is mainly developed and implemented using a functional programming model.…”

mentioning

confidence: 99%

Big data analytics framework for childhood infectious disease surveillance system using modified mapreduce algorithm: a case study of Tanzania

Mwamnyange

Tanzania has been affected with a potential emerging and re-emerging of infectious diseases such as diarrhea, acute respiratory infections, pneumonia, hepatitis, and measles. There is an increasing trend for the occurrences of new emerging pandemic diseases such as the coronavirus (Covid-19) in 2020 as well as re-occurrence of old infectious diseases such as cholera epidemic in 2015-2017, chikungunya and dengue fever outbreak in 2010, 2012, 2014, 2018, and 2019 which affected different regions in Tanzania. These diseases by far are the main causes of the high mortality rate for women and children of 0-5 years of age. The traditional disease surveillance system as the foundation of the public healthcare practices has been facing challenges in data collection and analysis using health big data sources to prevent and control infectious diseases. Health big data sources on infectious diseases have been recognized as the potential supplement for the provision of evidence-based decision-making worldwide. Tanzania as one of the resource-limited setting countries has lagged because of the challenges in information technology infrastructure and public healthcare resources. The traditional disease surveillance system is still paper-based, semi-automated, and limited in scope which relies on clinical-oriented patient data sources and leaving out nontraditional and pre-diagnostic unstructured big data sources. This research study aimed to improve the traditional infectious disease surveillance system to employ big data analytics technology in healthcare data collection and analysis to improve decision-making. Big data analytics framework for the childhood infectious disease surveillance system was developed which guides healthcare professionals to streamline the collection and analysis of health big data for infectious disease surveillance. 
The framework was then compared with the existing framework on infrastructure, data size and transformation, and running time. The experimental results indicate that the framework runs up to about 56% faster than the traditional system. It also performs best when processing multiple data structures using additional processing units. In particular, the proposed framework can be adopted to improve the prenatal and postnatal healthcare system in Tanzania.
