WITHDRAWN: Comparative Research on Active Learning of Big Data based on MapReduce and Spark

Zhang, Ruihong; Hu, Zhihua

doi:10.1016/j.micpro.2020.103425

Cited by 4 publications

(3 citation statements)

References 20 publications


“…In this paper, we propose a new hybrid method for generating a data warehouse that can meet the decision-making needs of decision-makers. In the first phase, the new method exploits the speed of the spark framework-which is much faster than MapReduce according to several research works [17]- [26] in analyzing large amounts of unstructured and distributed data, to generate a general schema for each collection. This allows to extract the structure of a large amount of data in a reasonable time, thus revealing the richness of the data stored in a document-oriented database.…”

Section: Methods (mentioning)

confidence: 99%

Towards a new hybrid approach for building document-oriented data warehouses

Moukhi

Azami

Hajbi

2022

IJECE


Schemaless databases offer a large storage capacity while guaranteeing high performance in data processing, unlike relational databases, which are rigid and have shown their limitations in managing large amounts of data. However, the absence of a well-defined schema and structure in not only SQL (NoSQL) databases makes the use of data for decision analysis purposes even more complex and difficult. In this paper, we propose an original approach to build a document-oriented data warehouse from unstructured data. The new approach follows a hybrid paradigm that combines data analysis and user requirements analysis. The first, data-driven step exploits the fast and distributed processing of the Spark engine to generate a general schema for each collection in the database. The second, requirement-driven step consists of analyzing the semantics of the decisional requirements expressed in natural language and mapping them to the schemas of the collections. At the end of the process, a decisional schema is generated in JavaScript object notation (JSON) format and the data loading with the necessary transformations is performed.


“…As the big data technology develops, and the improvement of public policy evaluation technology continues in the public management, the evaluation of public value is still immature, and public value cannot be fully expressed. Especially for those abstract public values, such as government integrity and procedural justice, although we can know its importance, it is difficult to analyze them through which data [8][9][10]. Even if the relevant data are collected, the authenticity and reliability of the data cannot be guaranteed, which requires our continuous improvement and development in practice.…”

Section: Introduction (mentioning)

confidence: 99%

Research on Public Management Application Innovation Based on Spark Big Data Framework

Zhi

2022

Mathematical Problems in Engineering


Public management service is the key to urban intelligent construction. This paper proposes an analysis method and model based on Spark big data framework and takes resident income, happiness index, urban planning, and ecological environment as the indicators of Spark big data. From the high difficulty of Spark big data cluster analysis of urban public management, we build the index weight by the entropy weight method, optimize the similarity calculation, and achieve the rapid understanding of urban public management. Subsequently, the Spark big data public management platform is applied to the public management of Beijing. The results indicate that the public management platform based on Spark big data framework can improve the public management level of the city and help to build an intelligent city.


“…The dependency between the requests was not synchronized by the traditional mapreduce based scheduling approaches which reduced the efficiency of scheduling. Hadoop was an open source implementation, which was used for map reduction for application development and processing for computing distribution [7] But there is a several important issues remains unsolved [8]. These limitations were overcome by executing spark based computing in which the inter dependencies between the requests were analyzed and optimized to achieve effective query scheduling and has the ability to process huge data faster [9].…”

Section: Introduction (mentioning)

confidence: 99%

SparkGrid: Blockchain Assisted Secure Query Scheduling and Dynamic Risk Assessment for Live Migration of Services in Apache Spark based Grid Environment

G.M.

Nalini

2022

Preprint


Grid computing is an emerging technology that enables the heterogeneous collection of data and the provisioning of services to users. Due to the high volume of incoming heterogeneous requests, grid computing needs efficient scheduling to reduce execution time and satisfy Service Level Agreement (SLA) and Quality of Service (QoS) requirements. For that purpose, we propose the SparkGrid method to reduce execution time and satisfy SLA and QoS requirements. The proposed work includes four consecutive phases. First, we perform user authentication to ensure the legitimacy of the users using the Elliptic Curve based Chaos Theory (ECCT) algorithm, which generates a secret key and stores it in the blockchain. Second, we perform query scheduling for resource discovery using the Soft Actor Critic (SAC) algorithm by considering 3P’s parameters; this is performed by the Spark environment, which schedules optimal resources based on the service request. Third, we perform risk assessment and request dropping, in which the risk nodes of workers are evaluated by the master node. To address resource wastage by attackers, this research evaluates the risk value dynamically using Shannon entropy. Based on the risk assessment, the requests are classified into two classes, normal and malicious. Fourth, we perform service live migration, in which the malicious requests are dropped and normal requests are migrated from the source node to the target node using Multi-Constraints based Emperor Penguin Optimization (MC-EPO). Finally, simulation is performed with GridSim, and the simulation results demonstrate that the proposed SparkGrid method achieves superior performance compared to other state-of-the-art methods.
