With the ongoing technology revolution, big data has become one of the defining phenomena of the decade, and it has a significant impact on trends in applied science. Exploring big data tools thoroughly is therefore a pressing need. Hadoop is a capable big data analysis technology, but it is slow because the result of each job phase must be stored before the next phase starts, and because of replication delays. Apache Spark is another tool, developed to be a practical model for analyzing big data, with an innovative in-memory processing framework and high-level programming libraries for machine learning, efficient data handling, and more. This paper presents comparisons of the time performance of Scala and Java in Apache Spark MLlib. Many tests were carried out with supervised and unsupervised machine learning methods on big datasets. The datasets were loaded both from Hadoop HDFS and from the local disk to identify the pros and cons of each approach and to find the loading method that gives the best execution. The results showed that Scala performs about 10% to 20% better than Java, depending on the algorithm type. The aim of the study is to analyze big data with the more suitable programming language and, as a consequence, gain better performance.
The big data marketplace is growing rapidly. The main challenge is finding a system that can store and handle a huge volume of data and then process that data to mine hidden knowledge. This paper proposes a comprehensive system for improving big data analysis performance. It contains a fast big data processing engine using Apache Spark and a big data storage environment using Apache Hadoop. The system was tested on about 11 gigabytes of text data collected from multiple sources for sentiment analysis. Three different machine learning (ML) algorithms, all already supported by the Spark ML package, are used in the system. The system programs were written in Java and Scala, and the constructed model consists of the classification algorithms together with the pre-processing steps, arranged as an ML pipeline. The proposed system was implemented for both centralized and distributed data processing. Moreover, several dataset manipulation approaches were applied in the system tests to check which one provides the best accuracy and time performance. The results showed that the system handles big data efficiently: it achieves excellent accuracy with fast execution time, especially on the distributed data nodes.
Data mining is the process of extracting hidden patterns from data. One of the most important activities in data mining is association rule mining, and a new direction in data mining research is the privacy of mining. Privacy-preserving data mining is a new research trend concerned with data privacy in data mining and statistical databases. Data mining can be applied to centralized or distributed databases. Most efficient approaches for mining distributed databases assume that all of the data at each site can be shared. However, privacy concerns may prevent the sites from directly sharing the data, as well as some types of information about the data. Privacy-Preserving Data Mining (PPDM) has become increasingly popular because it allows privacy-sensitive data to be shared for analysis purposes. In this paper, the problem of privacy-preserving association rule mining in a horizontally distributed database is addressed by proposing a system that computes the global frequent itemsets or association rules from different sites without disclosing individual transactions. In addition, a new algorithm is proposed to hide sensitive frequent itemsets or sensitive association rules from the global frequent itemsets by hiding them at each site individually. This is done by modifying the original database at each site in order to decrease the support of each sensitive itemset or association rule. Experimental results show that the proposed algorithm hides rules in a distributed system with good execution time and limited side effects. The proposed system is also able to calculate the global frequent itemsets from different sites while preserving the privacy of each site.
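The idea of lowering the support of a sensitive itemset at each site can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify. It is a generic support-reduction approach under stated assumptions: each site shares only its local support count, a global count is the plain sum of local counts, and hiding removes one item of the sensitive itemset from supporting transactions. All function names and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Set

Transaction = Set[str]


def support_count(db: List[Transaction], itemset: FrozenSet[str]) -> int:
    """Number of transactions in db that contain every item of itemset."""
    return sum(1 for t in db if itemset <= t)


def global_support_count(sites: List[List[Transaction]],
                         itemset: FrozenSet[str]) -> int:
    """Sum of local counts; only counts, not transactions, leave each site.

    Assumption: counts are summed in the clear. A real system might add a
    secure aggregation step here.
    """
    return sum(support_count(db, itemset) for db in sites)


def hide_itemset(db: List[Transaction], sensitive: FrozenSet[str],
                 min_support_count: int) -> List[Transaction]:
    """Return a sanitized copy of db where `sensitive` falls below the threshold.

    Assumption: one item (the alphabetically first) is removed from
    supporting transactions until the local support count is too low.
    `sensitive` must be non-empty.
    """
    sanitized = [set(t) for t in db]      # leave the original database untouched
    victim = min(sensitive)               # arbitrary but deterministic choice
    remaining = support_count(sanitized, sensitive)
    for t in sanitized:
        if remaining < min_support_count:
            break
        if sensitive <= t:
            t.discard(victim)             # this transaction no longer supports it
            remaining -= 1
    return sanitized


# Toy horizontally partitioned data: two sites with the same item vocabulary.
site_a = [{"bread", "milk"}, {"bread", "milk", "eggs"}, {"milk"}]
site_b = [{"bread", "milk"}, {"eggs"}]
sensitive = frozenset({"bread", "milk"})

before = global_support_count([site_a, site_b], sensitive)
clean_a = hide_itemset(site_a, sensitive, min_support_count=1)
clean_b = hide_itemset(site_b, sensitive, min_support_count=1)
after = global_support_count([clean_a, clean_b], sensitive)
```

Removing items changes the support of other itemsets as well, which is the kind of side effect the abstract says the proposed algorithm keeps limited. This sketch makes no attempt to minimize it.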
SQL injection (SQLI) is a major type of attack that threatens the integrity, confidentiality, and authenticity or functionality of any database-driven web application. It allows the attacker to gain unauthorized access to the back-end database by exploiting vulnerabilities within the system in order to commit an attack and access resources. A Database Intrusion Detection System (DIDS) is a defense against SQLI that is used as a detection and prevention technique to protect any database-driven web application. In this paper, a system is proposed to protect web applications from SQLI. The proposed system uses a new signature-based detection technique. It relies on the Secure Hash Algorithm (SHA-1), which is used to check the signature of submitted queries and to decide whether these queries are valid. The proposed system can distinguish and prevent hacking attempts by detecting the attacker, blocking the attacker's request, and preventing the attacker from accessing the web application again. The proposed system was tested using the Sqlmapproject attacking tool, which was used to attack the web application (built using PHP and MySQL server) before and after protection. The results show that the proposed system works correctly and can protect the web application with good performance and high efficiency.
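A signature check of this kind might look like the sketch below. The abstract does not say what exactly is hashed, so this is an illustration under stated assumptions, not the paper's method: literals are stripped from each query to obtain a structural skeleton, the skeleton is hashed with SHA-1, and a query is accepted only if its hash matches one registered from trusted queries. The names `skeleton`, `signature`, and `QueryGuard`, and the rule of blocking a client after one bad query, are hypothetical.

```python
import hashlib
import re


def skeleton(query: str) -> str:
    """Reduce a query to its structure by replacing literals with '?'.

    Assumption: simple single-quoted strings and bare integers are the only
    literals; a production system would use a real SQL parser.
    """
    q = re.sub(r"'[^']*'", "?", query)     # string literals
    q = re.sub(r"\b\d+\b", "?", q)         # integer literals
    return re.sub(r"\s+", " ", q).strip().lower()


def signature(query: str) -> str:
    """SHA-1 hex digest of the query skeleton."""
    return hashlib.sha1(skeleton(query).encode("utf-8")).hexdigest()


class QueryGuard:
    """Accepts only queries whose signature was registered in advance."""

    def __init__(self, trusted_queries):
        self.allowed = {signature(q) for q in trusted_queries}
        self.blocked_clients = set()

    def check(self, client_id: str, query: str) -> bool:
        if client_id in self.blocked_clients:
            return False                   # previously flagged client
        if signature(query) in self.allowed:
            return True
        self.blocked_clients.add(client_id)  # block the client from now on
        return False


guard = QueryGuard(["SELECT * FROM users WHERE name = 'bob'"])

legit = "SELECT * FROM users WHERE name = 'alice'"
injected = "SELECT * FROM users WHERE name = '' OR '1'='1'"

ok = guard.check("10.0.0.1", legit)          # same structure as the trusted query
attack = guard.check("10.0.0.2", injected)   # extra OR clause changes the skeleton
retry = guard.check("10.0.0.2", legit)       # client stays blocked after the attack
```

The injected input changes the structure of the query rather than only its values, so the skeleton and its hash no longer match any registered signature. SHA-1 appears here only because the abstract names it.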
Today, the world uses many modern Information Technology (IT) systems to gather, store, and manipulate important information. At the same time, hackers try to gain access to computers and systems to view, copy, or create data, without the intention of destroying data or maliciously harming the computer. Exploitation of Domain Name System (DNS) vulnerabilities has resulted in a range of high-profile disruptions and outages for major internet sites around the world. A DNS attack is an exploit in which an attacker takes advantage of vulnerabilities in the DNS. This paper presents the vulnerabilities and weak points of the DNS server and shows how attackers (black hat hackers) can exploit those vulnerabilities to attack and gain access to the server machine. In conclusion, presenting and implementing this project helps users understand the danger posed by hackers, which in turn leads to building secure and protected systems and applications.