This paper introduces an approach that raises classification accuracy by using Bag-of-Words (BoW) as a feature selection method together with machine learning algorithms. Because it can process large data sets quickly and produce accurate results, the approach is suited to medical applications. As noted in the literature review, researchers have proposed various ensemble approaches to improve results. In this study, a novel algorithm is proposed to analyze medical kidney test reports: BoW selects the features, which are then analyzed by Boosting four machine learning classification algorithms, namely Sequential Minimal Optimization (SMO), k-Nearest Neighbors (k-NN), Random Forests (RF) and Naïve Bayes (NB). With the help of specialists in urology, the proposed algorithm is tested against multiple datasets of different kidney tests. The proposed Boosting algorithms are more accurate than their counterparts SMO, k-NN, RF and NB when each is run on its own.
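The BoW step described above can be illustrated with a minimal sketch. It assumes a plain term-count representation over whitespace-separated report text; the function names and the sample reports are hypothetical and are not taken from the paper, and the Boosting stage is not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the corpus to a column index (sorted order)."""
    terms = sorted({token for doc in documents for token in tokenize(doc)})
    return {term: index for index, term in enumerate(terms)}


def bow_vector(text, vocabulary):
    """Return the term-count vector of `text` over the given vocabulary."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for term, index in vocabulary.items():
        vector[index] = counts.get(term, 0)
    return vector


# Hypothetical report snippets, for illustration only.
reports = [
    "creatinine high urea high",
    "creatinine normal urea normal",
]
vocabulary = build_vocabulary(reports)
features = [bow_vector(report, vocabulary) for report in reports]
```

Each row of `features` could then be passed to a classifier such as NB or k-NN; how the paper combines the classifiers under Boosting is not specified here.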
Big Data is a volume of data so vast that it cannot easily be stored or processed with conventional approaches within a limited period. Managing it and extracting value from it therefore require new architectures, methods and analyses. Big Data poses many challenges and is characterized by properties such as volume, velocity, variety and veracity. Its goal is not only to collect, store and organize huge volumes of data, but also to evaluate, extract and visualize useful information for further processing. Big Data is a modern technology, adopted worldwide, with the potential to provide great benefits to businesses and organizations in different fields, and it is expected to become more desirable over the next few years. This work describes the importance of Big Data, the challenges it faces in adapting to the modern era, its characteristics and architecture, the technologies used with it and the applications built on it. The paper also explains MapReduce and the Hadoop Distributed File System as two important models of Big Data.
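The MapReduce model mentioned above can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce phases using a word count, not Hadoop code; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence over an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


word_counts = map_reduce(["big data", "Big value"])
```

In a real cluster the map and reduce calls would run in parallel on different nodes, with the input records read from a distributed file system; here they run sequentially only to show the data flow.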