Abstract-Some top data mining algorithms, as ensemble classifiers, may be inefficient to very large data set. This paper makes an initial proposal of a distributed ensemble classifier algorithm based on the popular Random Forests for Big Data. The proposed algorithm aims to improve the efficiency of the algorithm by a distributed processing model called MapReduce. At the same time, our proposed algorithm aims to reduce the randomness impact by following an algorithm called Stochastic Aware Random Forests -SARF.
This paper proposes a method to predict word grammatical classes using automatically generated discrete-time Markov chains to model typical sentences. Such method advantage relies on the availability of input resources needed to build an efficient and effective solution to virtually any language, dialect, or domain lingo. One of the main advantages of the proposed method is its simplicity when compared to other sophisticated approaches based on Hidden Markov Models or even more complex formalisms. The proposed method is instantiated to an example and we show that the achieved efficiency and effectiveness bring advantages to traditional similar solutions.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.