In this paper we investigate the effectiveness of ensemble-based learners for identifying web robot sessions in web server logs. We also perform multi-fold robot session labeling to improve the performance of the learner. We conduct a comparative study of several ensemble methods (Bagging, Boosting, and Voting) against simple classifiers from a classification perspective. We also evaluate the effectiveness of these classifiers (both ensemble and simple) on five data sets of varying session length. At present, the results of web server log analyzers are not very reliable because the input log files are heavily inflated by sessions of automated web traversal software, known as web robots. The presence of web robot traffic in web server log repositories makes it very challenging to extract actionable, usable knowledge about the browsing behavior of actual visitors. Web robot sessions therefore need to be detected accurately and quickly in web server log repositories, so that knowledge about genuine visitors can be extracted and log analyzers can produce correct results.
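To illustrate the Voting ensemble named in the abstract above, here is a minimal sketch of majority voting over three simple classifiers. The session features (`requests_per_minute`, `image_ratio`, `requested_robots_txt`), the thresholds, and the rule functions are all hypothetical and chosen only for illustration; the abstract does not specify which features or base classifiers the authors used.

```python
from collections import Counter

# Hypothetical simple classifiers: each maps a session (a dict of
# illustrative features) to the label "robot" or "human".
def rule_request_rate(session):
    # Assumption: very high request rates suggest automated traversal.
    return "robot" if session["requests_per_minute"] > 30 else "human"

def rule_image_ratio(session):
    # Assumption: robots rarely fetch embedded images.
    return "robot" if session["image_ratio"] < 0.05 else "human"

def rule_robots_txt(session):
    # Assumption: requesting robots.txt is typical of crawlers.
    return "robot" if session["requested_robots_txt"] else "human"

CLASSIFIERS = [rule_request_rate, rule_image_ratio, rule_robots_txt]

def majority_vote(classifiers, session):
    """Return the label predicted by most of the given classifiers."""
    votes = Counter(clf(session) for clf in classifiers)
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    example = {
        "requests_per_minute": 50,
        "image_ratio": 0.0,
        "requested_robots_txt": False,
    }
    print(majority_vote(CLASSIFIERS, example))
```

With an odd number of voters and two labels, a tie cannot occur. Bagging and Boosting differ in that they train many copies of one base learner on resampled or reweighted data rather than combining distinct classifiers.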
Sensor networks face many problems that do not arise in other types of networks. Power constraints, limited hardware, decreased reliability, and a typically higher density and number of nodes than found in conventional networks are just a small portion of the problems that have to be considered when developing protocols for use in sensor networks. Simulation is often used to test new protocols that are being developed, as well as to compare old protocols. However, there is always a danger when using simulation in testing: the results are not necessarily going to be accurate or representative. To help overcome this, it is important to possess knowledge of the simulation tools available, along with their associated strengths and weaknesses. The goal of this paper is to aid developers in the selection of an appropriate simulation tool.
Ontologies provide a common vocabulary, reusability, and machine-readable content; they also allow semantic search, facilitate agent interaction, and order and structure knowledge for Semantic Web (Web 3.0) applications. However, the challenge in ontology engineering is automatic learning: there is still no fully automatic approach that uses machine learning techniques to form an ontology from a text corpus or dataset covering various topics. In this paper, two topic modeling algorithms are explored for learning a topic ontology, namely LSI & SVD and Mr.LDA. The objective is to determine the statistical relationship between documents and terms in order to build a topic ontology and ontology graph with minimal human intervention. Experimental analysis of building a topic ontology and semantically retrieving the corresponding topic ontology for a user's query demonstrates the effectiveness of the proposed approach.
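The document-term relationship that LSI extracts through SVD can be sketched on a toy example. The code below is only an illustration under stated assumptions: it recovers just the first latent dimension by power iteration instead of computing a full SVD, and the vocabulary and counts are invented. It is not the authors' pipeline and does not cover Mr.LDA.

```python
import math

def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A.

    A is a term-document matrix (rows = terms, columns = documents).
    Returns (sigma, u, v): the largest singular value, the term-side
    vector u, and the document-side vector v.
    """
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iters):
        u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(A[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(A[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

def top_terms(terms, u, k):
    """Return the k terms with the largest absolute weight in u."""
    ranked = sorted(zip(terms, u), key=lambda tw: abs(tw[1]), reverse=True)
    return [term for term, _ in ranked[:k]]

if __name__ == "__main__":
    # Invented vocabulary and counts: two documents about sensor
    # networks, two about ontologies.
    terms = ["sensor", "network", "ontology", "semantic"]
    term_doc = [
        [2.0, 2.0, 0.0, 0.0],
        [1.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0],
        [0.0, 0.0, 1.0, 1.0],
    ]
    sigma, u, v = top_singular_triplet(term_doc)
    print(top_terms(terms, u, 2))
```

In an LSI setting, the terms with the largest weights in `u` would label a candidate topic node, and the entries of `v` indicate which documents belong to it. A full implementation would keep several singular vectors rather than one.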
The objective of this contribution is to present an expository review of the currently available experimental tools, services, and concepts used in the emerging field of Wireless Sensor Networks (WSNs), which has the potential to change many aspects of information and communication in the coming era. Because of the high cost of large numbers of sensor nodes, most research on wireless sensor networks in universities, institutes, and research centers is currently performed with these experimental tools before a real network is implemented. The statistics gathered from these experimental tools can also be realistic and convenient. These experimental tools offer a better option for studying the behavior of WSNs before and after the physical network is implemented. This contribution presents 63 simulators/simulation frameworks, 14 emulators, 19 data visualization tools, 46 testbeds, 26 debugging tools/services/concepts, 10 code-updating/reprogramming tools, and 8 network monitors that are used worldwide for WSN research.
With the abundance of exceptionally high-dimensional data, feature selection has become an essential element of the data mining process. In this paper, we investigate the problem of efficient feature selection for classification on high-dimensional datasets. We present a novel filter-based approach to feature selection that sorts the features by a score, and we then measure the performance of four different data mining classification algorithms on the resulting data. In the proposed approach, we partition the sorted features and search for important features in both the forward and the reverse direction, starting simultaneously from the first and last features in the sorted list. The proposed approach is highly scalable and effective because it parallelizes over both attributes and tuples simultaneously, allowing us to evaluate many potential features for high-dimensional datasets. The proposed framework is shown experimentally, on real and synthetic high-dimensional datasets, to be valuable and to improve the precision of the selected features. We have also compared its classification accuracy against various other feature selection processes.
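One possible reading of the score-sort-and-scan idea in the abstract above is sketched below. The abstract does not name the scoring function, the partitioning scheme, or the acceptance criterion, so everything here is an assumption: features are scored by absolute Pearson correlation with the label, and a two-pointer scan walks the sorted list from both ends, keeping features whose score meets a hypothetical `threshold`. The parallelization over attributes and tuples is not shown.

```python
import math

def pearson_abs(xs, ys):
    """Absolute Pearson correlation; returns 0.0 if either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov / math.sqrt(vx * vy))

def rank_features(columns, labels):
    """Filter step: score each feature column and sort by descending score."""
    scores = [(name, pearson_abs(values, labels)) for name, values in columns.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

def bidirectional_select(ranked, threshold):
    """Scan the ranked list from the first and last positions at once."""
    selected = []
    lo, hi = 0, len(ranked) - 1
    while lo <= hi:
        if ranked[lo][1] >= threshold:
            selected.append(ranked[lo][0])
        # Skip the tail check when both pointers land on the same feature.
        if hi != lo and ranked[hi][1] >= threshold:
            selected.append(ranked[hi][0])
        lo += 1
        hi -= 1
    return selected

if __name__ == "__main__":
    # Invented toy data: one informative, one uninformative, one constant feature.
    columns = {
        "f_signal": [0, 0, 1, 1],
        "f_noise": [1, 0, 1, 0],
        "f_const": [5, 5, 5, 5],
    }
    labels = [0, 0, 1, 1]
    ranked = rank_features(columns, labels)
    print(bidirectional_select(ranked, threshold=0.5))
```

In this serial form the two pointers do independent work, which is what would let the head and tail scans be handed to separate workers.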