Websites are a useful source of information in our day-to-day activities. Web Usage Mining (WUM) is one of the major applications of data mining, artificial intelligence and related techniques to web data; it predicts users' visiting behaviour and infers their interests by analyzing access patterns. WUM has become one of the significant areas of research in computer and information science. The web log is one of the major sources of such data: it records the links users visited, their browsing patterns and the time spent on a page or link, and this information can be used in several applications such as adaptive web sites, personalized services, customer profiling, prefetching and the design of attractive web sites. WUM consists of preprocessing, pattern discovery and pattern analysis. Log data is typically noisy and unclear, so preprocessing is essential for effective mining. In the preprocessing phase, data cleaning removes records of graphics, videos and format information, records with a failed HTTP status code, and robot requests. In the second phase, user behaviour is organized into clusters using Weighted Fuzzy-Possibilistic C-Means (WFPCM); each cluster consists of "similar" data items grouped by user behaviour and navigation patterns, for use in pattern discovery. In the third phase, user behaviour is classified for analysis using an Adaptive Neuro-Fuzzy Inference System with Subtractive Algorithm (ANFIS-SA). The performance of the proposed work is evaluated in terms of accuracy, execution time and convergence behaviour on the Anonymous Microsoft Web dataset.
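The data cleaning step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes entries in Combined Log Format, treats status codes of 400 and above as failed, and uses a hypothetical extension list and a simple robot heuristic (requests for /robots.txt, or a user agent containing "bot", "crawler" or "spider"). All names in the block are invented for the example.

```python
import re

# Assumed input: Combined Log Format lines, e.g.
# host ident user [time] "METHOD url PROTO" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical lists: graphics, video and format (style/script) resources.
IGNORED_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".ico",
                      ".mp4", ".avi", ".css", ".js")
ROBOT_MARKERS = ("bot", "crawler", "spider")


def parse_line(line):
    """Return a dict of log fields, or None if the line does not parse."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def is_relevant(record):
    """Keep only successful page requests that do not look like robots."""
    path = record["url"].split("?", 1)[0].lower()
    if path.endswith(IGNORED_EXTENSIONS):
        return False  # graphics, videos, format information
    if int(record["status"]) >= 400:
        return False  # failed HTTP status code (assumed: 4xx and 5xx)
    agent = (record["agent"] or "").lower()
    if path == "/robots.txt" or any(m in agent for m in ROBOT_MARKERS):
        return False  # robot cleaning (simple heuristic)
    return True


def clean_log(lines):
    """Parse raw log lines and drop irrelevant or unparseable records."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and is_relevant(r)]
```

Calling `clean_log` on the lines of a log file returns only the records that survive all three filters; a production cleaner would likely use a maintained list of robot user agents instead of substring matching.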
Problem statement: In the internet era, web sites are a useful source of information for almost every activity. As a result, the World Wide Web has grown rapidly in its volume of traffic and in the size and complexity of web sites. Web mining is the application of data mining, artificial intelligence, chart technology and related techniques to web data; it traces users' visiting behaviors and extracts their interests using patterns. Because of its direct application in e-commerce, Web analytics, e-learning and information retrieval, web mining has become one of the important areas in computer and information science. Several techniques, such as web usage mining, exist, but each has its own disadvantages. This study focuses on providing techniques for better data cleaning and transaction identification from the web log. Approach: Log data is usually noisy and ambiguous, so preprocessing is important for efficient mining. In preprocessing, data cleaning removes records of graphics, videos and format information, records with a failed HTTP status code, and robot requests. Sessions are reconstructed and paths are completed by appending missing pages. Transactions that depict user behavior are also constructed accurately in preprocessing by calculating the Reference Lengths of user accesses, taking the byte rate into account. Results: In terms of the number of records, for example, of 1000 records only 350 remain after data cleaning. In terms of execution time, the initial log takes 119 seconds to execute, whereas the proposed technique requires only 52 seconds. Conclusion: The experimental results demonstrate the performance of the proposed algorithm, which gives good results for web usage mining compared to existing approaches.
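Session reconstruction, as mentioned in the approach above, might look something like the sketch below. The abstract does not say how sessions are delimited, so this example assumes a common heuristic: requests from the same user belong to one session until the gap between consecutive requests exceeds a timeout (30 minutes here, an assumed value). The input format of `(user, timestamp, url)` tuples and the function name are also hypothetical, and path completion and Reference Length calculation are not covered.

```python
from collections import defaultdict

# Assumed inactivity threshold in seconds; not taken from the source.
SESSION_TIMEOUT = 30 * 60


def reconstruct_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp, url) requests into per-user sessions.

    A new session starts whenever the gap between two consecutive
    requests of the same user exceeds `timeout` seconds.
    """
    by_user = defaultdict(list)
    for user, timestamp, url in requests:
        by_user[user].append((timestamp, url))

    sessions = []
    for user, visits in by_user.items():
        visits.sort()  # chronological order within each user
        last_time, first_url = visits[0]
        current = [first_url]
        for timestamp, url in visits[1:]:
            if timestamp - last_time > timeout:
                sessions.append((user, current))  # close the old session
                current = []
            current.append(url)
            last_time = timestamp
        sessions.append((user, current))
    return sessions
```

Each returned item pairs a user identifier with the ordered list of pages in one session. In practice the user identifier is often approximated by IP address, optionally combined with the user agent.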