Today, the web has become an environment in which to live, learn, be entertained, and socialize, individually or as a group, through digital platforms whose users have high aspirations. As a result, investigating web user behaviour remains an active research area and demands renewed innovation in analytics to provide reliable, high-quality customized solutions. The weblog is the primary source for this work, and it poses tremendous challenges for web researchers because of the complex sequence of processing steps involved and the abundance of information it contains. Further, limited distributed storage models, only partially parallel computing techniques, and the difficulty of identifying appropriate attributes in weblog analysis demand high-performance models for effective characterization of web users. Pre-processing is critical to the entire process of weblog analysis; although it is popular among researchers, studies of it are nonetheless limited. In addition, existing pre-processing studies address elicitation, reduction, and transformation of web usage data individually rather than comprehensively. To address this, the present paper proposes the Enriched Pre-processing Model (EPPM), which comprehensively covers all stages of pre-processing weblog data within the Apache Spark framework. EPPM can process real-time streaming data along with batch data, because sustaining the validity of web user behaviour extracted from historical data also requires a strategy for processing real-time streaming data. In addition to all the pre-processing steps, EPPM integrates a machine learning approach to discard log entries generated by search engines from the weblog, as they are superfluous for observing web user behaviour. The performance of EPPM is validated through a series of experiments on server-side weblog data in a standard execution environment, and the experimental results are reported.