2011 · DOI: 10.3844/jcssp.2011.683.689

A Novel Technique for Web Log mining with Better Data Cleaning and Transaction Identification

Abstract: Problem statement: In the Internet era, web sites are a useful source of information for almost every activity, and the World Wide Web has grown rapidly in traffic volume and in the size and complexity of its sites. Web mining is the application of data mining, artificial intelligence, chart technology and related techniques to web data; it traces users' visiting behaviors and extracts their interests using patterns. Because of its direct application in e-commerce, Web analytics, e-learning, in…
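A quick way to make the title's "better data cleaning" concrete is a filtering sketch. The Python below is a minimal illustration, assuming each log entry is already split into (url, status, user_agent) fields; the suffix list, status check, and robot hints are illustrative assumptions, not the paper's exact rules.

```python
# Minimal web-log cleaning sketch (illustrative rules, not the paper's).
IRRELEVANT = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js')
ROBOT_HINTS = ('bot', 'crawler', 'spider')

def is_clean(url, status, user_agent):
    """Keep only successful page requests made by apparent human users."""
    if url.lower().endswith(IRRELEVANT):
        return False  # embedded resource, not a user page view
    if not 200 <= status < 300:
        return False  # failed or redirected request
    if any(h in user_agent.lower() for h in ROBOT_HINTS):
        return False  # likely a robot, mirroring "robot cleaning"
    return True

entries = [
    ("/index.html", 200, "Mozilla/5.0"),
    ("/logo.png", 200, "Mozilla/5.0"),
    ("/index.html", 200, "Googlebot/2.1"),
]
print([e[0] for e in entries if is_clean(*e)])  # ['/index.html']
```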

Cited by 13 publications (4 citation statements)
References 26 publications
“…A novel architecture for an incremental parallel crawler based on focused crawling is proposed to overcome the drawbacks noted by (Vellingiri and Pandian, 2011; Wu and Lai, 2010; Tyagi and Gupta, 2010); web pages relevant to multiple pre-defined topics are crawled concurrently. In our proposed architecture, we added a second-level master that coordinates the crawlers within the same topic, thereby avoiding overlap and largely reducing the space and communication cost.…”
Section: Scalable Focused Crawling Using Incremental Parallel Web Crawler
confidence: 99%
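As a rough illustration of the overlap-avoidance idea in the statement above, the sketch below models a per-topic second-level master that keeps one global seen-set and assigns each URL to exactly one crawler. The TopicMaster class and the hash-by-host partitioning rule are assumptions made for illustration; the cited architecture does not prescribe them here.

```python
from collections import defaultdict
from urllib.parse import urlparse

class TopicMaster:
    """Hypothetical second-level master for one topic's crawlers."""
    def __init__(self, topic, n_crawlers):
        self.topic = topic
        self.n = n_crawlers
        self.seen = set()                # URLs already assigned
        self.queues = defaultdict(list)  # crawler id -> URLs to fetch

    def submit(self, url):
        if url in self.seen:
            return  # already assigned once: overlap avoided
        self.seen.add(url)
        # Partition by host so each site is owned by one crawler (assumed rule).
        self.queues[hash(urlparse(url).netloc) % self.n].append(url)

master = TopicMaster("web-mining", n_crawlers=3)
for u in ("http://a.example/p1", "http://a.example/p1", "http://b.example/p2"):
    master.submit(u)
print(dict(master.queues))  # each URL appears exactly once
```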
“…Navin Tyagi et al. [21] … Vellingiri J. et al. [26] focus on providing techniques for better data cleaning and transaction identification from the web log. They use data preprocessing methods including data cleaning (removing unnecessary data and robot requests), user identification, session identification, path completion, and transaction identification using the reference length, where the reference length is the time taken by the user to view a particular page.…”
Section: Literature Review
confidence: 99%
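The reference-length idea quoted above lends itself to a short sketch: take the gap between consecutive requests in a session as the viewing time of a page, and let any page viewed longer than a cutoff (a likely content page) close the current transaction. The 120-second cutoff and the handling of the final page are assumed, illustrative choices, not values from the paper.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(seconds=120)  # assumed content-page threshold

def transactions(session):
    """session: list of (timestamp, url) pairs sorted by time."""
    txn, out = [], []
    for (t, url), nxt in zip(session, session[1:] + [None]):
        txn.append(url)
        # Reference length = time until the next request; the last page
        # has no successor, so it is assumed to be a content page.
        ref_len = (nxt[0] - t) if nxt else CUTOFF
        if ref_len >= CUTOFF:
            out.append(txn)  # a content page closes the transaction
            txn = []
    return out

s = [
    (datetime(2011, 1, 1, 10, 0, 0), "/home"),
    (datetime(2011, 1, 1, 10, 0, 30), "/catalog"),
    (datetime(2011, 1, 1, 10, 1, 0), "/item42"),
    (datetime(2011, 1, 1, 10, 5, 0), "/checkout"),  # /item42 viewed 4 min
]
print(transactions(s))  # [['/home', '/catalog', '/item42'], ['/checkout']]
```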
“…The field extraction algorithm carries out the process of extracting fields from a single line of the log file. The data cleaning approach removes unrelated or unnecessary items from the web log data (Vellingiri et al., 2011). Shin and Jo (2008) developed a novel automatic web information extractor called ‘catch crawler’, which uses style sheets to obtain the necessary data from a target site.…”
Section: Pre-processing
confidence: 99%
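A minimal sketch of the field-extraction step mentioned above, assuming entries in the Common Log Format; the actual log schema used in the cited work is not reproduced on this page, so the field layout here is an assumption.

```python
import re

# Common Log Format (assumed layout): host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def extract_fields(line):
    """Return a dict of named fields, or None for a malformed line."""
    m = CLF.match(line)
    return m.groupdict() if m else None

line = '10.0.0.1 - alice [10/Oct/2011:13:55:36 -0700] "GET /a.html HTTP/1.1" 200 2326'
print(extract_fields(line))  # {'host': '10.0.0.1', 'user': 'alice', ...}
```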