2011 · DOI: 10.3844/jcssp.2011.683.689

A Novel Technique for Web Log mining with Better Data Cleaning and Transaction Identification

Abstract: Problem statement: In the Internet era, web sites are a useful source of information for almost every activity, and the World Wide Web has grown rapidly in traffic volume and in the size and complexity of its sites. Web mining is the application of data mining, artificial intelligence, chart technology and related techniques to web data; it traces users' visiting behaviors and extracts their interests using patterns. Because of its direct application in e-commerce, Web analytics, e-learning, in…
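A quick way to make the title's "better data cleaning" concrete is a filtering sketch. The Python below is a minimal illustration, assuming each log entry is already split into (url, status, user_agent) fields; the suffix list, status check, and robot hints are illustrative assumptions, not the paper's exact rules.

```python
# Minimal web-log cleaning sketch (illustrative rules, not the paper's).
IRRELEVANT = ('.gif', '.jpg', '.jpeg', '.png', '.css', '.js')
ROBOT_HINTS = ('bot', 'crawler', 'spider')

def is_clean(url, status, user_agent):
    """Keep only successful page requests made by apparent human users."""
    if url.lower().endswith(IRRELEVANT):
        return False  # embedded resource, not a user page view
    if not 200 <= status < 300:
        return False  # failed or redirected request
    if any(h in user_agent.lower() for h in ROBOT_HINTS):
        return False  # likely a robot, mirroring "robot cleaning"
    return True

entries = [
    ("/index.html", 200, "Mozilla/5.0"),
    ("/logo.png", 200, "Mozilla/5.0"),
    ("/index.html", 200, "Googlebot/2.1"),
]
print([e[0] for e in entries if is_clean(*e)])  # ['/index.html']
```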

Cited by 13 publications (4 citation statements)
References 26 publications
“…A novel architecture for an incremental parallel crawler based on focused crawling is proposed to overcome the drawbacks noted by (Vellingiri and Pandian, 2011; Wu and Lai, 2010; Tyagi and Gupta, 2010); web pages relevant to multiple pre-defined topics are crawled concurrently. In our proposed architecture, we added a second-level master that coordinates the crawlers within the same topic, thereby avoiding overlap and largely reducing the space and communication cost.…”
Section: Scalable Focused Crawling Using Incremental Parallel Web Crawler
confidence: 99%
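As a rough illustration of the overlap-avoidance idea in the statement above, the sketch below models a per-topic second-level master that keeps one global seen-set and assigns each URL to exactly one crawler. The TopicMaster class and the hash-by-host partitioning rule are assumptions made for illustration; the cited architecture does not prescribe them here.

```python
from collections import defaultdict
from urllib.parse import urlparse

class TopicMaster:
    """Hypothetical second-level master for one topic's crawlers."""
    def __init__(self, topic, n_crawlers):
        self.topic = topic
        self.n = n_crawlers
        self.seen = set()                # URLs already assigned
        self.queues = defaultdict(list)  # crawler id -> URLs to fetch

    def submit(self, url):
        if url in self.seen:
            return  # already assigned once: overlap avoided
        self.seen.add(url)
        # Partition by host so each site is owned by one crawler (assumed rule).
        self.queues[hash(urlparse(url).netloc) % self.n].append(url)

master = TopicMaster("web-mining", n_crawlers=3)
for u in ("http://a.example/p1", "http://a.example/p1", "http://b.example/p2"):
    master.submit(u)
print(dict(master.queues))  # each URL appears exactly once
```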
“…Navin Tyagi et al. [21] … Vellingiri J. et al. [26] focus on providing techniques for better data cleaning and transaction identification from the web log. They use data preprocessing methods including data cleaning (removing unnecessary data and robot requests), user identification, session identification, path completion, and transaction identification using the reference length, where the reference length is the time taken by the user to view a particular page.…”
Section: Literature Review
confidence: 99%
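The reference-length idea quoted above lends itself to a short sketch: take the gap between consecutive requests in a session as the viewing time of a page, and let any page viewed longer than a cutoff (a likely content page) close the current transaction. The 120-second cutoff and the handling of the final page are assumed, illustrative choices, not values from the paper.

```python
from datetime import datetime, timedelta

CUTOFF = timedelta(seconds=120)  # assumed content-page threshold

def transactions(session):
    """session: list of (timestamp, url) pairs sorted by time."""
    txn, out = [], []
    for (t, url), nxt in zip(session, session[1:] + [None]):
        txn.append(url)
        # Reference length = time until the next request; the last page
        # has no successor, so it is assumed to be a content page.
        ref_len = (nxt[0] - t) if nxt else CUTOFF
        if ref_len >= CUTOFF:
            out.append(txn)  # a content page closes the transaction
            txn = []
    return out

s = [
    (datetime(2011, 1, 1, 10, 0, 0), "/home"),
    (datetime(2011, 1, 1, 10, 0, 30), "/catalog"),
    (datetime(2011, 1, 1, 10, 1, 0), "/item42"),
    (datetime(2011, 1, 1, 10, 5, 0), "/checkout"),  # /item42 viewed 4 min
]
print(transactions(s))  # [['/home', '/catalog', '/item42'], ['/checkout']]
```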
“…The field extraction algorithm carries out the process of extracting fields from a single line of the log file. The data cleaning approach removes unrelated or unnecessary items from the web log data (Vellingiri et al., 2011). Shin and Jo (2008) developed a novel automatic web information extractor called ‘catch crawler’, which uses style sheets to obtain the necessary data from a target site.…”
Section: Pre-processing
confidence: 99%
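A minimal sketch of the field-extraction step mentioned above, assuming entries in the Common Log Format; the actual log schema used in the cited work is not reproduced on this page, so the field layout here is an assumption.

```python
import re

# Common Log Format (assumed layout): host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<proto>\S+)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def extract_fields(line):
    """Return a dict of named fields, or None for a malformed line."""
    m = CLF.match(line)
    return m.groupdict() if m else None

line = '10.0.0.1 - alice [10/Oct/2011:13:55:36 -0700] "GET /a.html HTTP/1.1" 200 2326'
print(extract_fields(line))  # {'host': '10.0.0.1', 'user': 'alice', ...}
```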