Proceedings of the 16th International Conference on World Wide Web 2007
DOI: 10.1145/1242572.1242744

First-order focused crawling

Abstract: This paper reports a new general framework of focused web crawling based on "relational subgroup discovery". Predicates are used explicitly to represent the relevance clues of those unvisited pages in the crawl frontier, and then first-order classification rules are induced using subgroup discovery technique. The learned relational rules with sufficient support and confidence will guide the crawling process afterwards. We present the many interesting features of our proposed first-order focused crawler, togethe…
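
To make the abstract's pipeline concrete, here is a minimal sketch of the last step only: keeping rules that meet support and confidence thresholds and using them to order the crawl frontier. It is not the authors' implementation. It assumes ground (variable-free) predicates, which is a propositional simplification of the first-order rules described above, and it does not perform subgroup discovery; candidate rules are supplied by hand. All names (`Rule`, `support_confidence`, `select_rules`, `order_frontier`), the predicate vocabulary, and the definitions of support and confidence (borrowed from association-rule mining) are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

# A ground predicate about an unvisited page, e.g. ("anchor_has", "crawler").
# The predicate names are hypothetical, not taken from the paper.
Predicate = Tuple[str, str]
Facts = Set[Predicate]
Example = Tuple[Facts, bool]  # (facts about a page, was the page relevant?)


@dataclass(frozen=True)
class Rule:
    """Conjunction of predicates; if all hold, the page is predicted relevant."""
    body: FrozenSet[Predicate]

    def covers(self, facts: Facts) -> bool:
        return self.body <= facts


def support_confidence(rule: Rule, examples: List[Example]) -> Tuple[float, float]:
    """Assumed definitions: support = covered positives / all examples,
    confidence = covered positives / covered examples."""
    covered = [label for facts, label in examples if rule.covers(facts)]
    if not covered:
        return 0.0, 0.0
    positives = sum(covered)
    return positives / len(examples), positives / len(covered)


def select_rules(candidates: List[Rule], examples: List[Example],
                 min_support: float, min_confidence: float) -> Dict[Rule, float]:
    """Keep rules meeting both thresholds, mapped to their confidence."""
    selected = {}
    for rule in candidates:
        support, confidence = support_confidence(rule, examples)
        if support >= min_support and confidence >= min_confidence:
            selected[rule] = confidence
    return selected


def order_frontier(frontier: Dict[str, Facts], rules: Dict[Rule, float]) -> List[str]:
    """Order URLs by the best confidence among covering rules (ties by URL)."""
    def score(url: str) -> float:
        return max((c for r, c in rules.items() if r.covers(frontier[url])), default=0.0)
    return sorted(frontier, key=lambda url: (-score(url), url))
```

In this sketch a frontier URL covered by no selected rule gets a score of 0.0 and is visited last; a real crawler would also need a policy for re-learning rules as newly fetched pages are labeled.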

Cited by 13 publications (6 citation statements)
References 4 publications
“…These techniques do not use strict mathematical definitions and are tolerant to imprecision, uncertainty, and partial truth to achieve a solution. In the starting paragraph of each category, we will discuss common characteristics of that technique and their subsequent <...>…”
Section: Focused Web Crawler Using Soft Computing Techniques (citation type: mentioning)
confidence: 99%
“…Ari Pirkola in [32] described negligence of historical results and inability to handle intermediate linguity as the main problems for any crawler. Xu, Qingyang and Zuo, Wanli [33] presented general framework of focused web crawling based on "relational subgroup discovery". Predicates were used explicitly to represent the relevance clues of those unvisited pages in the crawl frontier, and then first-order classification rules were induced using subgroup discovery technique.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Some efforts by different researchers in the same direction are given below. Focused crawler [11] discover semantic web data, by using some sort of heuristic to rate pages according to their relevance to a given topic. This crawler should stay focused around the given topic, so that irrelevant pages should not pursued by the crawler.…”
Section: Related Work (citation type: mentioning)
confidence: 99%