2009
DOI: 10.1016/j.camwa.2008.09.021

A cross-language focused crawling algorithm based on multiple relevance prediction strategies

Abstract: Focused crawling is increasingly seen as a solution to address the scalability limitations of existing general-purpose search engines, by traversing the Web to only gather pages that are relevant to a specific topic. How to predict the relevance of the unvisited pages pointed to by candidate URLs in the crawling frontier to a given topic is a key issue in the design of focused crawlers. In this paper, we propose a novel approach based on multiple relevance prediction strategies to address this p…
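The abstract frames focused crawling around ranking candidate URLs in the crawling frontier by predicted relevance. A minimal sketch of such a best-first frontier follows, assuming the relevance prediction is already available as a float per URL; the class name `CrawlFrontier` and its methods are hypothetical and not taken from the paper.

```python
import heapq


class CrawlFrontier:
    """Hypothetical best-first frontier: pops the URL with the highest predicted relevance."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._seen: set[str] = set()
        self._counter = 0  # tie-breaker: equal scores pop in insertion order

    def push(self, url: str, predicted_relevance: float) -> None:
        # Each URL is queued at most once; later re-discoveries are ignored.
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score to pop the best URL first.
        heapq.heappush(self._heap, (-predicted_relevance, self._counter, url))
        self._counter += 1

    def pop(self) -> tuple[str, float]:
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


frontier = CrawlFrontier()
frontier.push("http://example.org/a", 0.2)
frontier.push("http://example.org/b", 0.9)
frontier.push("http://example.org/a", 0.95)  # already queued, so ignored
print(frontier.pop())  # ('http://example.org/b', 0.9)
```

Ignoring re-discovered URLs keeps the sketch simple; a real crawler might instead raise the priority of a URL when new evidence about it arrives.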

Cited by 10 publications (4 citation statements)
References 15 publications
“…Most focused web crawlers [2][3][4] have used only the full page text to estimate the similarity score of a web page, whereas [5][6][7][8] used both the full page text and the anchor text to estimate the relevance score. The authors of [9][10][11] used the cosine similarity metric to estimate the similarity score of unvisited web pages. The cosine similarity value can be computed from Term Frequency and Inverse Document Frequency (TF-IDF) weights.…”
Section: Related Work · mentioning
confidence: 99%
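The statement above mentions estimating similarity as the cosine between TF-IDF vectors. A minimal from-scratch sketch follows, assuming whitespace tokenization and a smoothed IDF of the form log((1 + n) / (1 + df)) + 1 (the variant scikit-learn uses by default), chosen so that terms shared by every document still carry weight; the function names are illustrative, not from any cited paper.

```python
import math
from collections import Counter


def tf_idf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Build a sparse TF-IDF vector (term -> weight) for each tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            # Relative term frequency times smoothed IDF.
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


topic = "focused crawler topic relevance".split()
page = "focused crawler for web pages".split()
unrelated = "pasta recipes for dinner".split()
vecs = tf_idf_vectors([topic, page, unrelated])
print(cosine_similarity(vecs[0], vecs[1]))  # positive: shares "focused" and "crawler"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
```

Here the topic description is treated as one more document in the collection; other setups compute IDF over a separate background corpus instead.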
“…Full page text and anchor text alone cannot capture the similarity of a web page accurately. To overcome this issue, [9] proposed a focused crawler for cross-language crawling that adopts an algorithm known as Focused crawling for Multiple Relevance Prediction Strategies (FCMRPS). FCMRPS combines the shark search algorithm with the average similarity, to the topic, of four target variables (full page text, anchor text, URL address and link structure).…”
Section: VSM Crawler or Classic Focused Crawler · mentioning
confidence: 99%
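The statement above describes FCMRPS as averaging four similarity scores and combining the result with shark search. The sketch below is only an illustration under stated assumptions: each of the four scores is assumed to be precomputed in [0, 1], the average is assumed to play the role of shark search's "neighborhood" score, and the decay `0.5` and mixing weight `gamma = 0.8` are illustrative defaults. The exact FCMRPS formula is not given here, so none of these names or constants should be read as the paper's.

```python
from dataclasses import dataclass


@dataclass
class CandidateScores:
    """Hypothetical per-URL similarities to the topic, each assumed to lie in [0, 1]."""
    full_text: float       # text of the page containing the link
    anchor_text: float     # the link's anchor text
    url: float             # terms extracted from the URL string
    link_structure: float  # some link-based relevance signal (assumed precomputed)


def average_relevance(s: CandidateScores) -> float:
    # Equal-weight average of the four target variables.
    return (s.full_text + s.anchor_text + s.url + s.link_structure) / 4.0


def inherited_score(parent_relevance: float, parent_inherited: float,
                    decay: float = 0.5) -> float:
    # Shark-search-style inheritance: a relevant parent passes on a decayed copy
    # of its own relevance; an irrelevant parent passes on a decayed copy of
    # what it inherited itself.
    if parent_relevance > 0.0:
        return decay * parent_relevance
    return decay * parent_inherited


def potential_score(s: CandidateScores, parent_relevance: float,
                    parent_inherited: float, gamma: float = 0.8,
                    decay: float = 0.5) -> float:
    # Mix inherited evidence with the averaged local evidence (assumed combination).
    inherited = inherited_score(parent_relevance, parent_inherited, decay)
    return gamma * inherited + (1.0 - gamma) * average_relevance(s)


scores = CandidateScores(full_text=0.6, anchor_text=0.8, url=0.2, link_structure=0.4)
print(round(potential_score(scores, parent_relevance=0.6, parent_inherited=0.0), 2))  # 0.34
```

With these numbers the average is 0.5 and the inherited score is 0.3, giving 0.8 × 0.3 + 0.2 × 0.5 = 0.34.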
“…Focused crawling based on multiple relevance prediction strategies [12] uses several relevance prediction strategies to compute the relationship between unvisited pages and the user's topic. The proposed focused crawling algorithm combines four strategies, which are applied together with a hierarchical taxonomy of topics to predict relevance more accurately.…”
Section: Introduction · mentioning
confidence: 99%
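The statement above mentions applying relevance strategies together with a hierarchical taxonomy of topics, without saying how. One plausible reading, sketched here purely as an assumption: a page counts as relevant to a topic if it matches the topic or any of its subtopics, with matches deeper in the tree discounted. The keyword-overlap score, the `depth_discount` of 0.8, and the `TopicNode` structure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TopicNode:
    """Hypothetical taxonomy node: a topic name, its keywords, and its subtopics."""
    name: str
    keywords: set[str]
    children: list["TopicNode"] = field(default_factory=list)


def keyword_overlap(page_terms: set[str], keywords: set[str]) -> float:
    # Fraction of the node's keywords that occur in the page.
    if not keywords:
        return 0.0
    return len(page_terms & keywords) / len(keywords)


def taxonomy_relevance(page_terms: set[str], topic: TopicNode,
                       depth_discount: float = 0.8, depth: int = 0) -> float:
    # Score at this node, discounted by its depth below the target topic.
    own = (depth_discount ** depth) * keyword_overlap(page_terms, topic.keywords)
    # Best score anywhere in the subtree (0.0 for a leaf).
    best_child = max(
        (taxonomy_relevance(page_terms, child, depth_discount, depth + 1)
         for child in topic.children),
        default=0.0,
    )
    return max(own, best_child)


crawling = TopicNode("web crawling", {"crawler", "frontier", "url"})
ir = TopicNode("information retrieval", {"search", "retrieval", "index"}, [crawling])
print(round(taxonomy_relevance({"crawler", "frontier", "seed"}, ir), 3))  # 0.533
```

The page matches none of the root keywords but two of the three subtopic keywords, so its score is 0.8 × 2/3 ≈ 0.533.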