A cross-language focused crawling algorithm based on multiple relevance prediction strategies

Chen, Zhumin; Ma, Jun; Lei, Jingsheng; Yuan, Bo; Li, Lian; Ling, Song

doi:10.1016/j.camwa.2008.09.021

Cited by 10 publications

(4 citation statements)

References 15 publications


“…Most of the focused web crawlers, [2][3][4] have used only full page text to estimate the similarity score of the web page, [5][6][7][8] used both full page text and the anchor text for estimating the relevance score. The authors [9][10][11] used cosine similarity metric to estimate the similarity score of the unvisited web pages. The cosine similarity value can be assessed by finding the Term Frequency and Inverse Document Frequency (TF-IDF).…”

Section: Related Work (mentioning)

confidence: 99%

An Enhanced Semantic Focused Web Crawler Based on Hybrid String Matching Algorithm

Prabha

Mahesh

Raja

2021

Cybernetics and Information Technologies


A topic-precise crawler is a special-purpose web crawler that downloads web pages relevant to a particular topic by measuring a cosine similarity or semantic similarity score. The cosine-based similarity measure yields an inaccurate relevance score if the topic term does not occur directly in the web page. The semantic-based similarity measure provides a precise relevance score even if only synonyms of the given topic occur in the web page. However, if the topic is unavailable in the ontology, semantic focused crawlers produce an inaccurate relevance score. This paper overcomes these problems with a hybrid string-matching algorithm that combines the semantic similarity-based measure with the probabilistic similarity-based measure. The experimental results show that this algorithm increased the efficiency of focused web crawlers and achieved better Harvest Rate (HR), Precision (P) and Irrelevance Ratio (IR) than existing focused web crawlers.
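The cosine similarity over TF-IDF weights mentioned in the citation statement and abstract above can be illustrated with a minimal sketch. It is not taken from any of the cited papers: the function names `tfidf_vectors` and `cosine`, the whitespace tokenization, and the smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumption: a smoothed IDF, log((1 + N) / (1 + df)) + 1, is used so that
    terms occurring in every document keep a nonzero weight.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in term_counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical topic and page texts, tokenized on whitespace.
topic = "focused web crawler".split()
page_on_topic = "a focused crawler downloads web pages about one topic".split()
page_off_topic = "recipe for chocolate cake".split()

topic_vec, on_vec, off_vec = tfidf_vectors([topic, page_on_topic, page_off_topic])
print(cosine(topic_vec, on_vec) > cosine(topic_vec, off_vec))
```

The off-topic page shares no term with the topic, so its score is zero. This is the limitation the abstract attributes to cosine-based measures: a page that uses only synonyms of the topic terms would score zero as well.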



“…Only full page text and anchor text cannot capture the similarity of the web page accurately. To overcome this issue [9] proposed a focused crawler for cross language crawling, which adopts an algorithm, known as Focused crawling for Multiple Relevance Prediction Strategies (FCMRPS). The FCMRPS is an integration of the average similarity score of four target variables (full page text, anchor text, URL address and link structure) with the topic and shark search algorithm.…”

Section: VSM Crawler or Classic Focused Crawler (mentioning)

confidence: 99%

A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers

Dhanith

Surendiran

Raja

2021

Braz. arch. biol. technol.


Analogous to the spectacular growth of the information superhighway, the Internet, demand for coherent and economical crawling methods appears set to shoot up. Consequently, many innovative techniques have been put forth for efficient crawling. Among them, the most significant is the focused crawler. Focused crawlers are capable of searching for web pages that suit topics defined in advance. Focused crawlers attract several search engines on the grounds of efficient filtering and reduced memory and time consumption. This paper furnishes a relevance-computation-based survey of web crawling. A set of fifty-two focused crawlers from the existing literature is categorized into four classes: classic focused crawler, semantic focused crawler, learning focused crawler and ontology learning focused crawler. The prerequisites and the mastery of each metric with respect to harvest rate, target recall, precision and F1 score are discussed. Future outlooks, shortcomings and strategies are also suggested.
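The citation statement above describes FCMRPS as averaging the similarity scores of four sources of evidence (full page text, anchor text, URL address and link structure). A rough sketch of that averaging step, feeding a best-first crawl frontier, might look as follows. This is an illustration under assumptions, not the authors' implementation: the names `LinkEvidence`, `combined_relevance` and `Frontier` are hypothetical, each score is assumed to already lie in [0, 1], equal weights are assumed, and the shark-search component is omitted.

```python
import heapq
from dataclasses import dataclass


@dataclass
class LinkEvidence:
    """Hypothetical container for the four per-link similarity scores (each in [0, 1])."""
    page_text: float
    anchor_text: float
    url: float
    link_structure: float


def combined_relevance(evidence):
    """Unweighted mean of the four scores (equal weighting is an assumption)."""
    scores = (evidence.page_text, evidence.anchor_text,
              evidence.url, evidence.link_structure)
    return sum(scores) / len(scores)


class Frontier:
    """Best-first queue of unvisited URLs, highest predicted relevance first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker so equal scores pop in insertion order

    def push(self, url, evidence):
        score = combined_relevance(evidence)
        # heapq is a min-heap, so the score is negated to pop the best link first.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self):
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score


# Hypothetical URLs and scores, for illustration only.
frontier = Frontier()
frontier.push("http://example.org/a", LinkEvidence(0.2, 0.1, 0.0, 0.3))
frontier.push("http://example.org/b", LinkEvidence(0.9, 0.8, 0.5, 0.6))
print(frontier.pop()[0])
```

A crawler built this way would fetch the popped URL, score its outgoing links, and push them back onto the frontier.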


“…Focused crawling based on strategies of preceding multiple relevance [12] uses several strategies of predicting relevance to compute the relation of unvisited pages with user's topic. The ideas of focused crawler algorithm presented consist of combination of four strategies which are applied with the hierarchical taxonomy of topics to predict the relevance more accurately.…”

Section: Introduction (mentioning)

confidence: 99%

A new fuzzy-based method to weigh the related concepts in semantic focused web crawlers

Jalilian

Khotanlou

2011

2011 3rd International Conference on Computer Research and Development


A semantic focused crawler computes the crawling priority of pages from the semantic similarity between a web page and the topic, which is defined by an ontology. Ontology is a new approach regarded as the main pivot of the change from the present web to a new web called the semantic web. The main problem for focused crawlers is finding an appropriate relevance computation function with which the crawler can estimate the similarity between topic and web document. In this paper, a new method to weigh the concepts related to the topic is suggested as a main component in the architecture of a semantic crawler to compute the relevance of a web page to the topic. The concepts related to the topic are retrieved from the ontology graph and their weights are computed by a proposed fuzzy inference system. The results show that the proposed approach achieves a better precision rate than breadth-first and best-first search.
