Proceedings of the 2006 ACM Symposium on Applied Computing, 2006
DOI: 10.1145/1141277.1141534

Template detection for large scale search engines

Cited by 33 publications (19 citation statements)
References 13 publications
“…Similarly to Bar-Yossef, Liang Chen et al. [6] is based on the identification of similar blocks (pagelets). The difference is that the whole procedure is done during the index building of a search engine.…”
Section: Related Work (mentioning, confidence: 99%)
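The statement above says Chen et al. identify similar blocks (pagelets) while the search engine's index is being built, rather than in a separate pass. As a minimal sketch of what that could look like, assuming pages arrive already segmented into text blocks and that a block counts as template when it recurs on a large fraction of a site's pages, a per-site counter can be updated as each page is indexed. `TemplateCounter`, `fingerprint`, `min_fraction` and `min_pages` are hypothetical names and thresholds, not taken from the paper.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(block_text: str) -> str:
    """Hash a block after lowercasing and collapsing whitespace (assumed normalization)."""
    normalized = re.sub(r"\s+", " ", block_text.strip().lower())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class TemplateCounter:
    """Hypothetical per-site block counter, updated as each page is indexed."""

    def __init__(self, min_fraction: float = 0.6, min_pages: int = 3):
        self.min_fraction = min_fraction  # assumed share of pages a template block must reach
        self.min_pages = min_pages        # assumed minimum pages before judging a site
        self.pages_seen = defaultdict(int)  # site -> pages indexed so far
        self.block_pages = defaultdict(lambda: defaultdict(int))  # site -> fingerprint -> pages

    def add_page(self, site: str, blocks: list[str]) -> None:
        """Called once per page during indexing; each distinct block counts once per page."""
        self.pages_seen[site] += 1
        for fp in {fingerprint(b) for b in blocks}:
            self.block_pages[site][fp] += 1

    def is_template(self, site: str, block_text: str) -> bool:
        """True if the block occurs on at least min_fraction of the site's indexed pages."""
        seen = self.pages_seen.get(site, 0)
        if seen < self.min_pages:
            return False
        count = self.block_pages[site].get(fingerprint(block_text), 0)
        return count / seen >= self.min_fraction
```

Because the counts only grow as more pages are indexed, a block's template status can change over the crawl; `min_pages` simply keeps the sketch from judging a site on too few pages.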
“…One approach for content extraction is the Template Detection (TD) algorithms [1,3,7,11,20], which apply collections of documents with the same templates to learn common structures. Bar-Yossef et al. presented an approach to automatically detect templates by counting the frequent pagelet item sets, where each pagelet is a self-contained logical region in a web page with a well-defined topic or functionality [1].…”
Section: Related Work (mentioning, confidence: 99%)
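Reading "frequent pagelet item sets" in the usual data-mining sense, each page is a transaction whose items are pagelet fingerprints, and an itemset is frequent when at least `min_support` pages contain all of it. The sketch below counts itemsets level by level with a simple Apriori-style pruning step; it is an illustration under that assumption, not Bar-Yossef et al.'s algorithm, and `frequent_pagelet_itemsets`, `min_support` and `max_size` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pagelet_itemsets(pages, min_support, max_size=2):
    """Return {itemset: support} for pagelet itemsets found in >= min_support pages.

    `pages` is a list of iterables of pagelet fingerprints, one iterable per page.
    """
    transactions = [frozenset(p) for p in pages]
    frequent = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in transactions:
            for itemset in combinations(sorted(t), size):
                counts[frozenset(itemset)] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break
        frequent.update(level)
        # Apriori property: a larger itemset can only be frequent if its parts are,
        # so drop items that appear in no frequent itemset of the current size.
        keep = set().union(*level)
        transactions = [t & keep for t in transactions]
    return frequent
```

Pagelets that show up in frequent itemsets across many pages (navigation bars, footers) are the natural template candidates; page-specific pagelets fall below `min_support`.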
“…They segmented web pages into blocks, and the blocks were clustered according to their style features. The blocks with similar layout style and content were detected and identified as templates [3]. All these template detection algorithms generally try to identify the main content by removing common parts found in different web pages.…”
Section: Related Work (mentioning, confidence: 99%)
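The approach summarized above clusters blocks by style features and then removes the common parts to keep the main content. A deliberately crude sketch, assuming each block already carries a tuple of style features, groups blocks whose style tuple and stripped text match exactly (a stand-in for real similarity-based clustering) and treats any group that appears on at least `min_pages` pages as template. `Block`, `split_template_and_content` and the example style tuples are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    page_id: str
    style: tuple  # hypothetical style features, e.g. (tag_path, css_class, font_size)
    text: str


def split_template_and_content(blocks, min_pages):
    """Group blocks by (style, text); groups seen on >= min_pages pages are template.

    Returns the set of template keys and, per page, the remaining (main content) texts.
    """
    clusters = defaultdict(set)  # cluster key -> ids of pages containing it
    for b in blocks:
        clusters[(b.style, b.text.strip())].add(b.page_id)
    template_keys = {k for k, pages in clusters.items() if len(pages) >= min_pages}

    content = defaultdict(list)
    for b in blocks:
        if (b.style, b.text.strip()) not in template_keys:
            content[b.page_id].append(b.text)
    return template_keys, dict(content)
```

Exact matching keeps the example short; a real system would compare style vectors and content with a similarity threshold so that slightly varying template blocks still fall into one cluster.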
“…Several procedures have been proposed for template detection [14]. The first method uses the concept of pagelet to divide a Web page.…”
Section: 3 Template Detection (mentioning, confidence: 99%)