2011
DOI: 10.1007/s11280-011-0124-6

Indexing and querying segmented web pages: the BlockWeb Model

Abstract: We present in this paper a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting up a page into blocks has several advantages in terms of page design, indexing and querying, such as (i) blocks of a page most similar to a query may be returned instead of the page as a whole, (ii) the importance of a block can be taken into account, as well as (iii) the permeability of the blocks to neighbor blocks: a block b is said to be permeable to a block b′ in the s…

Cited by 5 publications (2 citation statements)
References 15 publications
“…This identification is dependent on a palette of features: HTML heuristics [19,30], visual clues [6,26,37], or popular "shallow" text features (e.g., link density or text length). Content features may be fed to a machine learning algorithm, for either clustering [3,4,36], or classification [5,33].…”
Section: Related Work (mentioning)
confidence: 99%
“…In this sense, our partition-based indexing is complementary to the query optimization techniques. Partitioning of blocks is natural and can be applied in Web search as well [9]. In particular, to index and query with partitions, Web pages are segmented into several blocks according to their contents, such as titles, articles, images, etc.…”
Section: Related Work (mentioning)
confidence: 99%