Proceedings of the 18th ACM Conference on Information and Knowledge Management 2009
DOI: 10.1145/1645953.1645959

An empirical study on using hidden markov model for search interface segmentation

Abstract: This paper describes a hidden Markov model (HMM) based approach to search interface segmentation. Automatic processing of an interface is necessary to access the invisible contents of the deep Web. This requires automatic segmentation, i.e., grouping related components of an interface together. While a human can easily discern the logical relationships among interface components, machine processing of an interface is difficult. In this paper, we propose an approach to segmentation that lever…
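The abstract is truncated here, so the paper's exact model is not visible in this excerpt. As a rough illustration of how an HMM can label interface components, the Python sketch below treats each component's type (text-label, text-box, selection-list) as an observation and a hypothetical semantic role as the hidden state, then recovers the most likely role sequence with the Viterbi algorithm. The state names (`attribute-name`, `operator`, `operand`) and all probabilities are illustrative assumptions, not values from the paper.

```python
import math

# Hypothetical hidden states: the semantic role of each interface component.
STATES = ("attribute-name", "operator", "operand")

# Hypothetical parameters; a real model would estimate these from labeled interfaces.
START = {"attribute-name": 0.8, "operator": 0.1, "operand": 0.1}
TRANS = {
    "attribute-name": {"attribute-name": 0.1, "operator": 0.3, "operand": 0.6},
    "operator": {"attribute-name": 0.1, "operator": 0.1, "operand": 0.8},
    "operand": {"attribute-name": 0.7, "operator": 0.1, "operand": 0.2},
}
EMIT = {
    "attribute-name": {"text-label": 0.9, "text-box": 0.05, "selection-list": 0.05},
    "operator": {"text-label": 0.2, "text-box": 0.1, "selection-list": 0.7},
    "operand": {"text-label": 0.1, "text-box": 0.5, "selection-list": 0.4},
}


def viterbi(observations):
    """Return the most probable state sequence for a non-empty list of component types."""
    # best[t][s] = (log-probability of the best path ending in s at step t, state at t-1)
    best = [
        {s: (math.log(START[s]) + math.log(EMIT[s][observations[0]]), None) for s in STATES}
    ]
    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in STATES:
            # Pick the predecessor that maximizes the path score into state s.
            score, back = max((prev[p][0] + math.log(TRANS[p][s]), p) for p in STATES)
            column[s] = (score + math.log(EMIT[s][obs]), back)
        best.append(column)
    # Trace back from the highest-scoring final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))
```

With these made-up parameters, the sequence text-label, selection-list, text-box decodes to attribute-name, operator, operand, even though `operand` outscores `operator` at the second step on its own: Viterbi keeps the best path into every state, so the following text-box can still favor the operator route. Working in log space avoids underflow on long interfaces.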

Cited by 19 publications (23 citation statements). References: 19 publications.
“…Khare and An [7] rely on a Hidden Markov Model (HMM) to label major components of a web interface, such as text-labels, text-boxes, and selection lists, and extract information from the interface. The authors in [7] train different HMMs, one for each available template of a web page, using training data that are grouped according to the templates. This approach is similar to ADEx, since ADEx constructs a decision tree classifier for extracting data from ads in a particular domain D using training ad data belonging to D. Even though HMMs are effective for data extraction, they are slower than existing supervised algorithms, including the decision tree employed by ADEx.…”
Section: Related Work (mentioning)
confidence: 99%
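The statement above says [7] trains one HMM per page template from training data grouped by template. A minimal sketch of that idea, assuming fully labeled sequences and plain relative-frequency (maximum-likelihood) estimation with no smoothing, is shown below; the function names and the `(observation, state)` pair format are hypothetical, not taken from [7].

```python
from collections import Counter, defaultdict


def train_hmm(sequences):
    """Estimate start, transition, and emission probabilities by relative frequency.

    Each sequence is a non-empty list of (observation, state) pairs from one labeled
    interface. No smoothing is applied, so unseen events get no entry.
    """
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for seq in sequences:
        start[seq[0][1]] += 1
        for obs, state in seq:
            emit[state][obs] += 1
        # Count consecutive state pairs as transitions.
        for (_, prev), (_, nxt) in zip(seq, seq[1:]):
            trans[prev][nxt] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalize(start),
        {s: normalize(c) for s, c in trans.items()},
        {s: normalize(c) for s, c in emit.items()},
    )


def train_per_template(labeled):
    """Group (template, sequence) pairs by template and train one HMM per group."""
    groups = defaultdict(list)
    for template, seq in labeled:
        groups[template].append(seq)
    return {template: train_hmm(seqs) for template, seqs in groups.items()}
```

Keeping a separate model per template lets each template's layout conventions shape its own transition probabilities; the cost is that every template needs its own labeled data.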
“…We have developed an effective and efficient method for segmenting form elements into semantically related groups [15]. To address the form2db problem, we extend the previous segmentation technique to a tree extraction method.…”
Section: Extracting Form Trees (mentioning)
confidence: 99%
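Once each form element carries a role label, grouping elements into semantically related segments can be as simple as opening a new group whenever an attribute-name label appears. The sketch below assumes exactly that rule and the hypothetical label name `attribute-name`; the segmentation method in [15] may use a different criterion.

```python
def group_segments(elements, labels, start_label="attribute-name"):
    """Split a labeled element sequence into groups; each start_label opens a new group."""
    if len(elements) != len(labels):
        raise ValueError("elements and labels must have the same length")
    groups = []
    for element, label in zip(elements, labels):
        # Open a new group at each start label, or for the very first element.
        if label == start_label or not groups:
            groups.append([])
        groups[-1].append(element)
    return groups
```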
“…The FormMapper system is built on our previous work [15,16] and extends it in many aspects: First, the FormMapper integrates multiple forms into a single database instead of creating an individual database for each form. Second, we have attempted to implement the following requirements in the FormMapper: (i) the system can accept sophisticated forms as input, (ii) the system automatically captures the semantic relationships among form elements, (iii) the system automatically links form elements to the elements in the hidden database, (iv) the system automatically extends the hidden database for unmatched form elements, and (v) the system automatically generates mapping expressions between the form and the hidden database.…”
Section: Introduction (mentioning)
confidence: 99%