An empirical study on using hidden markov model for search interface segmentation

Khare, Ritu; An, Yuan

doi:10.1145/1645953.1645959

Cited by 19 publications

(23 citation statements)

References 19 publications


Section: Related Work (mentioning)

confidence: 99%

“…Khare and An [7] rely on a Hidden Markov Model (HMM) to label major components of a web interface, such as text-labels, text-boxes, and selection lists, and extract information from the interface. The authors in [7] train different HMMs, one for each available template of a web page, using training data that are grouped according to the templates. This approach is similar to ADEx, since ADEx constructs a decision tree classifier for extracting data from ads in a particular domain D using training ads data belonged to D. Even though HMM is effective for data extraction, it is slower than existing supervised algorithms, including the decision tree employed by ADEx.…”


Web-based closed-domain data extraction on online advertisements

Pera

Qumsiyeh

2013

Information Systems


“…We have developed an effective and efficient method for segmenting form elements into semantically related groups [15]. To address the form2db problem, we extend the previous segmentation technique to a tree extraction method.…”

Section: Extracting Form Trees (mentioning)

confidence: 99%

“…The FormMapper system is built on our previous work [15,16] and extends it in many aspects: First, the FormMapper integrates multiple forms into a single database instead of creating individual database for each form. Second, we have attempted to implement the following requirements in the FormMapper: (i) the system can accept sophisticated forms as input, (ii) the system automatically captures the semantic relationships among form elements, (iii) the system automatically links form elements to the elements in the hidden database, (iv) the system automatically extends the hidden database for unmatched form elements, and (v) the system automatically generates mapping expressions between the form and the hidden database.…”

Section: Introduction (mentioning)

confidence: 99%

“…The tree extraction component leverages a machine learning technique, Hidden Markov Model (HMM) [25], for automatically extracting a tree structure from a data entry form. In a previous study, we have applied the HMM model to the problem of segmenting search interfaces [15]. To address the form2db problem, we extend the method to automatically extract a complete tree structure from a form.…”

Section: Introduction (mentioning)

confidence: 99%


Automatically Mapping and Integrating Multiple Data Entry Forms into a Database

An

Khare

Song

et al. 2011

Conceptual Modeling – ER 2011

Self Cite


Abstract. Forms are a standard way of gathering data into a database. Many applications need to support multiple users with evolving data gathering requirements. It is desirable to automatically link dynamic forms to the back-end database. We have developed the FormMapper system, a fully automatic solution that accepts user-created data entry forms, and maps and integrates them into an existing database in the same domain. The solution comprises two components: tree extraction and form integration. The tree extraction component leverages a probabilistic process, Hidden Markov Model (HMM), for automatically extracting a semantic tree structure of a form. In the form integration component, we develop a merging procedure that maps and integrates a tree into an existing database and extends the database with desired properties. We conducted experiments evaluating the performance of the system on several large databases designed from a number of complex forms. Our experimental results show that the FormMapper system is promising: it generated databases that are highly similar (87% overlapped) to those generated by the human experts, given the same set of forms.
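The citation statements above describe using an HMM to assign labels to a sequence of form elements. As a rough illustration of that general idea only, the following is a minimal Viterbi decoding sketch in Python. Everything in it is an assumption made for the example: the state names (`attribute_name`, `operand`), the observation names (`text_label`, `text_box`, `select_list`), and all probabilities are hypothetical and are not taken from the cited papers, which estimate their models from training data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace the best path backwards from the most likely final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical model: hidden semantic roles emit visible form-element types.
states = ["attribute_name", "operand"]
start_p = {"attribute_name": 0.9, "operand": 0.1}
trans_p = {
    "attribute_name": {"attribute_name": 0.2, "operand": 0.8},
    "operand": {"attribute_name": 0.6, "operand": 0.4},
}
emit_p = {
    "attribute_name": {"text_label": 0.9, "text_box": 0.05, "select_list": 0.05},
    "operand": {"text_label": 0.1, "text_box": 0.5, "select_list": 0.4},
}

# Form elements in the order they appear on a made-up interface.
elements = ["text_label", "text_box", "text_label", "select_list"]
labels = viterbi(elements, states, start_p, trans_p, emit_p)
print(list(zip(elements, labels)))
```

With these made-up numbers, each text label is decoded as an attribute name and the input control that follows it as an operand, so consecutive label/control pairs form candidate segments. A real system would learn the probabilities from annotated interfaces rather than hard-coding them.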
