2010 Second International Conference on Information Technology and Computer Science 2010
DOI: 10.1109/itcs.2010.76

Abstract: Many applications now need to analyze the detailed content of web pages, so extracting web information quickly and effectively has become very important. Web information is primarily expressed in HTML. HTMLParser is an open-source project hosted on SourceForge.net that can parse HTML in either a linear or a nested fashion. This paper analyzes the principle of extracting web information based on HTMLParser. In addition, it gives an approach to implementing web information extraction with the classes and methods provided by H…
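
The "linear" style of extraction mentioned in the abstract can be illustrated with a minimal sketch. It does not use the SourceForge HTMLParser library the paper describes; it assumes Python's standard-library `html.parser.HTMLParser` as a stand-in, and the names `LinkExtractor` and `extract_links` are hypothetical, chosen only for this illustration. The parser visits tags in document order and keeps only the ones that match a simple tag filter (here, `<a>` elements that carry an `href`).

```python
from html.parser import HTMLParser  # Python standard library, not the SourceForge HTMLParser project


class LinkExtractor(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        # Tag filter: only anchors are of interest; all other tags are ignored.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Keep text only while inside an anchor that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Hypothetical helper: run the extractor over one HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links` on a page's HTML string returns the link targets together with their visible text. A nested approach would instead build a tree of nodes first and then walk or filter that tree.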

Cited by 2 publications (1 citation statement)
References 3 publications (2 reference statements)

“…The research focused mainly on the navigation and visualization of Linked Data formats, as well as on work aimed at extracting information from Web pages. Lin and Hu [14] present HTMLParser, a method for parsing HTML pages and effectively extracting content in a linear or nested fashion. The parser has filters and custom tags, and offers a simple interface.…”
unclassified