2006
DOI: 10.1007/11610113_22

Classifying Web Data in Directory Structures

Abstract: Web Directories have emerged as an alternative to the Search Engines for locating information on the Web. Typically, Web Directories rely on humans putting in significant time and effort into finding important pages on the Web and categorizing them in the Directory. In this paper, we experimentally study the automatic population of a Web Directory via the use of a subject hierarchy. For our study, we have constructed a subject hierarchy for the top level topics offered in Dmoz, by leveraging ontologi…

Citations: cited by 7 publications (5 citation statements)
References: 20 publications (12 reference statements)
“…By using a topical hierarchy that is manually constructed our hope is that we can capture in a better way the topical interests of the users. In addition, by enriching such a hierarchy with WordNet concepts and by using them when determining the topics of the pages we can achieve good classification accuracy (Stamou et al 2006). It is likely that under a different classification scheme or topical ontology, our model would perform differently but we believe that it could still be directly applied without the need for any modifications.…”
Section: Discussion | Type: mentioning
confidence: 98%
“…This is also attested in a previous work we have carried out (Stamou et al 2006) where we managed to automatically classify about 300,000 Web pages in the ontology's topics in nearly 6 h (including data cleaning and pre-processing time) without any need for prior training or human involvement.…”
Section: The Topical Ontology | Type: mentioning
confidence: 89%
“…the link structure) is becoming important in such studies [Glover et al 2002;Amitay et al 2003]. An interesting mixed classification methodology is in Stamou et al [2006] where the training data is provided implicitly by using hand made directories. These techniques can handle single sites and a small group of pages (in supervised mode) detecting high level functionalities among a set of categories defined at training time.…”
Section: State Of the Art In The Classification Of Web Pages And Web | Type: mentioning
confidence: 99%
“…Other techniques rely on knowledge of concept ontologies, such as the WordNet semantic network. The work of (Stamou et al, 2006) [141] uses lexical chains (Barzilay, 1997) [13] of semantically related terms, which can serve as representations of the web pages to be classified. To assign a web page to a category, the terms of its lexical chain are compared with the thematic terms of each candidate category. In a different spirit, Holden N., & Freitas A., [197] classify web pages by extracting features from the keywords and description meta-information of a web page.…”
unclassified
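
The comparison step described in the last statement (matching the terms of a page's lexical chain against the thematic terms of each candidate category) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `classify_by_term_overlap`, the use of Jaccard overlap as the similarity measure, and the toy term lists are hypothetical and are not taken from Stamou et al. (2006), whose actual scoring is not given in this report.

```python
def classify_by_term_overlap(chain_terms, category_terms):
    """Pick the category whose topic terms overlap most with a page's chain terms.

    chain_terms: iterable of terms from the page's lexical chain (assumed given).
    category_terms: dict mapping a category name to an iterable of topic terms.
    Returns (best_category, score); best_category is None if nothing overlaps.
    Jaccard overlap is an illustrative stand-in, not the paper's measure.
    """
    page = {t.lower() for t in chain_terms}
    best_category, best_score = None, 0.0
    for category, terms in category_terms.items():
        topic = {t.lower() for t in terms}
        if not page or not topic:
            continue
        # Jaccard overlap: shared terms divided by all distinct terms.
        score = len(page & topic) / len(page | topic)
        if score > best_score:
            best_category, best_score = category, score
    return best_category, best_score


# Toy data (hypothetical): one page's chain terms and two candidate categories.
chain = ["guitar", "melody", "concert", "band"]
categories = {
    "Arts": ["music", "concert", "band", "painting"],
    "Sports": ["football", "match", "team"],
}
best, score = classify_by_term_overlap(chain, categories)
print(best, round(score, 3))
```

With the toy data, the page shares two terms with "Arts" and none with "Sports", so "Arts" is selected. A fuller version would first build the lexical chain from the page text and expand category terms through WordNet relations, both of which are outside this sketch.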