2022
DOI: 10.1609/icwsm.v16i1.19380
|View full text |Cite
|
Sign up to set email alerts
|

Homepage2Vec: Language-Agnostic Website Embedding and Classification

Abstract: Currently, publicly available models for website classification do not offer an embedding method and have limited support for languages beyond English. We release a dataset of more than two million category-labeled websites in 92 languages collected from Curlie, the largest multilingual human-edited Web directory. The dataset contains 14 website categories aligned across languages. Alongside it, we introduce Homepage2Vec, a machine-learned pre-trained model for classifying and embedding websites based on their… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...

Citation Types

0
0
0

Year Published

2023
2023
2024
2024

Publication Types

Select...
2
1

Relationship

0
3

Authors

Journals

citations
Cited by 3 publications
references
References 25 publications
(28 reference statements)
0
0
0
Order By: Relevance