2010
DOI: 10.1145/1870096.1870098

A Combination Approach to Web User Profiling

Abstract: In this article, we study the problem of Web user profiling, which is aimed at finding, extracting, and fusing the “semantic”-based user profile from the Web. Previously, Web user profiling was often undertaken by creating a list of keywords for the user, which is (sometimes even highly) insufficient for main applications. This article formalizes the profiling problem as several subtasks: profile extraction, profile integration, and user interest discovery. We propose a combination approach to deal with the pr…

Cited by 122 publications (53 citation statements)
References 34 publications
“…al. [9] used SVM for identifying the homepage of a person, and to define features where whether the title of the page contains the person name and whether the URL address (partly) contains the person name. Gershman et.…”
Section: SVM (mentioning)
confidence: 99%
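The two features named in the statement above (whether the page title contains the person's name, and whether the URL partly contains it) can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the tokenization, and the substring matching rule are all assumptions, and the SVM training step is omitted.

```python
import re


def name_tokens(person_name):
    """Split a name into lowercase alphabetic tokens, e.g. 'Jie Tang' -> ['jie', 'tang']."""
    return re.findall(r"[a-z]+", person_name.lower())


def homepage_features(person_name, page_title, url):
    """Return two binary features for a candidate homepage (hypothetical helper).

    Feature 1: every name token occurs in the page title.
    Feature 2: at least one name token occurs in the URL ("partly contains").
    """
    tokens = name_tokens(person_name)
    title = page_title.lower()
    address = url.lower()
    title_has_name = int(bool(tokens) and all(t in title for t in tokens))
    url_has_name = int(any(t in address for t in tokens))
    return [title_has_name, url_has_name]


features = homepage_features(
    "Jie Tang", "Jie Tang's Homepage", "http://example.org/~jietang/"
)
```

Plain substring matching is used for brevity; very short name tokens can match unrelated words, so a real feature extractor would likely need stricter matching. The resulting vectors would then be passed to a classifier of the reader's choice.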
“…Arnetminer relies on rich researcher description created by Web user profiling, i.e. finding, extracting and fusing the "semantic-based" user profile from various Internet sources (Tang et al 2010).…”
Section: Related Work (mentioning)
confidence: 99%
“…To deal with the disambiguation problem, as a starting point we used the algorithm proposed by Tang et al (2010), which consists in grouping publications with matching authors' first and middle names, and then clustering each group, taking into account co-authorship, citations, extended co-authorship, and user restrictions. When comparing this with the original algorithm, for the distance measure in the clustering algorithm we added the similarity of the titles, as it is known that scientists often use the same words in their publication titles.…”
Section: Acquisition Of Publications (mentioning)
confidence: 99%
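The two-stage procedure described in the statement above (group publications by matching author name, then cluster within each group) can be sketched roughly as below. This is a heavily simplified illustration under stated assumptions, not the algorithm of Tang et al (2010): the record layout (`id`, `author`, `coauthors`) is hypothetical, and the clustering step links publications only when they share a co-author, leaving out citations, extended co-authorship, user restrictions, and title similarity.

```python
from collections import defaultdict


def group_by_name(pubs):
    """Stage 1: group publications whose author name matches after normalization."""
    groups = defaultdict(list)
    for pub in pubs:
        key = " ".join(pub["author"].lower().split())
        groups[key].append(pub)
    return groups


def cluster_by_coauthors(pubs):
    """Stage 2 (simplified): connected components over 'shares a co-author' links."""
    parent = list(range(len(pubs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pubs)):
        for j in range(i + 1, len(pubs)):
            if pubs[i]["coauthors"] & pubs[j]["coauthors"]:
                parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i, pub in enumerate(pubs):
        clusters[find(i)].append(pub["id"])
    return sorted(sorted(ids) for ids in clusters.values())


pubs = [
    {"id": "p1", "author": "Wei Wang", "coauthors": {"A"}},
    {"id": "p2", "author": "wei  wang", "coauthors": {"A", "B"}},
    {"id": "p3", "author": "Wei Wang", "coauthors": {"C"}},
    {"id": "p4", "author": "Wei X. Wang", "coauthors": {"A"}},
]
groups = group_by_name(pubs)
clusters = cluster_by_coauthors(groups["wei wang"])
```

Here `p1` and `p2` end up in one cluster because they share co-author `A`, while `p3` stays separate and `p4` falls into a different name group altogether. A fuller version would replace the binary co-author link with a distance measure that combines several signals.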
“…The other dataset is the DBLP-Citation-network V5 (DBLP) available at Arnetminer.org [19], [20], [21], [22], which consists of two major computer science bibliographic datasets, DBLP and ACM, covering publications from 1936 to 2011. The DBLP dataset contains some of the important papers in computer science that describe widely used techniques and algorithms.…”
Section: A Data (mentioning)
confidence: 99%