2011
DOI: 10.1109/tkde.2010.27
|View full text |Cite
|
Sign up to set email alerts
|

A Hidden Topic-Based Framework toward Building Applications with Short Web Documents

Abstract: This paper introduces a hidden topic-based framework for processing short and sparse documents (e.g., search result snippets, product descriptions, book/movie summaries, and advertising messages) on the Web. The framework focuses on solving two main challenges posed by these kinds of documents: 1) data sparseness and 2) synonyms/homonyms. The former leads to the lack of shared words and contexts among documents while the latter are big linguistic obstacles in natural language processing (NLP) and information r… Show more

Help me understand this report

Search citation statements

Order By: Relevance

Paper Sections

Select...
2
1
1
1

Citation Types

0
63
0

Year Published

2011
2011
2021
2021

Publication Types

Select...
4
2
1

Relationship

0
7

Authors

Journals

citations
Cited by 101 publications
(65 citation statements)
references
References 34 publications
0
63
0
Order By: Relevance
“…[21] improved Sahami's work by involving a learning process to make the measure more appropriate for the target corpus. Phan et al [12,11] proposed to convert additional knowledge base to topics to improve the representation of the short texts. The knowledge base is crawled with selected seeds from several topics to avoid noise.…”
Section: Mining Short Textmentioning
confidence: 99%
See 3 more Smart Citations
“…[21] improved Sahami's work by involving a learning process to make the measure more appropriate for the target corpus. Phan et al [12,11] proposed to convert additional knowledge base to topics to improve the representation of the short texts. The knowledge base is crawled with selected seeds from several topics to avoid noise.…”
Section: Mining Short Textmentioning
confidence: 99%
“…Directly learning topic models on short text is much harder than on traditional long text. For this reason, [12,11] proposed to train topic models on a collection long text in the same domain and then make inference on short text to help the learning task on short texts. However, in highly dynamic domains like Twitter where novel topics and trends constantly emerge, it is not always possible to find strongly related long texts via a search engine or a static knowledge base such as Wikipedia.…”
Section: Dual Latent Dirichlet Alloca-tion (Dlda)mentioning
confidence: 99%
See 2 more Smart Citations
“…In this frame work, they solved two problems like data sparseness, synonyms problem using LDA method through MaxEnt classifier [8] .…”
Section: Related Workmentioning
confidence: 99%