A web-based kernel function for measuring the similarity of short text snippets

Sahami, Mehran; Heilman, Timothy D.

doi:10.1145/1135777.1135834

Cited by 589 publications

(397 citation statements)

References 14 publications

Supporting

Mentioning

395

Contrasting

Unclassified


“…Many works have focused on this aspect by proposing enriched text representations and proximity metrics that attempt to get more realistic semantic comparisons. These approaches have included the use of additional information obtained from the Web [39,52], external resources like Wikipedia [2] and Wordnet [21], combinations of internal and external semantics [23] and learning term-weighting functions for similarity measures [51]. Although these proposals are still far from getting the semantic level previously explained in the cognitive science works, they present interesting research lines for future work.…”

Section: Related Work (mentioning)

confidence: 99%

An efficient Particle Swarm Optimization approach to cluster short texts

Cagnina, Errecalde, Ingaramo, et al. (2014), Information Sciences


Short texts such as evaluations of commercial products, news, FAQs and scientific abstracts are important resources on the Web due to the constant requirements of people to use this online information in real life. In this context, the clustering of short texts is a significant analysis task and a discrete Particle Swarm Optimization (PSO) algorithm named CLUDIPSO has recently shown promising performance on this type of problem. CLUDIPSO obtained high quality results with small corpora although, with larger corpora, a significant deterioration of performance was observed. This article presents CLUDIPSO⋆, an improved version of CLUDIPSO, which includes a different representation of particles, a more efficient evaluation of the function to be optimized and some modifications in the mutation operator. Experimental results with corpora containing scientific abstracts, news and short legal documents obtained from the Web show that CLUDIPSO⋆ is an effective clustering method for short-text corpora of small and medium size.


“…One general strategy for solving this problem is to expand text representation by exploiting related text documents, which is related to smoothing of a document language model in information retrieval [105]. A specific technique, which leverages a search engine to expand text representation, was proposed in [79]. A comparison of several simple measures for computing similarity of short text segments can be found in [66].…”

Section: Distance-based Clustering Algorithms (mentioning)

confidence: 99%

A Survey of Text Clustering Algorithms

2012


Clustering is a widely studied data mining problem in the text domains. The problem finds numerous applications in customer segmentation, classification, collaborative filtering, visualization, document organization, and indexing. In this chapter, we will provide a detailed survey of the problem of text clustering. We will study the key challenges of the clustering problem, as it applies to the text domain. We will discuss the key methods used for text clustering, and their relative advantages. We will also discuss a number of recent advances in the area in the context of social network and linked data.


“…Sahami and Heilman employed a similarity kernel function to estimate short text similarity, making use of a search engine to extend the features of short texts [15]. Wang et al. [16] and Yuan [17] proposed a mining algorithm based on association rules to extract association relationships between features included in the training and testing sets, which further obtains the extended features corresponding to the words.…”

Section: Related Work (mentioning)

confidence: 99%
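
The idea summarized in the statement above (use search results to extend the features of a short text, then compare the extended representations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the in-memory `CORPUS` and the `retrieve` function are hypothetical stand-ins for a real search engine, and the TF-IDF weighting, centroid, and normalization choices are assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical stand-in for a web search index (assumption for this sketch).
CORPUS = [
    "apple releases new iphone and macbook computers",
    "steve jobs founded apple computer in california",
    "the united nations general assembly meets in new york",
    "un secretary general addresses the united nations assembly",
    "fresh apple pie recipe with cinnamon",
]

def tokenize(text):
    return text.lower().split()

def retrieve(query, top_n=3):
    """Hypothetical retrieval: documents sharing the most terms with the query."""
    q = set(tokenize(query))
    scored = [(len(q & set(tokenize(d))), d) for d in CORPUS]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: -pair[0])  # stable: ties keep corpus order
    return [d for _, d in scored[:top_n]]

def idf(term):
    """Smoothed inverse document frequency over the toy corpus."""
    df = sum(1 for d in CORPUS if term in tokenize(d))
    return math.log((1 + len(CORPUS)) / (1 + df)) + 1.0

def l2_normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def expand(snippet, top_n=3):
    """Represent a snippet by the normalized centroid of its retrieved documents."""
    docs = retrieve(snippet, top_n)
    centroid = Counter()
    for d in docs:
        tf = Counter(tokenize(d))
        vec = l2_normalize({t: c * idf(t) for t, c in tf.items()})
        for t, v in vec.items():
            centroid[t] += v / len(docs)
    return l2_normalize(centroid)

def kernel(x, y, top_n=3):
    """Similarity of two snippets: dot product of their expanded representations."""
    ex, ey = expand(x, top_n), expand(y, top_n)
    return sum(v * ey.get(t, 0.0) for t, v in ex.items())
```

With this toy corpus, `kernel("un", "united nations")` is positive even though the two snippets share no terms, because both retrieve an overlapping document; that is the motivation for expanding short texts before comparing them.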

“…The indexed terms in are unique terms of prefixes in , where each indexed term refers to a list composed of all videos whose prefixes contain the corresponding indexed term. In the following stage, may be possible similar to ∈ by the inverted index (lines 8-19). For each item ∈ , we firstly figure out the prefix Pre( ).…”

Section: Prefix Filtering Based On Derived Jaccard (mentioning)

confidence: 99%

A Novel Mobile Video Community Discovery Scheme Using Ontology-Based Semantical Interest Capture

Zhang, Xiong (2016), Mobile Information Systems


Leveraging network virtualization technologies, community-based video systems rely on the measurement of common interests to define and stabilize relationships between community members, which promotes video sharing performance and improves the scalability of the community structure. In this paper, we propose a novel mobile Video Community discovery scheme using ontology-based semantical interest capture (VCOSI). An ontology-based semantical extension approach is proposed, which describes video content and measures video similarity according to video key word selection methods. In order to reduce the calculation load of video similarity, VCOSI designs a prefix-filtering-based estimation algorithm to decrease energy consumption of mobile nodes. VCOSI further proposes a member relationship estimate method to construct scalable and resilient node communities, which promotes video sharing capacity of video systems with flexible and economic community maintenance. Extensive tests show how VCOSI obtains better performance results in comparison with other state-of-the-art solutions.
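
For readers unfamiliar with prefix filtering, the sketch below shows the standard textbook form of the technique for Jaccard similarity: index only a short prefix of each token set (under a global token order) and verify only the pairs whose prefixes collide. It is not VCOSI's "derived Jaccard" variant, whose details are not given here; the function names `jaccard` and `similar_pairs` and the sample data are hypothetical.

```python
import math
from collections import Counter, defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets."""
    return len(a & b) / len(a | b) if (a or b) else 1.0

def similar_pairs(records, threshold):
    """Return all id pairs whose token sets have Jaccard >= threshold.

    records: dict mapping an id to a set of tokens; threshold in (0, 1].
    """
    # Global token order: rare tokens first, so prefixes are selective.
    freq = Counter(t for toks in records.values() for t in toks)
    index = defaultdict(list)  # token -> ids whose prefix contains it
    result = set()
    for rid in sorted(records):
        toks = records[rid]
        if not toks:
            continue
        seq = sorted(toks, key=lambda t: (freq[t], t))
        # Two sets with Jaccard >= threshold must share a token within
        # their first |x| - ceil(threshold * |x|) + 1 tokens.
        # The small epsilon guards against float round-up before ceil.
        min_overlap = math.ceil(threshold * len(seq) - 1e-9)
        prefix = seq[: len(seq) - min_overlap + 1]
        candidates = set()
        for t in prefix:
            candidates.update(index[t])
        # Verify candidates with the exact similarity.
        for other in candidates:
            if jaccard(toks, records[other]) >= threshold:
                result.add((other, rid))
        for t in prefix:
            index[t].append(rid)
    return result

videos = {
    "a": {"x", "y", "z"},
    "b": {"x", "y", "z", "w"},
    "c": {"p", "q"},
}
pairs = similar_pairs(videos, 0.7)
```

The saving comes from the candidate step: only records that share a prefix token are compared exactly, rather than every pair.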
