2015 IEEE International Conference on Data Mining Workshop (ICDMW)
DOI: 10.1109/icdmw.2015.86

Learning Semantic Similarity for Very Short Texts

Abstract: Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms that can relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is minimal or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair …
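To make the contrast described in the abstract concrete, below is a minimal Python sketch (not the paper's implementation): tf-idf cosine similarity scores two short fragments with no shared words as zero, while the cosine similarity of averaged word vectors can still capture their semantic relatedness. The three-dimensional vectors are illustrative toy values standing in for real pretrained embeddings such as word2vec.

```python
# Minimal sketch (not the paper's method): word-overlap similarity vs. averaged word embeddings.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "cheap flight tickets"
b = "low cost airfare"

# 1) tf-idf cosine similarity: zero, because the two fragments share no words.
tfidf = TfidfVectorizer().fit([a, b])
print(cosine_similarity(tfidf.transform([a]), tfidf.transform([b]))[0, 0])

# 2) Averaged word embeddings: toy 3-d vectors used purely for illustration.
toy_vectors = {
    "cheap":   np.array([0.9, 0.1, 0.0]),
    "low":     np.array([0.8, 0.2, 0.1]),
    "cost":    np.array([0.7, 0.2, 0.1]),
    "flight":  np.array([0.1, 0.9, 0.2]),
    "airfare": np.array([0.2, 0.8, 0.3]),
    "tickets": np.array([0.1, 0.7, 0.4]),
}

def embed(text):
    """Mean of the word vectors occurring in the fragment."""
    vecs = [toy_vectors[w] for w in text.split() if w in toy_vectors]
    return np.mean(vecs, axis=0)

# High similarity despite zero word overlap.
print(cosine_similarity([embed(a)], [embed(b)])[0, 0])
```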

Cited by 60 publications (42 citation statements)
References 5 publications
“…So far, existing sentence embedding methods often require (pretrained) word embeddings [10,12], large amounts of data [8], or both [13,11]. While word embeddings are successful at enhancing sentence embeddings, they are not very plausible as a model of human language learning.…”
Section: Introduction
confidence: 99%
“…The work by Wang et al. [65] proposed a social media analytics engine that employs a fuzzy similarity-based classification method to automatically classify text messages into sentiment categories (positive, negative, neutral and mixed), with the ability to identify their prevailing emotion categories (e.g., satisfaction, happiness, excitement, anger, sadness, and anxiety). Others attempted to identify the semantic similarity of very short texts in Twitter and Facebook [66]. Also, a lexical similarity-based approach for extracting subjectivity in documents extracted from social media was proposed in [67].…”
Section: Sentiment Analysis Tools
confidence: 99%
“…The word-embedding method has successfully identified the semantic distances between two sentences better than the traditional approach for text similarity (e.g., the distance of the tf-idf vector) [18]. In this research, we used two word-embedding methods to calculate the semantic distance between Eng2Ind Translation and Ind Caption:…”
Section: Semantic Embeddings
confidence: 99%
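The excerpt above scores semantic distance between a translated sentence and a caption using word embeddings, but does not name the embedding models it uses. The sketch below shows one plausible way to compute such a distance with a publicly available GloVe model via gensim; the model name and the mean-pooling step are assumptions, not the cited paper's setup.

```python
# Hedged sketch: semantic distance between two sentences via averaged pretrained embeddings.
# The GloVe model name is an assumption; any pretrained word-vector model would work similarly.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

def sentence_vector(text):
    """Average of the pretrained vectors for in-vocabulary words."""
    words = [w for w in text.lower().split() if w in model]
    return np.mean([model[w] for w in words], axis=0)

def semantic_distance(s1, s2):
    """Cosine distance between averaged sentence vectors; smaller means more similar."""
    v1, v2 = sentence_vector(s1), sentence_vector(s2)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 1.0 - cos

print(semantic_distance("a man rides a bicycle", "someone is cycling down the road"))
```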