2015 IEEE International Conference on Data Mining Workshop (ICDMW)
DOI: 10.1109/icdmw.2015.86

Learning Semantic Similarity for Very Short Texts

Abstract: Leveraging data on social media, such as Twitter and Facebook, requires information retrieval algorithms that can relate very short text fragments to each other. Traditional text similarity methods such as tf-idf cosine similarity, based on word overlap, mostly fail to produce good results in this case, since word overlap is minimal or non-existent. Recently, distributed word representations, or word embeddings, have been shown to successfully allow words to match on the semantic level. In order to pair …
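To make the contrast described in the abstract concrete, below is a minimal Python sketch (not the paper's implementation): tf-idf cosine similarity scores two short fragments with no shared words as zero, while the cosine similarity of averaged word vectors can still capture their semantic relatedness. The three-dimensional vectors are illustrative toy values standing in for real pretrained embeddings such as word2vec.

```python
# Minimal sketch (not the paper's method): word-overlap similarity vs. averaged word embeddings.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

a = "cheap flight tickets"
b = "low cost airfare"

# 1) tf-idf cosine similarity: zero, because the two fragments share no words.
tfidf = TfidfVectorizer().fit([a, b])
print(cosine_similarity(tfidf.transform([a]), tfidf.transform([b]))[0, 0])

# 2) Averaged word embeddings: toy 3-d vectors used purely for illustration.
toy_vectors = {
    "cheap":   np.array([0.9, 0.1, 0.0]),
    "low":     np.array([0.8, 0.2, 0.1]),
    "cost":    np.array([0.7, 0.2, 0.1]),
    "flight":  np.array([0.1, 0.9, 0.2]),
    "airfare": np.array([0.2, 0.8, 0.3]),
    "tickets": np.array([0.1, 0.7, 0.4]),
}

def embed(text):
    """Mean of the word vectors occurring in the fragment."""
    vecs = [toy_vectors[w] for w in text.split() if w in toy_vectors]
    return np.mean(vecs, axis=0)

# High similarity despite zero word overlap.
print(cosine_similarity([embed(a)], [embed(b)])[0, 0])
```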

Cited by 60 publications (42 citation statements)
References 5 publications
“…So far, existing sentence embedding methods often require (pretrained) word embeddings [10,12], large amounts of data [8], or both [13,11]. While word embeddings are successful at enhancing sentence embeddings, they are not very plausible as a model of human language learning.…”
Section: Introduction
confidence: 99%
“…The work by Wang et al. [65] proposed a social media analytics engine that employs a fuzzy similarity-based classification method to automatically classify text messages into sentiment categories (positive, negative, neutral and mixed), with the ability to identify their prevailing emotion categories (e.g., satisfaction, happiness, excitement, anger, sadness, and anxiety). Others attempted to identify the semantic similarity of very short texts in Twitter and Facebook [66]. Also, a lexical similarity-based approach for extracting subjectivity in documents extracted from social media was proposed in [67].…”
Section: Sentiment Analysis Tools
confidence: 99%
“…The word-embedding method has successfully identified the semantic distances between two sentences better than the traditional approach for text similarity (e.g., the distance of the tf-idf vector) [18]. In this research, we used two word-embedding methods to calculate the semantic distance between Eng2Ind Translation and Ind Caption:…”
Section: Semantic Embeddings
confidence: 99%
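The excerpt above scores semantic distance between a translated sentence and a caption using word embeddings, but does not name the embedding models it uses. The sketch below shows one plausible way to compute such a distance with a publicly available GloVe model via gensim; the model name and the mean-pooling step are assumptions, not the cited paper's setup.

```python
# Hedged sketch: semantic distance between two sentences via averaged pretrained embeddings.
# The GloVe model name is an assumption; any pretrained word-vector model would work similarly.
import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-50")  # downloads the vectors on first use

def sentence_vector(text):
    """Average of the pretrained vectors for in-vocabulary words."""
    words = [w for w in text.lower().split() if w in model]
    return np.mean([model[w] for w in words], axis=0)

def semantic_distance(s1, s2):
    """Cosine distance between averaged sentence vectors; smaller means more similar."""
    v1, v2 = sentence_vector(s1), sentence_vector(s2)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return 1.0 - cos

print(semantic_distance("a man rides a bicycle", "someone is cycling down the road"))
```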