2020
DOI: 10.1109/access.2020.3017797

A Semantic and Syntactic Similarity Measure for Political Tweets

Abstract: Measurement of the semantic and syntactic similarity of human utterances is essential in allowing machines to understand dialogue with users. However, human language is complex, and the semantic meaning of an utterance is usually dependent upon the context at a given time and learnt experience of the meaning of the words that are used. This is particularly challenging when automatically understanding the meaning of social media, such as tweets, which can contain non-standard language. Short Text Semantic Simil…

Cited by 7 publications (5 citation statements)
References 32 publications (60 reference statements)
“…This ranges from traditional methods such as TFIDF or Jaccard similarity to modern approaches including BERT embeddings (Devlin et al, 2019; Reimers and Gurevych, 2019; Zhang et al, 2019b) and AMR kernels (Opitz et al, 2021). Some approaches use syntactic features specific to certain domains such as Twitter (Alnajran, 2019; Little et al, 2020) or web documents (Broder et al, 1997; Pereira and Ziviani, 2003). Other metrics include "syntactic elements" which take on various forms of parts-of-speech aggregation (Alnajran, 2019; Pakray et al, 2011).…”
Section: Related Work (mentioning)
confidence: 99%
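
To make the "traditional" end of that range concrete, here is a minimal sketch of Jaccard similarity over word sets. Lowercasing, whitespace tokenisation, and the function name `jaccard_similarity` are assumptions made for illustration; they are not taken from any of the cited papers.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of the two texts' word sets: |A ∩ B| / |A ∪ B|."""
    # Assumption: lowercase and split on whitespace; real systems
    # usually apply a proper tokenizer first.
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    print(jaccard_similarity("vote for the bill", "vote against the bill"))
```

The score depends only on word overlap, which is why such measures are usually contrasted with embedding-based approaches.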
“…An important advantage of CASSIM is that it is generalizable to any corpus; it does not represent syntax using platform-specific features like Alnajran (2019); Little et al (2020). However, the cost of exhaustively using a metric such as Edit Distance is rather penalizing, as its implementations range in asymptotic time complexity from Θ(mn) (Wagner and Fischer, 1974) to O(s×min(m, n)) (Ukkonen, 1985), where m and n are the string lengths, and s is the maximal Edit Distance.…”
Section: Related Work (mentioning)
confidence: 99%
“…Alnajran (2019) proposed TREASURE, which primarily captured semantic similarity using word embeddings, but also considered syntax in the form of coarse metrics such as counting parts-of-speech and features specific to the social media platform Twitter. Similarly, Little et al (2020) also computed a metric for semantic and syntactic similarity specific to Twitter. They took a weighted average between BERT embeddings and their "syntactic element," which consists of the length of the longest common sequences of parts-of-speech, and a feature vector counting the number of hashtags, mentions, and pronouns.…”
Section: Related Work (mentioning)
confidence: 99%
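
The weighted-average idea described in this statement can be sketched roughly as follows. This is not the published measure: it assumes part-of-speech tags are already available as lists, reads "longest common sequences" as longest common subsequence, normalises by the longer tag list, and uses a hypothetical `weight` of 0.5. The hashtag, mention, and pronoun feature vector is left out.

```python
from typing import Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two tag sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def syntactic_score(tags_a: Sequence[str], tags_b: Sequence[str]) -> float:
    """Assumption: normalise the LCS length by the longer sequence."""
    longest = max(len(tags_a), len(tags_b))
    return lcs_length(tags_a, tags_b) / longest if longest else 1.0


def combined_similarity(semantic: float, syntactic: float,
                        weight: float = 0.5) -> float:
    """Weighted average; `weight` is a hypothetical mixing parameter."""
    return weight * semantic + (1.0 - weight) * syntactic


if __name__ == "__main__":
    tags_a = ["DET", "NOUN", "VERB"]
    tags_b = ["NOUN", "ADV", "VERB"]
    # 0.9 stands in for a semantic score from some embedding model.
    print(combined_similarity(0.9, syntactic_score(tags_a, tags_b)))
```

Here the semantic score is passed in as a plain number so the sketch stays independent of any particular embedding model.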
“…One of the most important advantages of CASSIM is that it is generalizable to any corpus, as it doesn't require platform-specific features like Alnajran (2019); Little et al (2020). However, because the metric is exhaustive, the cost of an algorithm such as Edit Distance becomes rather penalizing: the common Edit Distance algorithms range in time complexity from Θ(mn) (Wagner and Fischer, 1974) to O(s×min(m, n)) (Ukkonen, 1985), where m and n are the string lengths, and s is the maximal Edit Distance.…”
Section: Fastkassim (mentioning)
confidence: 99%
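
For reference, the Θ(mn) end of that range is the classic dynamic-programming formulation attributed to Wagner and Fischer. The sketch below is a standard textbook version with unit costs for insertion, deletion, and substitution, keeping only two rows of the table; it is not code from the citing paper.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic programme.

    Runs in Θ(mn) time for strings of length m and n, and uses O(n)
    memory because only the previous table row is kept.
    """
    prev = list(range(len(t) + 1))  # distance from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

Every cell of the m-by-n table is filled exactly once, which is where the Θ(mn) cost comes from and why exhaustive pairwise comparison over a large corpus becomes expensive.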
“…Similarity measures combine semantic and syntactic features of natural language to determine a similarity measure of two short texts. Short texts are typically 25 words or less in length [1] and include structured (sentences) and unstructured (tweets) [2,3,4,5]. Substantial research has been undertaken in the field of traditional semantic similarity [6], with methods typically grouped into corpus-based [7], string-based [8], knowledge-based [9], and hybrid [1].…”
Section: Introduction (mentioning)
confidence: 99%