A Semantic and Syntactic Similarity Measure for Political Tweets

Little, Claire; McLean, David; Crockett, Keeley; Edmonds, Bruce

doi:10.1109/access.2020.3017797

Cited by 7 publications

(5 citation statements)

References 32 publications

(60 reference statements)


“…This ranges from traditional methods such as TFIDF or Jaccard similarity to modern approaches including BERT embeddings (Devlin et al, 2019; Reimers and Gurevych, 2019; Zhang et al, 2019b) and AMR kernels (Opitz et al, 2021). Some approaches use syntactic features specific to certain domains such as Twitter (Alnajran, 2019; Little et al, 2020) or web documents (Broder et al, 1997; Pereira and Ziviani, 2003). Other metrics include "syntactic elements" which take on various forms of parts-of-speech aggregation (Alnajran, 2019; Pakray et al, 2011).…”

Section: Related Work (mentioning)

confidence: 99%
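For readers unfamiliar with the "traditional" end of that range, the following is a minimal sketch of Jaccard similarity over word sets. The whitespace tokenisation, the lowercasing, and the function name `jaccard_similarity` are assumptions made for illustration; none of the cited papers is confirmed to compute it this way.

```python
def jaccard_similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercased word sets.

    Tokenisation by whitespace is an illustrative assumption.
    """
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    union = words_a | words_b
    if not union:
        # Convention chosen here: two empty texts share nothing measurable.
        return 0.0
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    print(jaccard_similarity("vote for the bill", "vote against the bill"))
```

Because the measure only looks at shared vocabulary, it ignores word order and meaning, which is the gap that embedding-based and syntactic measures try to close.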

“…An important advantage of CASSIM is that it is generalizable to any corpus; it does not represent syntax using platform-specific features like Alnajran (2019); Little et al (2020). However, the cost of exhaustively using a metric such as Edit Distance is rather penalizing, as its implementations range in asymptotic time complexity from Θ(mn) (Wagner and Fischer, 1974) to O(s×min(m, n)) (Ukkonen, 1985), where m and n are the string lengths, and s is the maximal Edit Distance.…”

Section: Related Work (mentioning)

confidence: 99%


FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Chen et al. 2023

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics


Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar constituency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and runs up to 5.32 times faster than its predecessor over documents in the r/ChangeMyView corpus. FastKASSIM's improvements allow us to examine hypotheses in two settings with large documents. We find that syntactically similar arguments on r/ChangeMyView tend to be more persuasive, and that syntax is predictive of authorship attribution in the Australian High Court Judgment corpus.

Utterance 1: When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless.

Utterance 2: How can you be skiing if you are already swimming?

FastKASSIM Score: 0.


“…Alnajran (2019) proposed TREASURE, which primarily captured semantic similarity using word embeddings, but also considered syntax in the form of coarse metrics such as counting parts-of-speech and features specific to the social media platform Twitter. Similarly, Little et al (2020) also computed a metric for semantic and syntactic similarity specific to Twitter. They took a weighted average between BERT embeddings and their "syntactic element," which consists of the length of the longest common sequences of parts-of-speech, and a feature vector counting the number of hashtags, mentions, and pronouns.…”

Section: Related Work (mentioning)

confidence: 99%
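The combination described in that statement can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of Little et al (2020): the semantic score is passed in as a plain number rather than computed from BERT embeddings, part-of-speech tags are supplied by the caller, "longest common sequences" is read as longest common subsequence, the two syntactic parts are averaged equally, and the weight of 0.8, the pronoun list, and every function name are hypothetical.

```python
import math
import re

# Illustrative subset of pronouns; an assumption, not the published list.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they",
            "me", "him", "her", "us", "them"}


def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def count_features(tweet):
    """Return [hashtag count, mention count, pronoun count] for a tweet."""
    tokens = tweet.split()
    hashtags = sum(t.startswith("#") for t in tokens)
    mentions = sum(t.startswith("@") for t in tokens)
    pronouns = sum(re.sub(r"\W", "", t).lower() in PRONOUNS for t in tokens)
    return [hashtags, mentions, pronouns]


def cosine(u, v):
    """Cosine similarity of two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def syntactic_element(tweet_a, tags_a, tweet_b, tags_b):
    """Average of normalised POS overlap and feature-count similarity.

    The equal averaging of the two parts is an assumption for this sketch.
    """
    longest = max(len(tags_a), len(tags_b))
    pos_overlap = lcs_length(tags_a, tags_b) / longest if longest else 0.0
    feature_sim = cosine(count_features(tweet_a), count_features(tweet_b))
    return 0.5 * (pos_overlap + feature_sim)


def combined_similarity(semantic_sim, syntactic_sim, weight=0.8):
    """Weighted average of a semantic score and a syntactic score."""
    return weight * semantic_sim + (1 - weight) * syntactic_sim
```

In practice the semantic score would come from comparing sentence embeddings and the tags from a part-of-speech tagger; both are left to the caller here so that the sketch stays self-contained.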

“…One of the most important advantages of CASSIM is that it is generalizable to any corpus, as it doesn't require platform-specific features like Alnajran (2019); Little et al (2020). However, because the metric is exhaustive, the cost of an algorithm such as Edit Distance becomes rather penalizing: the common Edit Distance algorithms range in time complexity from Θ(mn) (Wagner and Fischer, 1974) to O(s×min(m, n)) (Ukkonen, 1985), where m and n are the string lengths, and s is the maximal Edit Distance.…”

Section: FastKASSIM (mentioning)

confidence: 99%
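To make the Θ(mn) figure concrete, here is a minimal sketch of the classic dynamic-programming edit distance usually attributed to Wagner and Fischer. It operates on plain strings with unit costs for insertion, deletion, and substitution; how CASSIM applies edit distance to its own inputs is not shown in the statement above, so the function name and the string inputs are assumptions for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the Wagner-Fischer dynamic programme.

    Fills an (m+1) x (n+1) table row by row, so the running time is
    Theta(m * n); only two rows are kept in memory at a time.
    """
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distance from the empty prefix of a
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete a[i-1]
                curr[j - 1] + 1,     # insert b[j-1]
                prev[j - 1] + cost,  # substitute or match
            )
        prev = curr
    return prev[n]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))  # 3
```

Every cell of the table is visited once regardless of how similar the inputs are, which is why running this over every pair of sentences in two long documents becomes costly.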

FastKASSIM: A Fast Tree Kernel-Based Syntactic Similarity Metric

Chen, Chen, Zhou 2022

Preprint


Syntax is a fundamental component of language, yet few metrics have been employed to capture syntactic similarity or coherence at the utterance- and document-level. The existing standard document-level syntactic similarity metric is computationally expensive and performs inconsistently when faced with syntactically dissimilar documents. To address these challenges, we present FastKASSIM, a metric for utterance- and document-level syntactic similarity which pairs and averages the most similar dependency parse trees between a pair of documents based on tree kernels. FastKASSIM is more robust to syntactic dissimilarities and differences in length, and runs up to 5.2 times faster than our baseline method over the documents in the r/ChangeMyView corpus.

Utterance 1: When we hate, we always move away from the grace of God. When we become resentful and unforgiving, the world around us seems spiteful and meaningless.

Utterance 2: How can you be skiing if you are already swimming?

FastKASSIM Score: 0.219; CASSIM Score: 0.838; LSM Score: 0.623

Utterance 1: I like swimming because it is cool.

Utterance 2: I love running because it is fun.


“…Similarity measures combine semantic and syntactic features of natural language to determine a similarity measure of two short texts. Short texts are typically 25 words or less in length [1] and include structured (sentences) and unstructured (tweets) [2,3,4,5]. Substantial research has been undertaken in the field of traditional semantic similarity [6], with methods typically grouped into corpus-based [7], string-based [8], knowledge-based [9], and hybrid [1].…”

Section: Introduction (mentioning)

confidence: 99%

Fuzzy Influence in Fuzzy Semantic Similarity Measures

Adel, Crockett, Carvalho et al. 2021

2021 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)

Self Cite


The field of Computing with Words has been pivotal in the development of fuzzy semantic similarity measures. Fuzzy semantic similarity measures allow the modelling of words in a given context with a tolerance for the imprecise nature of human perceptions. In this work, we look at how this imprecision can be addressed with the use of fuzzy semantic similarity measures in the field of natural language processing. A fuzzy influence factor is introduced into an existing measure known as FUSE. FUSE computes the similarity between two short texts based on weighted syntactic and semantic components in order to address the issue of comparing fuzzy words that exist in different word categories. A series of empirical experiments investigates the effect of introducing a fuzzy influence factor into FUSE across a number of short text datasets. Comparisons with other similarity measures demonstrate that the fuzzy influence factor has a positive effect in improving the correlation of machine similarity judgments with similarity judgments of humans.

