Short text similarity measurement methods: a review

Prakoso, Dimas Wibisono; Abdi, Asad; Amrit, Chintan

doi:10.1007/s00500-020-05479-2

Cited by 36 publications

(15 citation statements)

References 55 publications

(72 reference statements)


“…Several methods to compare documents include word-based, keyword-based, n-gram-based, and Latent Semantic Analysis-based methodologies (see [16]). [17], on their review specifically about short text similarity (STS) tasks, broadens this classification to a more generic overview, to which he classifies the tasks as string-based, corpus-based, knowledge-based, and hybrid-based. Our work will use a hybrid approach (corpus-based and string-based), using the work of [14] as a reference method.…”

Section: Tweet Similarity Approach 41 Overview (mentioning)

confidence: 99%

A Novel Approach for Tweet Similarity in a Context-Aware Fake News Detection Model

Bezerra, Kozierkiewicz, Pietranik

2024

Preprint


In this article, we address two problems: the detection of fake news using a multimodal, content, and context-driven approach and the evaluation of short-text messages (in our case, tweets) to compare how similar or different they are. For the first problem, we developed a framework for detecting fake news using a multilayered, multimodal approach consisting of a three-tiered model: the topic layer, the social layer, and the context layer, helping to establish a methodology for detecting fake news. Within the topic layer, one of its tasks is the calculation of how similar two messages are to each other. We developed an improved version of an existing model, adapted it to our framework, and included calculating certain words and their positions as features and a novel embedding method using Cosine Similarity and POS (Parts of Speech) tags. We used the dataset from STSBenchmark and the correlation value to measure the quality of our model and performed a 50-fold evaluation over the validation data. Ultimately, our model has a better correlation value (median 0.673528) than the benchmarked model. Our main contribution is the delivery of a reliable, formal, and adaptable framework for identifying fake news; we also present a means of comparing tweets by their content, utilizing FastText models, cosine similarity, and other measures, making our contribution practical and effective.
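The hybrid idea mentioned in this entry (a string-based measure combined with an embedding-based cosine similarity) can be illustrated with a minimal sketch. This is not the authors' model: the toy vectors, the `hybrid_similarity` name, and the equal weighting are all assumptions made for illustration, and a real system would load pretrained embeddings such as FastText instead of the hand-written table.

```python
import math

# Hypothetical toy word vectors (assumption); stand-ins for pretrained embeddings.
TOY_VECTORS = {
    "fake": [0.9, 0.1, 0.0],
    "false": [0.8, 0.2, 0.1],
    "news": [0.1, 0.9, 0.2],
    "story": [0.2, 0.8, 0.3],
    "spreads": [0.0, 0.3, 0.9],
}
DIM = 3


def jaccard(tokens_a, tokens_b):
    """String-based measure: overlap of the two token sets."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def mean_vector(tokens):
    """Average the vectors of known tokens; zero vector if none are known."""
    vecs = [TOY_VECTORS[t] for t in tokens if t in TOY_VECTORS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_similarity(text_a, text_b, weight=0.5):
    """Weighted mix of the string-based and embedding-based scores."""
    tokens_a = text_a.lower().split()
    tokens_b = text_b.lower().split()
    string_score = jaccard(tokens_a, tokens_b)
    vector_score = cosine(mean_vector(tokens_a), mean_vector(tokens_b))
    return weight * string_score + (1 - weight) * vector_score


if __name__ == "__main__":
    print(hybrid_similarity("fake news spreads", "false story spreads"))
```

The two texts in the usage line share only one token, so the string-based part is low while the embedding part stays high; the mix is what lets paraphrases score above unrelated texts.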


“…A number of NLP methods, e.g. adapted from various text similarity measures, can then be employed to compare the reconstructed QUDs to the overt question to see whether and where they overlap (Croft et al, 2013;Prasetya et al, 2018;Prakoso et al, 2021). Evasive bullshitting by way of introducing novel QUDs (and pretending they answer the original one) occurs when the topic of question and answer matches, but the implicit QUDs differ strongly from the overt ones.…”

Section: NLP-based Detection of Persuasive Bullshit (mentioning)

confidence: 99%

Bullshit, Pragmatic Deception, and Natural Language Processing

Deck

2023

dad


Fact checking and fake news detection have garnered increasing interest within the natural language processing (NLP) community in recent years, yet other aspects of misinformation remain unexplored. One such phenomenon is 'bullshit', which different disciplines have tried to define since it first entered academic discussion nearly four decades ago. Fact checking bullshitters is useless, because factual reality typically plays no part in their assertions: Where liars deceive about content, bullshitters deceive about their goals. Bullshitting is misleading about language itself, which necessitates identifying the points at which pragmatic conventions are broken with deceptive intent. This paper aims to introduce bullshitology into the field of NLP by tying it to questions in a QUD-based definition, providing two approaches to bullshit annotation, and finally outlining which combinations of NLP methods will be helpful to classify which kinds of linguistic bullshit.


“…So, NLP is based on textual similarity evaluation. [ 1 ] showed and discussed the popular methods used in evaluating short text similarity, such that they examined and compared each other and demonstrated how the approaches used in short text similarity evaluation changed over time. While some researchers, such as the work of [ 2 ], used several words overlapping methods to count the similar words between the learner’s answer and the model answer, this study concluded that the algorithm of the overlapping word cannot overcome the semantic similarity problems, such as some students can express the correct answer in words different than the model answer keywords.…”

Section: Introduction (mentioning)

confidence: 99%

Automatic grading for Arabic short answer questions using optimized deep learning model

Salam, El-Fatah, Hassan

2022

PLoS ONE


Auto-grading of short answer questions is considered a challenging problem in the processing of natural language. It requires a system to comprehend the free text answers to automatically assign a grade for a student answer compared to one or more model answers. This paper suggests an optimized deep learning model for grading short-answer questions automatically by using various sizes of datasets collected in the Science subject for students in seventh grade in Egypt. The proposed system is a hybrid approach that optimizes a deep learning technique called LSTM (Long Short Term Memory) with a recent optimization algorithm called a Grey Wolf Optimizer (GWO). The GWO is employed to optimize the LSTM by selecting the best dropout and recurrent dropout rates of LSTM hyperparameters rather than manual choice. Using GWO makes the LSTM model more generalized and can also avoid the problem of overfitting in forecasting the students’ scores to improve the learning process and save instructors’ time and effort. The model’s performance is measured in terms of the Root Mean Squared Error (RMSE), the Pearson correlation coefficient, and R-Square. According to the simulation results, the hybrid GWO with the LSTM model ensured the best performance and outperformed the classical LSTM model and other compared models such that it had the highest Pearson correlation coefficient value, the lowest RMSE value, and the best R square value in all experiments, but higher training time than the traditional deep learning model.
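The three evaluation measures named in this abstract (RMSE, the Pearson correlation coefficient, and R-square) can be computed as in the sketch below. It uses the standard textbook definitions; the function names and sample scores are illustrative assumptions and are not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between reference and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(x, y):
    """Pearson correlation coefficient; assumes neither input is constant."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up grades for illustration only.
    reference = [2.0, 3.5, 4.0, 5.0]
    predicted = [2.5, 3.0, 4.5, 4.5]
    print(rmse(reference, predicted))
    print(pearson(reference, predicted))
    print(r_squared(reference, predicted))
```

Lower RMSE and higher Pearson and R-square values indicate closer agreement between predicted and reference grades, which is the sense in which the abstract compares models.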
