2021
DOI: 10.1007/978-3-030-88942-5_33
The Case for Latent Variable Vs Deep Learning Methods in Misinformation Detection: An Application to COVID-19

Cited by 7 publications (2 citation statements)
References 15 publications
“…By developing human-based evaluation metrics, we will not only assess the document embedding space, but more importantly, we will be able to identify potential biases related to certain characteristics of the collected abstracts enabling us to correct our model before it is deployed at scale. In addition, comparing BERT with other popular latent variable methods as presented in [10], would be of high interest. Finally, in terms of a computational chemistry perspective, the development of validation techniques for the extracted document embeddings and how they can be used for the discovery of energetic materials and systems is a significant research direction that deserves further investigation.…”
Section: Discussion
“…The set of unique n-grams, iterations over the set, and count of the number of times each n-gram appears in the text are extracted to compute the unigram, bi-gram, and tri-gram overlap in order to quantify repetitiveness properties. The overlap is then calculated as a ratio between the count and the total number of different n-grams [38]. Additionally, the frequency of the words in the data is compared to the top 5K and 10K words in each language [24], thus determining how closely the lexicon in the dataset matches that of everyday speech.…”
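The repetitiveness measure described in the statement above — counting how often each unique n-gram recurs and taking a ratio against the number of distinct n-grams — can be sketched as follows. This is a minimal illustration, not the cited work's implementation; the exact ratio definition and tokenization in [38] may differ.

```python
from collections import Counter


def ngram_overlap(tokens, n):
    """Repetitiveness of a token sequence for a given n-gram order.

    Sketch of the idea in the quoted passage: extract the n-grams,
    count occurrences of each, and express repeated occurrences as a
    ratio over the number of distinct n-grams. (Illustrative only;
    the cited work's exact ratio may be defined differently.)
    """
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    # Every occurrence beyond an n-gram's first counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(counts)


# Example: "the cat" appears twice among 7 distinct bigrams.
tokens = "the cat sat on the mat the cat ran".split()
print(ngram_overlap(tokens, 2))  # 1/7 ≈ 0.1428...
```

Unigram, bigram, and trigram overlap would then be `ngram_overlap(tokens, n)` for `n = 1, 2, 3`; the lexicon comparison mentioned in the statement (top-5K/10K word lists) is a separate frequency lookup not shown here.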