Artificial Text Detection via Examining the Topology of Attention Maps

Kushnareva, Laida; Cherniavskii, Daniil; Mikhailov, Vladislav; Artemova, Ekaterina; Barannikov, Serguei; Bernstein, Alexander; Piontkovskaya, Irina; Piontkovski, Dmitri; Burnaev, Evgeny

doi:10.18653/v1/2021.emnlp-main.50

Cited by 11 publications

(12 citation statements)

References 23 publications

“…[18][19][20][21] TDA has been used in NLP for movie genre detection,[22] textual entailment,[23] document summarization,[24] and analysis of sentence embeddings.[25] The topology of the attention layers has been leveraged for text classification,[26,27] acceptability judgments,[28] and robustness against adversarial attacks.[29]…”

Section: Related Work (mentioning)

confidence: 99%

TopoBERT: Exploring the topology of fine-tuned word representations

Rathore

Zhou

Srikumar

et al. 2023

Information Visualization

Transformer-based language models such as BERT and its variants have found widespread use in natural language processing (NLP). A common way of using these models is to fine-tune them to improve their performance on a specific task. However, it is currently unclear how the fine-tuning process affects the underlying structure of the word embeddings from these models. We present TopoBERT, a visual analytics system for interactively exploring the fine-tuning process of various transformer-based models – across multiple fine-tuning batch updates, subsequent layers of the model, and different NLP tasks – from a topological perspective. The system uses the mapper algorithm from topological data analysis (TDA) to generate a graph that approximates the shape of a model’s embedding space for an input dataset. TopoBERT enables its users (e.g. experts in NLP and linguistics) to (1) interactively explore the fine-tuning process across different model-task pairs, (2) visualize the shape of embedding spaces at multiple scales and layers, and (3) connect linguistic and contextual information about the input dataset with the topology of the embedding space. Using TopoBERT, we provide various use cases to exemplify its applications in exploring fine-tuned word embeddings. We further demonstrate the utility of TopoBERT, which enables users to generate insights about the fine-tuning process and provides support for empirical validation of these insights.
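
The abstract says TopoBERT summarises an embedding space with the mapper algorithm. As a rough picture of what Mapper computes, here is a minimal Python sketch, not TopoBERT's implementation. It assumes a one-dimensional lens (a real-valued function on the points), a uniform cover of the lens range by overlapping intervals, and ε-neighbourhood (single-linkage) clustering inside each preimage. The names `mapper_graph` and `eps_components` and the parameters `n_intervals`, `overlap` and `eps` are hypothetical.

```python
import numpy as np
from itertools import combinations


def eps_components(X, eps):
    """Connected components of the graph linking points at distance <= eps."""
    n = len(X)
    seen = np.zeros(n, dtype=bool)
    comps = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            close = np.where(np.linalg.norm(X - X[u], axis=1) <= eps)[0]
            for v in close:
                if not seen[v]:
                    seen[v] = True
                    stack.append(int(v))
        comps.append(comp)
    return comps


def mapper_graph(points, lens, n_intervals=4, overlap=0.3, eps=0.3):
    """Minimal Mapper: cover the lens range with overlapping intervals,
    cluster each preimage, and link clusters that share points."""
    lo, hi = float(lens.min()), float(lens.max())
    length = (hi - lo) / n_intervals
    pad = overlap * length  # extra width on each side gives overlapping intervals
    nodes = []  # each node is a cluster, stored as a frozenset of point indices
    for i in range(n_intervals):
        a, b = lo + i * length - pad, lo + (i + 1) * length + pad
        idx = np.where((lens >= a) & (lens <= b))[0]
        for comp in eps_components(points[idx], eps):
            nodes.append(frozenset(int(idx[j]) for j in comp))
    # two clusters are joined when they share at least one point
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: 40 points on a unit circle, lens = x-coordinate.
theta = 2 * np.pi * np.arange(40) / 40
circle = np.column_stack([np.cos(theta), np.sin(theta)])
nodes, edges = mapper_graph(circle, circle[:, 0])
print(len(nodes), len(edges))  # 6 6: the Mapper graph is a single loop
```

On the circle, the two ends of the lens range each yield one cluster while the middle intervals split into an upper and a lower arc, so the graph recovers the loop. Dedicated Mapper tools usually allow multi-dimensional lenses and pluggable clustering algorithms.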

Section: Related Work (mentioning)

confidence: 99%

TopoBERT: Exploring the topology of fine-tuned word representations (Rathore, Zhou, Srikumar et al. 2023, Information Visualization)

“…Given an input text, we extract output attention matrices from Transformer LMs and follow Kushnareva et al., 2021 to compute three types of persistent features over them.…”

Section: Extracted Features (mentioning)

confidence: 99%

Proceedings of the 9th Workshop on Slavic Natural Language Processing 2023 (SlavicNLP 2023)

2023
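
The SlavicNLP quote computes persistent features over attention matrices following Kushnareva et al. (2021). The sketch below shows one simplified family of attention-graph statistics under assumptions of our own: attention is symmetrised by taking the maximum of both directions, the threshold list is hand-picked, and the helper names `find` and `threshold_graph_features` are hypothetical. It is an illustration, not the exact feature set of either paper. For each threshold it counts edges, connected components (Betti-0) and independent cycles (Betti-1, which for a graph equals edges minus vertices plus components).

```python
import numpy as np


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def threshold_graph_features(attn, thresholds=(0.025, 0.05, 0.1, 0.25, 0.5)):
    """For each threshold t, build the undirected graph on tokens with an edge
    (i, j), i != j, whenever max(attn[i, j], attn[j, i]) >= t, and return
    (number of edges, Betti-0, Betti-1) for that graph."""
    n = attn.shape[0]
    sym = np.maximum(attn, attn.T)  # drop edge direction
    features = []
    for t in thresholds:
        parent = list(range(n))
        n_edges, components = 0, n
        for i in range(n):
            for j in range(i + 1, n):  # self-attention (diagonal) is ignored
                if sym[i, j] >= t:
                    n_edges += 1
                    ri, rj = find(parent, i), find(parent, j)
                    if ri != rj:  # this edge merges two components
                        parent[ri] = rj
                        components -= 1
        betti1 = n_edges - n + components  # cycle rank of a graph
        features.append((n_edges, components, betti1))
    return features


# Usage on a toy 4-token attention map (rows sum to 1, as after softmax).
attn = np.array([[0.7, 0.3, 0.0, 0.0],
                 [0.3, 0.4, 0.3, 0.0],
                 [0.0, 0.3, 0.4, 0.3],
                 [0.2, 0.0, 0.3, 0.5]])
print(threshold_graph_features(attn, thresholds=(0.1, 0.25, 0.5)))
# [(4, 1, 1), (3, 1, 0), (0, 4, 0)]
```

Raising the threshold turns a dense graph into isolated tokens; how these counts change across thresholds, heads and layers is the kind of signal a downstream classifier could consume.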

“…Their best performing model utilizes persistence features derived from time-delay embeddings of term frequency data. Kushnareva et al. (2021) compute persistent homology of a filtered graph constructed from the attention maps of a pretrained language model and harness the features for an artificial text detection task.…”

Section: Related Work (mentioning)

confidence: 99%
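
To make "persistent homology of a filtered graph constructed from the attention maps" concrete, here is a dimension-0 sketch under stated assumptions: the attention matrix is symmetrised, each edge gets filtration value 1 - attention so that strongly attended pairs enter first, and only H0 is computed, via union-find as in Kruskal's algorithm. The function names are hypothetical and the filtration choice is an assumption, not a detail confirmed by the quote; higher-dimensional bars would need a dedicated library such as Ripser.

```python
import math
import numpy as np


def attention_h0_barcode(attn):
    """0-dimensional persistence barcode of the attention graph under the
    filtration w(i, j) = 1 - max(attn[i, j], attn[j, i]). All tokens are born
    at 0; each edge that merges two components kills one bar at its value."""
    n = attn.shape[0]
    sym = np.maximum(attn, attn.T)
    edges = sorted((1.0 - float(sym[i, j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    bars = []
    for w, i, j in edges:  # edges enter in increasing filtration value
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, w))
    bars.append((0.0, math.inf))  # the last surviving component never dies
    return bars


def barcode_features(bars):
    """A few scalar summaries of a barcode, using finite bars only."""
    lengths = [d - b for b, d in bars if math.isfinite(d)]
    return {"n_finite": len(lengths),
            "sum": sum(lengths),
            "max": max(lengths, default=0.0)}


attn = np.array([[0.7, 0.3, 0.0, 0.0],
                 [0.3, 0.4, 0.3, 0.0],
                 [0.0, 0.3, 0.4, 0.3],
                 [0.2, 0.0, 0.3, 0.5]])
bars = attention_h0_barcode(attn)
print(barcode_features(bars))
```

Here the three 0.3-weight edges connect all four tokens at filtration value 0.7, so three finite bars die there and one bar persists.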

Dialogue Term Extraction using Transfer Learning and Topological Data Analysis

Renato

Heck

Ruppik

et al. 2022

Goal-oriented dialogue systems were originally designed as a natural language interface to a fixed data-set of entities that users might inquire about, further described by domain, slots and values. As we move towards adaptable dialogue systems where knowledge about domains, slots and values may change, there is an increasing need to automatically extract these terms from raw dialogues or related non-dialogue data on a large scale. In this paper, we take an important step in this direction by exploring different features that can enable systems to discover realizations of domains, slots and values in dialogues in a purely data-driven fashion. The features that we examine stem from word embeddings, language modelling features, as well as topological features of the word embedding space. To examine the utility of each feature set, we train a seed model based on the widely used Multi-WOZ data-set. Then, we apply this model to a different corpus, the Schema-Guided Dialogue data-set. Our method outperforms the previously proposed approach that relies solely on word embeddings. We also demonstrate that each of the features is responsible for discovering different kinds of content. We believe our results warrant further research towards ontology induction, and continued harnessing of topological data analysis for dialogue and natural language processing research.
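
The abstract lists topological features of the word embedding space without saying how they are built. One possible construction, assumed purely for illustration: take a word's k nearest neighbours, form the Vietoris–Rips filtration on that small point cloud, and summarise its H0 barcode. In dimension 0 every point is born at 0 and the death times equal the edge lengths of a minimum spanning tree, so Prim's algorithm suffices. `neighbourhood_h0_features` and `k` are hypothetical, and edges are taken to enter at their full pairwise distance (some conventions use half the distance).

```python
import numpy as np


def neighbourhood_h0_features(emb, idx, k=10):
    """Hypothetical features for word `idx`: H0 barcode summary of the
    Vietoris-Rips filtration on the word plus its k nearest neighbours.
    H0 death times are the edge lengths of a minimum spanning tree."""
    if k < 1 or len(emb) < 2:
        raise ValueError("need k >= 1 and at least two embeddings")
    dist_to_word = np.linalg.norm(emb - emb[idx], axis=1)
    nbrs = np.argsort(dist_to_word)[: k + 1]  # the word itself plus k neighbours
    X = emb[nbrs]
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    m = len(X)
    in_tree = np.zeros(m, dtype=bool)
    in_tree[0] = True
    best = D[0].copy()  # cheapest connection of each point to the current tree
    deaths = []
    for _ in range(m - 1):  # Prim's algorithm: add the closest outside point
        cand = np.where(in_tree, np.inf, best)
        v = int(np.argmin(cand))
        deaths.append(float(cand[v]))
        in_tree[v] = True
        best = np.minimum(best, D[v])
    deaths = np.array(deaths)
    return {"n_bars": len(deaths), "sum": float(deaths.sum()),
            "max": float(deaths.max()), "mean": float(deaths.mean())}


# Usage on toy one-dimensional "embeddings".
emb = np.array([[0.0], [1.0], [3.0], [6.0], [10.0]])
print(neighbourhood_h0_features(emb, idx=0, k=4))
# {'n_bars': 4, 'sum': 10.0, 'max': 4.0, 'mean': 2.5}
```

Summaries like these describe how tightly or loosely a word's neighbourhood clusters, which is the kind of signal that could complement plain embedding features.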
