Don Tuggener scite author profile

Don Tuggener

32 Publications

59 Citation Statements Received

229 Citation Statements Given

How they've been cited

How they cite others

396

226

Affiliations

ZHAW Zurich University of Applied Sciences, University of Zurich

Publications

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

Deriu¹,

Tuggener²,

Däniken³

et al. 2020

The lack of time-efficient and reliable evaluation methods hampers the development of conversational dialogue systems (chatbots). Evaluations requiring humans to converse with chatbots are time- and cost-intensive, put high cognitive demands on the human judges, and yield low-quality results. In this work, we introduce Spot The Bot, a cost-efficient and robust evaluation framework that replaces human-bot conversations with conversations between bots. Human judges then only annotate for each entity in a conversation whether they think it is human or not (assuming there are human participants in these conversations). These annotations then allow us to rank chatbots regarding their ability to mimic the conversational behavior of humans. Since we expect that all bots are eventually recognized as such, we incorporate a metric that measures which chatbot can uphold human-like behavior the longest, i.e., Survival Analysis. This metric has the ability to correlate a bot's performance to certain of its characteristics (e.g., fluency or sensibleness), yielding interpretable results. The comparably low cost of our framework allows for frequent evaluations of chatbots during their evaluation cycle. We empirically validate our claims by applying Spot The Bot to three domains, evaluating several state-of-the-art chatbots, and drawing comparisons to related work. The framework is released as a ready-to-use tool.
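As a rough illustration of the survival-analysis idea in the abstract above, the sketch below computes a Kaplan-Meier survival curve, a standard estimator in survival analysis. The abstract does not say which estimator the framework uses, so this is an assumption, as are the names `kaplan_meier`, `durations` (number of exchanges after which a judge's annotation was made), and `spotted` (whether the bot was recognized as a bot at that point).

```python
from collections import Counter


def kaplan_meier(durations, spotted):
    """Return [(t, S(t))] for every time t at which at least one bot was spotted.

    durations: exchanges observed per conversation (hypothetical input format).
    spotted:   True if the bot was recognized as a bot at that time,
               False if the observation ended without it being recognized.
    """
    events = Counter(t for t, e in zip(durations, spotted) if e)
    exits = Counter(durations)      # every observation leaves the risk set at its time
    at_risk = len(durations)        # bots still "passing as human" before time t
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk bots that survive time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    for t, s in kaplan_meier([2, 3, 3, 5], [True, True, False, True]):
        print(t, round(s, 3))
```

A bot whose curve stays higher for longer is one that, on this toy reading, upholds human-like behavior longer; comparing such curves between bots is one way a ranking could be derived.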

Coreference Resolution Evaluation for Higher Level Applications

Tuggener¹

2014

This paper presents an evaluation framework for coreference resolution geared towards interpretability for higher-level applications. Three application scenarios for coreference resolution are outlined and metrics for them are devised. The metrics provide detailed system analysis and aim at measuring the potential benefit of using coreference systems in preprocessing.

Machine Translation of Spanish Personal and Possessive Pronouns Using Anaphora Probabilities

Luong¹,

Popescu-Belis²,

Gonzales

et al. 2017

We implement a fully probabilistic model to combine the hypotheses of a Spanish anaphora resolution system with those of a Spanish-English machine translation system. The probabilities over antecedents are converted into probabilities for the features of translated pronouns, and are integrated with phrase-based MT using an additional translation model for pronouns. The system improves the translation of several Spanish personal and possessive pronouns into English, by solving translation divergencies such as ella → she | it or su → his | her | its | their. On a test set with 2,286 pronouns, a baseline system correctly translates 1,055 of them, while ours improves this by 41. Moreover, with oracle antecedents, possessives are translated with an accuracy of 83%.

Stance Detection in Facebook Posts of a German Right-wing Party

Klenner¹,

Tuggener²,

Clematide³

2017

We argue that in order to detect stance, not only the explicit attitudes of the stance holder towards the targets are crucial. It is the whole narrative the writer drafts that counts, including the way he hypostasizes the discourse referents: as benefactors or villains, as victims or beneficiaries. We exemplify the ability of our system to identify targets and detect the writer's stance towards them on the basis of about 100,000 Facebook posts of a German right-wing party. A reader and writer model on top of our verb-based attitude extraction directly reveals stance conflicts.

Spot The Bot: A Robust and Efficient Framework for the Evaluation of Conversational Dialogue Systems

Deriu¹,

Tuggener²,

Däniken³

et al. 2020

Preprint

Behavioural Simulator for Professional Training Based on Natural Language Interaction

Mazza¹,

Ambrosini²,

Catenazzi³

et al. 2018

The virtual patient is an online simulation system designed to train and assess relational and clinical abilities in a realistic interactive problem-based learning scenario, where users (medical students) can interact and communicate with characters specifically designed to challenge their clinical and relational skills and facilitate the generation of learning objectives. In this paper we will present an enhancement to the system by simulating a normal interview with real users through natural language, thus enabling users to behave more naturally without keyboards or other input devices. We evaluated the system with a sample of users and found that the new voice-based interaction is user-friendly and facilitates user acceptance. However, a number of limitations remain to be addressed to get the system ready for a large-scale deployment.

The Sentence End and Punctuation Prediction in NLG text (SEPP-NLG) shared task 2021

Tuggener¹,

Aghaebrahimian²

2021

Probing the Robustness of Trained Metrics for Conversational Dialogue Systems

Deriu¹,

Tuggener²,

Däniken³

et al. 2022

Preprint

This paper introduces an adversarial method to stress-test trained metrics to evaluate conversational dialogue systems. The method leverages Reinforcement Learning to find response strategies that elicit optimal scores from the trained metrics. We apply our method to test recently proposed trained metrics. We find that they all are susceptible to giving high scores to responses generated by relatively simple and obviously flawed strategies that our method converges on. For instance, simply copying parts of the conversation context to form a response yields competitive scores or even outperforms responses written by humans.


