Sarik Ghazarian scite author profile

Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings

Despite advances in open-domain dialogue systems, automatic evaluation of such systems is still a challenging problem. Traditional reference-based metrics such as BLEU are ineffective because there can be many valid responses for a given context that share no words with the reference responses. A recent work proposed the Referenced metric and Unreferenced metric Blended Evaluation Routine (RUBER), which combines a learning-based metric that predicts the relatedness between a generated response and a given query with a reference-based metric; it showed high correlation with human judgments. In this paper, we explore using contextualized word embeddings to compute more accurate relatedness scores and thus better evaluation metrics. Experiments show that our evaluation metrics outperform RUBER, which is trained on static embeddings.
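As a rough illustration of the RUBER-style blending described above, the sketch below pools per-token embeddings (assumed to come from a contextualized encoder, BERT being one such model; here they are plain lists of floats), scores a generated response against a reference with cosine similarity, and blends that referenced score with an unreferenced relatedness score using simple heuristics (min, max, arithmetic or geometric mean). The function names, the mean pooling, and the rescaling of cosine into [0, 1] are assumptions for illustration, not the paper's exact method; the unreferenced score would come from a trained query-response relatedness model that is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_pool(token_vectors: List[Vector]) -> List[float]:
    """Average a sequence of token embeddings into one sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def referenced_score(generated: List[Vector], reference: List[Vector]) -> float:
    """Similarity of generated vs. reference response, rescaled from [-1, 1] to [0, 1]."""
    return (cosine(mean_pool(generated), mean_pool(reference)) + 1.0) / 2.0


def blend(referenced: float, unreferenced: float, how: str = "arithmetic") -> float:
    """Combine the two scores, both assumed to lie in [0, 1]."""
    if how == "min":
        return min(referenced, unreferenced)
    if how == "max":
        return max(referenced, unreferenced)
    if how == "arithmetic":
        return (referenced + unreferenced) / 2.0
    if how == "geometric":
        return math.sqrt(referenced * unreferenced)
    raise ValueError(f"unknown blending strategy: {how}")


if __name__ == "__main__":
    # Toy 3-d "contextualized" token embeddings (hypothetical values).
    generated = [[0.2, 0.1, 0.7], [0.4, 0.0, 0.6]]
    reference = [[0.3, 0.2, 0.5], [0.1, 0.1, 0.8]]
    ref_score = referenced_score(generated, reference)
    unref_score = 0.9  # would come from a trained relatedness model (not shown)
    print(round(blend(ref_score, unref_score, "geometric"), 3))
```

The geometric mean is a conservative choice: it stays low whenever either score is low, so a response must be both close to the reference and related to the query to score well.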


Enhancing memory-based collaborative filtering for group recommender systems

Ghazarian, Nematbakhsh (2015). Expert Systems with Applications.

104


Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems

Ghazarian, Weischedel, Galstyan, et al. (2020). AAAI.


User engagement is a critical metric for evaluating the quality of open-domain dialogue systems. Prior work has focused on conversation-level engagement using heuristically constructed features such as the number of turns and the total duration of the conversation. In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for the automatic evaluation of open-domain dialogue systems. Our experiments demonstrate that (1) human annotators have high agreement when assessing utterance-level engagement scores, and (2) conversation-level engagement scores can be predicted from properly aggregated utterance-level engagement scores. Furthermore, we show that utterance-level engagement scores can be learned from data. These scores can be incorporated into automatic evaluation metrics for open-domain dialogue systems to improve their correlation with human judgements. This suggests that predictive engagement can serve as real-time feedback for training better dialogue models.
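The claim that conversation-level engagement can be predicted from aggregated utterance-level scores can be sketched as follows. The per-utterance scores are assumed to come from some learned engagement predictor (not shown), mean and max pooling are illustrative aggregation choices, and `combined_score`, which mixes engagement with a relevance score through a hypothetical `weight` parameter, is a simplification rather than the paper's exact recipe.

```python
from statistics import mean
from typing import List


def conversation_engagement(utterance_scores: List[float], how: str = "mean") -> float:
    """Aggregate utterance-level engagement scores (assumed in [0, 1]) into one score."""
    if not utterance_scores:
        raise ValueError("conversation has no scored utterances")
    if how == "mean":
        return mean(utterance_scores)
    if how == "max":
        return max(utterance_scores)
    raise ValueError(f"unknown aggregation: {how}")


def combined_score(relevance: float, engagement: float, weight: float = 0.5) -> float:
    """Hypothetical weighted mix of a relevance score and an engagement score."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * relevance + (1.0 - weight) * engagement


if __name__ == "__main__":
    scores = [0.2, 0.8, 0.5]  # hypothetical per-utterance predictions
    print(conversation_engagement(scores), conversation_engagement(scores, "max"))
    print(combined_score(relevance=0.7, engagement=0.4))
```

Mean pooling rewards conversations that stay engaging throughout, while max pooling only asks whether some turn was engaging; which aggregation tracks human conversation-level ratings better is an empirical question.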


Better Automatic Evaluation of Open-Domain Dialogue Systems with Contextualized Embeddings

Ghazarian, Wei, Galstyan, et al. (2019). Preprint.


Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation

Ghazarian, Liu, Akash, et al. (2021).


Human Written Story: jenny liked fresh fish. she decided to go fishing to catch her own. she brought her worms and pole and a chair. she sat there all day but didn't catch anything. she packed it up and went home disappointed.

Sentence Manipulation: jenny liked fresh fish. she decided to go fishing to catch her own. she wrote songs every single day. she sat there all day but didn't catch anything. she packed it up and went home disappointed.

Keyword Manipulation: jenny liked fresh fish. she decided to go fishing to catch her own. she brought her worms and pole and a chair. she sat there all day but didn't catch anything. she unpacked it up and went home disappointed.

UNION: jenny liked fresh fish. jim has a very structured workout program to help him achieve goals. she brought her worms and pole and a relaxer. she sat there all day but didn't catch anything. she unpack it up and went home disappointed.

Plot: jenny fresh fish -> decided fishing catch -> brought worms chair -> sat -> packed home disappointed

Manipulated Plot: jenny fresh fish -> tasha offered woman store -> brought worms chair -> sat -> got wet packed home disappointed

Manipulated Plot Guided Generation (Ours): jenny was out of fresh fish. tasha offered to buy her some from the woman at the store. she brought her worms and a chair and decided to play with them. jenny sat down and laid down on the chair. when she got wet, she packed up and went home disappointed.
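The heuristic perturbations in the example above (replacing a whole sentence, swapping a keyword) and the idea of editing a keyword plot can be reproduced on the example text itself. This is a minimal sketch: the naive period-based sentence splitter, the `->`-separated plot format, and all function names are assumptions for illustration. The paper's approach additionally generates a new story from the manipulated plot (the "Manipulated Plot Guided Generation" line), a step this sketch does not attempt.

```python
import re
from typing import Dict, List

HUMAN_STORY = (
    "jenny liked fresh fish. she decided to go fishing to catch her own. "
    "she brought her worms and pole and a chair. she sat there all day but "
    "didn't catch anything. she packed it up and went home disappointed."
)
PLOT = ("jenny fresh fish -> decided fishing catch -> brought worms chair "
        "-> sat -> packed home disappointed")


def split_sentences(story: str) -> List[str]:
    """Naive splitter: break on whitespace that follows a period."""
    return re.split(r"(?<=\.)\s+", story.strip())


def sentence_manipulation(story: str, index: int, replacement: str) -> str:
    """Replace one sentence with an unrelated one (e.g. taken from another story)."""
    sentences = split_sentences(story)
    sentences[index] = replacement
    return " ".join(sentences)


def keyword_manipulation(story: str, keyword: str, substitute: str) -> str:
    """Swap every whole-word occurrence of a keyword (e.g. for an antonym)."""
    return re.sub(rf"\b{re.escape(keyword)}\b", substitute, story)


def parse_plot(plot: str) -> List[str]:
    """Treat a plot as an ordered list of keyword events separated by '->'."""
    return [event.strip() for event in plot.split("->")]


def manipulate_plot(plot: str, edits: Dict[int, str]) -> str:
    """Overwrite selected plot events by position and re-serialize the plot."""
    events = parse_plot(plot)
    for index, new_event in edits.items():
        events[index] = new_event
    return " -> ".join(events)


if __name__ == "__main__":
    print(sentence_manipulation(HUMAN_STORY, 2, "she wrote songs every single day."))
    print(keyword_manipulation(HUMAN_STORY, "packed", "unpacked"))
    print(manipulate_plot(PLOT, {1: "tasha offered woman store",
                                 4: "got wet packed home disappointed"}))
```

Editing the plot rather than the surface text leaves the regenerated story fluent while its events change, which is harder to detect than the word-level edits above.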


ParsiNLU: A Suite of Language Understanding Challenges for Persian

Khashabi, Cohan, Shakeri, et al. (2021).


Despite the progress made in recent years in addressing natural language understanding (NLU) challenges, most of this progress remains concentrated on resource-rich languages such as English. This work focuses on Persian, one of the widely spoken languages of the world, yet one for which few NLU datasets are available. High-quality evaluation datasets are necessary for reliably assessing progress on different NLU tasks and domains. We introduce ParsiNLU, the first Persian benchmark that includes a range of language understanding tasks, such as reading comprehension and textual entailment. These datasets are collected in a variety of ways, often involving manual annotation by native speakers. This results in over 14.5k new instances across 6 distinct NLU tasks. Additionally, we present the first results of state-of-the-art monolingual and multilingual pre-trained language models on this benchmark and compare them with human performance, which provides valuable insights into our ability to tackle natural language understanding challenges in Persian. We hope ParsiNLU fosters further research and advances in Persian language understanding.


DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations

Ghazarian, Wen, Galstyan, et al. (2022).


What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation

Ghazarian, Hedayatnia, Papangelis, et al. (2022). Preprint.


Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human-annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
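The weak-supervision idea, deriving a quality label for a system turn from the user turn that follows it, can be sketched as below. Everything here is an illustrative assumption: the tiny sentiment lexicons stand in for a real sentiment classifier, the set of conversation-ending words is hypothetical, and treating an explicit ending as a negative signal is an assumption about how that signal is used, not a detail stated in the abstract.

```python
from typing import List, Tuple

# Hypothetical toy lexicons standing in for a trained sentiment model.
POSITIVE_WORDS = {"cool", "great", "interesting", "love", "nice", "thanks"}
NEGATIVE_WORDS = {"boring", "hate", "stupid", "weird", "wrong"}
# Hypothetical words signalling that the user explicitly ends the conversation.
ENDING_WORDS = {"stop", "bye", "goodbye"}


def tokenize(utterance: str) -> List[str]:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip("?!.,").lower() for w in utterance.split()]


def toy_sentiment(utterance: str) -> float:
    """Lexicon score in [-1, 1]; 0.0 when no sentiment word is found."""
    words = tokenize(utterance)
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def weak_label(next_user_utterance: str) -> float:
    """Proxy quality score for the system turn preceding this user turn."""
    if any(w in ENDING_WORDS for w in tokenize(next_user_utterance)):
        return -1.0  # assumption: explicitly ending the chat is a strong negative signal
    return toy_sentiment(next_user_utterance)


def build_training_pairs(dialog: List[Tuple[str, str]]) -> List[Tuple[str, float]]:
    """Pair each system turn with a weak label taken from the user turn right after it."""
    pairs = []
    for (speaker, text), (next_speaker, next_text) in zip(dialog, dialog[1:]):
        if speaker == "system" and next_speaker == "user":
            pairs.append((text, weak_label(next_text)))
    return pairs
```

Because the labels come for free from logged conversations, the same function can be run over a large dialog corpus to produce training pairs for a response-quality model, with no manual turn-level annotation.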


