Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2021
DOI: 10.18653/v1/2021.blackboxnlp-1.3
Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings

Abstract: Natural language inference (NLI) requires models to learn and apply commonsense knowledge. These reasoning abilities are particularly important for explainable NLI systems that generate a natural language explanation in addition to their label prediction. The integration of external knowledge has been shown to improve NLI systems; here we investigate whether it can also improve their explanation capabilities. For this, we investigate different sources of external knowledge and evaluate the performance of our m…

Cited by 7 publications (2 citation statements)
References: 31 publications
“…WordNet, ConceptNet, etc.) in order to improve the explainability of Natural Language Inference (NLI) models [22]. Here we follow a similar approach, carrying out an initial study on the use of language models, particularly GPT-3, as an external source to generate explanations of musical decisions.…”
Section: Transformer-based Approaches in Explainable AI
confidence: 99%
“…Both candidates receive identical BLEU-2 scores; however, from a human perspective, sentence (a) seems to reflect the original German sentence much better. Similarly, automatic evaluation measures used in other NLP tasks face the same problem (Callison-Burch et al. 2006; Liu et al. 2016; Mathur et al. 2020; Schuff et al. 2020, 2021; Iskender et al. 2020; Clinciu et al. 2021). Therefore, human evaluation has begun to gain more and more attention in the NLP community (especially in the context of natural language generation tasks, including machine translation; Belz and Reiter 2006; Novikova, Dusek, and Rieser 2018; van der Lee et al. 2019).…”
Section: Introduction
confidence: 99%