In this paper we examine the differences in use between distal and proximal demonstrative terms (e.g., singular "this" and "that", and plural "these" and "those" in English). The proximal-distal distinction appears to be made in all languages and therefore promises to be an important window on the cognitive mechanisms underlying language production and comprehension. We address the problem of accounting for the distinction through a corpus-based quantitative study of the deictic use of demonstratives in Dutch. Our study suggests that the distal-proximal distinction corresponds with use of the proximal for intensive/strong indicating (i.e., directing of attention) and the distal for neutral indicating. We compare our findings with empirical findings on the use of English demonstratives and argue that, despite some apparent differences, Dutch and English demonstratives behave roughly similarly though not identically. Finally, we put our findings into context by pulling together evidence from a number of converging sources on the relationship between indicating and describing as alternative modes of reference in the use of distal and proximal demonstratives. This will also lead us to a new understanding of the folk-view on distals and proximals as distinguishing between nearby and faraway objects.
Keywords: Proximal and distal demonstratives, accessibility, importance, deictic reference
Biographical notes: Paul Piwek (1971) studied computational linguistics and the philosophy of linguistics and cognitive science at the Universities of Tilburg and Amsterdam, obtaining master's degrees in 1993 and 1994, both cum laude. He obtained his PhD from Eindhoven University in 1998, with a thesis on proof-theoretic natural language semantics and pragmatics. After working for some years as a postdoctoral researcher at the Information Technology Research Institute in Brighton, in 2005 he was appointed as a lecturer at the Open University in the UK. His current research interest is in verbal and non-verbal communication in dialogue.
The paper provides a detailed account of the First Shared Task Evaluation Challenge on Question Generation that took place in 2010. The campaign included two tasks that take text as input and produce text, i.e. questions, as output: Task A, "Question Generation from Paragraphs", and Task B, "Question Generation from Sentences". Motivation, data sets, evaluation criteria, guidelines for judges, and results are presented for the two tasks. Lessons learned and advice for future Question Generation Shared Task Evaluation Challenges (QG-STEC) are also offered.
Inter-Annotator Agreement (IAA) is used as a means of assessing the quality of NLG evaluation data, in particular its reliability. According to existing scales of IAA interpretation (see, for example, Lommel et al. (2014), Liu et al. (2016), Sedoc et al. (2018) and Amidei et al. (2018a)), most data collected for NLG evaluation fail the reliability test. We confirmed this trend by analysing papers published over the last 10 years in NLG-specific conferences (in total 135 papers that included some sort of human evaluation study). Following Sampson and Babarczy (2008), Lommel et al. (2014), Joshi et al. (2016) and Amidei et al. (2018b), such phenomena can be explained in terms of irreducible human language variability. Using three case studies, we show the limits of considering IAA as the only criterion for checking evaluation reliability. Given human language variability, we propose that for human evaluation of NLG, correlation coefficients and agreement coefficients should be used together to obtain a better assessment of the evaluation data's reliability. This is illustrated using the three case studies.
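The proposal to pair agreement with correlation can be made concrete with a small sketch. The judges' ratings below are invented for illustration; the point is that two judges may rarely pick the exact same score (modest Cohen's kappa) while still ranking outputs very similarly (high Spearman's rho), which is exactly the pattern that an agreement coefficient alone would misread as unreliability.

```python
from collections import Counter

def cohens_kappa(a, b):
    """Plain (unweighted) Cohen's kappa for two annotators' labels."""
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n          # observed agreement
    ca, cb = Counter(a), Counter(b)
    pe = sum(ca[k] * cb.get(k, 0) for k in ca) / (n * n)  # chance agreement
    return (po - pe) / (1 - pe)

def spearman_rho(a, b):
    """Spearman correlation: Pearson correlation on average ranks (handles ties)."""
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0.0] * len(xs)
        i = 0
        while i < len(xs):
            j = i
            while j + 1 < len(xs) and xs[order[j + 1]] == xs[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1   # average rank for a block of tied values
            for k in range(i, j + 1):
                r[order[k]] = avg
            i = j + 1
        return r
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = sum((x - ma) ** 2 for x in ra) ** 0.5
    sb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (sa * sb)

# Hypothetical 1-5 fluency ratings from two judges over eight system outputs.
judge1 = [5, 4, 4, 2, 1, 3, 5, 2]
judge2 = [5, 3, 4, 1, 1, 2, 4, 2]
print(round(cohens_kappa(judge1, judge2), 3))  # → 0.373 (only "fair" agreement)
print(round(spearman_rho(judge1, judge2), 3))  # → 0.938 (judges rank outputs alike)
```

Reporting both numbers, as the paper suggests, separates genuine unreliability (low kappa and low rho) from mere scale-calibration differences between judges (low kappa but high rho).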
Abstract. The Text2Dialogue (T2D) system that we are developing allows digital content creators to generate attractive multi-modal dialogues presented by two virtual agents, simply by providing textual information as input. We use Rhetorical Structure Theory (RST) to decompose text into segments and to identify rhetorical discourse relations between them. These are then "acted out" by two 3D agents using synthetic speech and appropriate conversational gestures. In this paper, we present version 1.0 of the T2D system and focus on the novel technique that it uses for mapping rhetorical relations to question-answer pairs, thus transforming (monological) text into a form that supports dialogues between virtual agents.
In the last few years Automatic Question Generation (AQG) has attracted increasing interest. In this paper we survey the evaluation methodologies used in AQG. Based on a sample of 37 papers, our research shows that the systems' development has not been accompanied by similar developments in the methodologies used for the systems' evaluation. Indeed, in the papers we examine here, we find a wide variety of both intrinsic and extrinsic evaluation methodologies. Such diverse evaluation practices make it difficult to reliably compare the quality of different generation systems. Our study suggests that, given the rapidly increasing level of research in the area, a common framework is urgently needed to compare the performance of AQG systems and NLG systems more generally.
Stefan (2008). Fully generated scripted dialogue for embodied agents. Artificial Intelligence, 172(10), pp. 1219-1244.
Rating and Likert scales are widely used in evaluation experiments to measure the quality of Natural Language Generation (NLG) systems. We review the use of rating and Likert scales in NLG evaluation tasks published in NLG-specialised conferences over the last ten years (135 papers in total). Our analysis brings to light a number of deviations from good practice in their use. We conclude with some recommendations about the use of such scales. Our aim is to encourage the appropriate use of evaluation methodologies in the NLG community.