2018
DOI: 10.5334/gjgl.396

Quantifying sentence acceptability measures: Reliability, bias, and variability

Abstract: Understanding and measuring sentence acceptability is of fundamental importance for linguists, but although many measures for doing so have been developed, relatively little is known about some of their psychometric properties. In this paper we evaluate within- and between-participant test-retest reliability on a wide range of measures of sentence acceptability. Doing so allows us to estimate how much of the variability within each measure is due to factors including participant-level individual differences, sam…

Cited by 28 publications (34 citation statements)
References 64 publications
“…This formal procedure will increase the sample size of participants and items, will better control for confounds, and avoids bias based on adherence to a particular linguistic theory. While the reliability of the informal procedure has been much debated (Gibson and Fedorenko, 2010; Sprouse and Almeida, 2012; Gibson et al, 2013a,b), it has been shown that acceptability judgments are generally reliable when formal data collection procedures are used that conform to the standards of experimental psychology (Langsford et al, 2018; Linzen and Oseki, 2018). Therefore, in this paper, we restrict our discussion to formal data collection procedures.…”
mentioning
confidence: 99%
“…Indeed, if we ask the same participant to judge different items in the same condition or if we ask different participants to judge the same item, we would not necessarily expect the same response from every participant on every item. If we look at the results from studies that test the reliability of acceptability judgments, we can see that there is indeed between-subject and between-item variability (e.g., Langsford et al, 2018).…”
mentioning
confidence: 99%
“…Gradient judgment tasks have been shown to be reliable indicators of acceptability in cases where there is no expected variation. Here, inter-speaker judgment reliability is high (Sprouse, Schütze & Almeida 2013; Sprouse & Almeida 2017; Langsford et al 2018, though see Linzen & Oseki 2018 on Hebrew and Japanese judgments). Gradient judgment tasks have also been shown to be reliable in cases of stable variation in a community, where speakers using both a dialect variant and a standard variant (Thoms 2014;Zanuttini et al 2018).…”
Section: Dialect Syntax and Acceptability Judgments
mentioning
confidence: 90%
“…a binary scale, a Likert scale, and an open-ended scale constructed in Magnitude Estimation), to see whether the scales differ in perceived ease of use and expressivity, and in the judgment data they provide (e.g. Bader and Häussler 2010; Langsford et al 2018; Preston and Colman 2000).…”
Section: Introduction
mentioning
confidence: 99%
“…Some studies (e.g. Langsford et al 2018) do examine test-retest reliability of judgments expressed on various scales, thus examining variation across time and across methods, but all analyses are performed on mean ratings. We will demonstrate how all four types of variation can be investigated in judgment data, and how they can be used as sources of information.…”
Section: Introduction
mentioning
confidence: 99%