2021
DOI: 10.1093/jssam/smab022

A Model-Assisted Approach for Finding Coding Errors in Manual Coding of Open-Ended Questions

Abstract: Text answers to open-ended questions are typically manually coded into one of several codes. Usually, a random subset of text answers is double-coded to assess intercoder reliability, but most of the data remain single-coded. Any disagreement between the two coders points to an error by one of the coders. When the budget allows double coding additional text answers, we propose employing statistical learning models to predict which single-coded answers have a high risk of a coding error. Specifically, we train …
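The abstract is truncated, so the authors' specific models and features are not shown here. As a minimal sketch of the general idea only, assuming a toy setup: each double-coded answer gets a binary label (1 if the two coders disagreed, so one of them made an error), a classifier is fit to those labels, the single-coded answers are scored, and the highest-risk ones are selected for a second coding within a fixed budget. The function names, the bag-of-words plus length features, the hand-written logistic regression (used only to keep the example dependency-free), and the `budget` parameter are all hypothetical, not the paper's method.

```python
import math
from collections import Counter

# Hypothetical input shapes (assumptions, not from the paper):
#   double_coded: list of (text, code_from_coder_1, code_from_coder_2)
#   single_coded: list of (text, code_from_single_coder)


def tokenize(text):
    return text.lower().split()


def featurize(text, vocab):
    """Bag-of-words counts over `vocab`, a scaled length feature, and an intercept."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    vec.append(len(tokens) / 10.0)  # answer length, scaled
    vec.append(1.0)  # intercept term
    return vec


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression by full-batch gradient descent on the log-loss."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(dot(w, xi)) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def select_for_double_coding(double_coded, single_coded, budget):
    """Return the `budget` single-coded answers with the highest predicted error risk."""
    # Label: 1.0 if the two coders disagreed (one of them made an error), else 0.0
    vocab = sorted({t for text, _, _ in double_coded for t in tokenize(text)})
    X = [featurize(text, vocab) for text, _, _ in double_coded]
    y = [1.0 if c1 != c2 else 0.0 for _, c1, c2 in double_coded]
    w = train_logistic(X, y)
    # Score every single-coded answer and keep the riskiest ones
    scored = [(sigmoid(dot(w, featurize(text, vocab))), text, code)
              for text, code in single_coded]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:budget]


if __name__ == "__main__":
    # Toy, made-up data for illustration only
    double_coded = [
        ("very satisfied with the service", 1, 1),
        ("satisfied", 1, 1),
        ("not sure maybe ok i guess", 2, 3),
        ("it depends on the day maybe", 3, 2),
        ("unhappy", 4, 4),
        ("unhappy with prices", 4, 4),
    ]
    single_coded = [
        ("satisfied with staff", 1),
        ("maybe i guess it depends", 2),
        ("unhappy", 4),
    ]
    for risk, text, code in select_for_double_coding(double_coded, single_coded, budget=2):
        print(f"{risk:.2f}  code={code}  {text!r}")
```

In practice the hand-written classifier would likely be swapped for a library learner (the citation statements below mention SVMs, random forests, and boosting for this kind of data); the part that carries the idea is ranking single-coded answers by predicted disagreement risk and spending the double-coding budget on the top of that ranking.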

Cited by 6 publications (4 citation statements). References: 8 publications.
“…c) The multi-label algorithms we have shown use SVMs as the base learner. However, we know that other algorithms such as gradient boosting and random forest perform similarly when classifying answers to open-ended questions (He and Schonlau, 2022; Gweon and Schonlau, 2023).…”
Section: Discussion (mentioning; confidence: 99%)
“…Many statistical learning algorithms are now available in statistical software like R and Python, and it is not possible to give a complete overview here (see, e.g., Hao and Ho, 2019, for a Python overview). However, we do want to point to some of the most popular choices that have been applied to classifying answers to open-ended questions: these include tree-based methods like random forests and boosting (Schonlau and Couper, 2016; Kern et al, 2019; Schierholz and Schonlau, 2021), support vector machines (SVM) (Joachims, 2001; Bullington et al, 2007; He and Schonlau, 2020, 2021; Khanday et al, 2021), multinomial regression (Schierholz and Schonlau, 2021) and naïve Bayes classifiers (Severin et al, 2017; Paudel et al, 2018).…”
Section: Survey Motivation in the GESIS Panel (mentioning; confidence: 99%)
“…A common practice in analyzing qualitative data is to develop a coding scheme or framework to analyze data, train research assistants (RAs) to apply the framework, ensure sufficient inter-rater reliability, and then have RAs analyze the data [1][2][3]. Some researchers also discuss the use of machine learning or artificial intelligence to help throughout the qualitative data analysis process as another pathway to analyzing data [4][5][6][7][8][9]. This paper explores the idea of developing and using a computer program to assist in coding open-ended survey responses.…”
Section: Introduction (mentioning; confidence: 99%)
“…However, they did find correlations between the auto- and human-coded results suggesting that they both found the same responses easy or hard to code. The same authors have also attempted semi-automated coding methods to improve accuracy using machine learning to identify and code easy responses, leaving the more difficult ones to be manually coded [11] or to identify responses in a dataset with a high probability of error for further analysis via double-coding [8].…”
Section: Introduction (mentioning; confidence: 99%)