Is automatic speech-to-text transcription ready for use in psychological experiments?

Ziman, Kirsten; Heusser, Andrew C.; Fitzpatrick, Paxton C.; Field, Campbell E.; Manning, Jeremy R.

doi:10.3758/s13428-018-1037-4

Cited by 24 publications

(11 citation statements)

References 30 publications

“…Instead, we intend to provide a proof-of-concept that an ASR can be used to analyze certain aspects of spontaneous speech, allowing for large-scale use of natural speech for research ends. A similar approach has recently been taken by Ziman et al (2018), who showed that an ASR can be used reliably to transcribe speech data from psychological experiments, in their case a verbal recall memory test. In their study, Ziman and colleagues provided the speech context to their speech-to-text engine.…”

Section: Discussion (mentioning)

confidence: 96%

“…The strength of the correlations might be considered an index of how well a given measure, in the context of spontaneous speech elicitation, is suited to be transcribed by an ASR, or whether it may require manual coding. (For a similar correlational approach to evaluate transcription accuracy, see Ziman et al, 2018.) As we planned to carry out correlations for many measures of interest, we applied a Bonferroni correction (four measures and three questions resulted in a corrected alpha level of 0.05/12 = 0.004).…”

Section: Methods (mentioning)

confidence: 99%
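
The Bonferroni arithmetic quoted above can be checked with a minimal sketch. The variable names and the `is_significant` helper are illustrative assumptions, not code from the citing paper; only the numbers (alpha of 0.05, four measures, three questions) come from the quoted statement.

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
family_alpha = 0.05   # uncorrected alpha level, from the quoted statement
n_measures = 4        # measures of interest, from the quoted statement
n_questions = 3       # questions, from the quoted statement

n_tests = n_measures * n_questions          # 4 * 3 = 12 correlations
corrected_alpha = family_alpha / n_tests    # 0.05 / 12


def is_significant(p_value, alpha=corrected_alpha):
    """Hypothetical helper: compare a p-value against the corrected alpha."""
    return p_value < alpha


print(f"{n_tests} tests, corrected alpha = {corrected_alpha:.3f}")
# prints "12 tests, corrected alpha = 0.004"
```

The quoted 0.004 is the rounded value of 0.05/12; the sketch compares p-values against the unrounded threshold.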

Vocabulary Size Influences Spontaneous Speech in Native Language Users: Validating the Use of Automatic Speech Recognition in Individual Differences Research

2020

Previous research has shown that vocabulary size affects performance on laboratory word production tasks. Individuals who know many words show faster lexical access and retrieve more words belonging to pre-specified categories than individuals who know fewer words. The present study examined the relationship between receptive vocabulary size and speaking skills as assessed in a natural sentence production task. We asked whether measures derived from spontaneous responses to everyday questions correlate with the size of participants’ vocabulary. Moreover, we assessed the suitability of automatic speech recognition (ASR) for the analysis of participants’ responses in complex language production data. We found that vocabulary size predicted indices of spontaneous speech: individuals with a larger vocabulary produced more words and had a higher speech-silence ratio compared to individuals with a smaller vocabulary. Importantly, these relationships were reliably identified using manual and automated transcription methods. Taken together, our results suggest that spontaneous speech elicitation is a useful method to investigate natural language production and that automatic speech recognition can alleviate the burden of labor-intensive speech transcription.

“…The use of different NLP features, classifiers, and learning strategies discussed in this study seems promising to develop a system for the real-time detection of reminiscence in everyday conversations in German of older adults. Such a system could leverage audio-to-text software [78] of advanced methods from automated coding [24] to automate the transcription of conversations before NLP preprocessing and the computation of machine learning predictions.…”

Section: Discussion (mentioning)

confidence: 99%

Social Reminiscence in Older Adults’ Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning

Ferrario, Demiray, Yordanova, et al. 2020

J Med Internet Res

Background: Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. The study of social reminiscence behavior in everyday life can be used to generate data and detect reminiscence from general conversations.

Objective: The aims of this original paper are to (1) preprocess coded transcripts of conversations in German of older adults with natural language processing (NLP), and (2) implement and evaluate learning strategies using different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts.

Methods: The methods in this study comprise (1) collecting and coding of transcripts of older adults’ conversations in German, (2) preprocessing transcripts to generate NLP features (bag-of-words models, part-of-speech tags, pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, including 109 transcripts with reminiscence. Due to class imbalance in the data, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training data set of transcripts. We computed the area under the curve (AUC), the average precision (AP), precision, recall, as well as F1 score and specificity measures on the test data, for all combinations of NLP features, algorithms, and learning strategies.

Results: Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embeddings features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). For the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words outperformed all other classifiers and NLP features; however, the performance of the meta-classifier learning strategy was lower compared to other strategies, with highly imbalanced precision-recall trade-offs.

Conclusions: This study provides evidence of the applicability of NLP and machine learning pipelines for the automated detection of reminiscence in older adults’ everyday conversations in German. The methods and findings of this study could be relevant for designing unobtrusive computer systems for the real-time detection of social reminiscence in the everyday life of older adults and classifying their functions. With further improvements, these systems could be deployed in health interventions aimed at improving older adults’ well-being by promoting self-reflection and suggesting coping strategies to be used in the case of dysfunctional reminiscence cases, which can undermine physical and mental health.
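
As a reading aid for the reported metrics, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the rounded precision and recall values in the abstract. The function name `f1_score` and the dictionary labels are illustrative assumptions, not the authors' code, and because the inputs are already rounded, the recomputed value can differ from the reported F1 in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported, rounded, in the abstract.
reported = {
    "class-weighted SVM, bag-of-words": (0.50, 0.45),
    "SVM, SMOTE + word embeddings": (0.35, 0.59),
}

for name, (p, r) in reported.items():
    print(f"{name}: F1 = {f1_score(p, r):.2f}")
```

The second configuration reproduces the reported F1 of 0.44. The first yields 0.47 from the rounded inputs against the reported 0.48.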

“…Transcripts of the satirical news shows and the liberal and conservative news shows were collected by means of the command-line program youtube-dl (available at: http://ytdl-org.github.io/youtube-dl/), which we used to download automatic captions from YouTube. We used YouTube because previous research has found such automatic speech-to-text transcriptions to be accurate (Ziman et al 2018). In some respects, they may even be more reliable than the original US television subtitles because real-time subtitles can contain typos and are subject to strict character and time restrictions (Szarkowska, Cintas, and Gerber-Morón in press).…”

Section: Collection Of Transcripts (mentioning)

confidence: 99%

From The Daily Show to Last Week Tonight: A Quantitative Analysis of Discursive Integration in Satirical Television News

et al. 2021

Satirical news shows constitute an innovative hybrid genre that mixes regular news and fiction. The discursive integration hypothesis posits that the defining characteristic of satirical news shows is that news and fiction elements are integrated such that boundaries between the preexisting genres have blurred. The current study quantitatively tests this hypothesis on both longrunning American shows such as The Daily Show and more recent shows such as Last Week Tonight. We collected transcripts of fifteen satirical news shows, eleven regular news shows, and fourteen fiction shows from 2018 (9,824,249 words). Transcripts were automatically tagged for over fifty linguistic features to identify register dimensions, patterns in linguistic features unique to genres, which we used to determine the presence of discursive integration. Findings revealed that two-thirds of satirical news shows were indeed characterized by discursive integration (which we labeled "complete hybrids"), while one-third manifested through the already existing hybrid genre of opinionated news (which we labeled "hybrid-genre echoes"). These two categories of shows demonstrate the importance of genre hybridity for defining satirical news across different shows.
