Abstract: Despite the ready availability of digital recording technology and the continually decreasing cost of digital storage, browsing audio recordings remains a tedious task. This paper presents evidence in support of a system designed to assist with information comprehension and retrieval tasks from a large collection of recorded speech. Two techniques are employed to assist users with these tasks. First, a speech recognizer creates necessarily error-laden transcripts of the recorded speech. Second, audio playback …
“…In Vemuri et al [9], an audio playback interface was tested using recognition results with and without confidence visualization. No difference in users' comprehension rate was found.…”
Section: Related Work (mentioning)
confidence: 99%
“…[1,8,9]), in this paper, we focus on the first part of the correction problem only: finding errors. Detection of errors can be tricky for users as errors made by a recognizer are all valid words in a language.…”
In a typical speech dictation interface, the recognizer's best guess is displayed as normal, unannotated text. This ignores potentially useful information about the recognizer's confidence in its recognition hypothesis. Using a confidence measure (which itself may sometimes be inaccurate), we investigated providing visual feedback about low-confidence portions of the recognition using shaded, red underlining. An evaluation showed that, compared to a baseline without underlining, underlining low-confidence areas did not increase users' speed or accuracy in detecting errors. However, we found that when recognition errors were correctly underlined, they were discovered significantly more often than in the baseline. Conversely, when errors failed to be underlined, they were discovered less often. Our results indicate that confidence visualization can be effective, but only if the confidence measure has high accuracy. Further, since our results show that users tend to trust confidence visualization, designers should be careful in its application if a high-accuracy confidence measure is not available.
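The technique described above — flagging low-confidence portions of a recognition hypothesis for visual treatment such as shaded red underlining — can be sketched as a simple thresholding step. The function name, data format (word/confidence pairs), and threshold value below are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch: mark words in an ASR hypothesis whose recognizer
# confidence falls below a threshold, so a UI layer could render them
# with low-confidence styling (e.g., shaded red underlining).

def flag_low_confidence(words, threshold=0.6):
    """Given (word, confidence) pairs, return (word, flagged) pairs.

    flagged is True when the confidence is below the threshold,
    i.e., the word is a candidate for visual error highlighting.
    """
    return [(word, conf < threshold) for word, conf in words]

# Example hypothesis with made-up confidence scores.
hypothesis = [("browsing", 0.93), ("audio", 0.41), ("recordings", 0.88)]
print(flag_low_confidence(hypothesis))
```

Because the evaluation found that users tend to trust the visualization, a design like this only helps when the underlying confidence measure is accurate; with a noisy measure, the same thresholding would both miss real errors and underline correct words.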
“…In discussing recorded speech, Vemuri and colleagues discuss one reason why: aural speech delivery presents unique challenges [17]. The average speech rate of an English speaker is less than half the average reading rate.…”
Section: Introduction (mentioning)
confidence: 99%
“…This large disparity suggests that automatically transcribing audio and then accessing it as a written document would be most effective for information retrieval tasks. However, in reading a text transcript, the prosodic cues, which make speech rich in meaning and subtlety, are lost [17].…”
Section: Introduction (mentioning)
confidence: 99%
“…The value of improved navigation into linear media through text transcripts has been acknowledged for webcast lectures [15] and discussed in the context of experimentation with error-laden transcripts from automatic speech recognition (ASR) [17]. The inherent value of a searchable transcript for navigating into linear audio (or the narrative audio track of linear video) can be seen in recent efforts by major Internet corporations such as Google and Microsoft to search within video as opposed to only searching for a video.…”
A digital video library of over 900 hours of video and 18000 stories from The HistoryMakers was used by 266 students, faculty, librarians, and life-long learners interacting with a system providing multiple search and viewing capabilities over a trial period of several months. User demographics and actions were logged with this multimedia collection, providing quantitative and qualitative metrics on system use. These transaction logs were complemented with heuristic evaluation, interviews, and contextual inquiry with representative users. Collectively, these mixed methods informed the development of the next generation web-based interface for the HistoryMakers video oral histories to improve access to and dissemination of this rich cultural resource. In particular, the feature of a synchronized text transcript in the video player for the narratives merited further investigation. Such an interface has not seen widespread use in digital video players available on the web, yet was valued highly by oral history archive viewers. A user study with 27 participants measured the utility of the HistoryMakers web interface incorporating the synchronized transcript video player for stated fact-finding and open-ended tasks. For life oral histories, an aligned text transcript is valued for both tasks, with the video rated significantly more useful for open-ended tasks over fact-finding. These results suggest a task-dependent role of modality in presentation of oral histories, with synchronized transcripts rated highly across tasks.
We presented participants with lecture videos at different speeds and tested immediate and delayed (1 week) comprehension. Results revealed minimal costs incurred by increasing video speed from 1x to 1.5x or 2x, but performance declined beyond 2x. We also compared learning outcomes after watching videos once at 1x or twice at 2x speed. There was no advantage to watching twice at 2x speed, but if participants watched the video again at 2x speed immediately before the test, compared with watching once at 1x a week before the test, comprehension improved. Thus, increasing the speed of videos (up to 2x) may be an efficient strategy, especially if students use the time saved for additional studying or rewatching the videos, but learners should do this additional studying shortly before an exam. However, these trends may differ for videos with different speech rates, complexity or difficulty, and audiovisual overlap.