ASR error detection and recognition rate estimation using deep bidirectional recurrent neural networks

Ogawa, Akihiro; Hori, Takaaki

doi:10.1109/icassp.2015.7178796

Cited by 23 publications

(22 citation statements)

References 26 publications


“…Based on the distribution of short edit and capitalization subcategories, we estimate that roughly 80% are amenable to automatic detection algorithms that could be used in an enhanced editing tool to alert physicians to spans of text to check, optionally with proposed corrections. Advances in NLP algorithms for ASR error detection, [30][31][32] disfluency detection, 33 sentence segmentation, 34 true casing, 35 and entity recognition 36 are relevant here. Such algorithms also benefit from incorporating additional resources, such as patient data within the EHR and biomedical knowledge sources, as shown for edit detection.…”

Section: Discussion (mentioning)

confidence: 99%

Asynchronous Speech Recognition Affects Physician Editing of Notes

et al. 2018


Clinical documentation is a critical component of patient care, and communicating accurately and comprehensively through clinical notes is important to achieving positive health outcomes. Creating notes within electronic health record (EHR) systems is time-consuming, affects documentation accuracy, negatively affects the career satisfaction of clinicians, and causes lost labor productivity. 1-5 Dictation using transcriptionists and automatic speech recognition (ASR) has the potential to improve

Keywords: electronic health records and systems; clinical documentation and communications; natural language processing; notes; workflow

Abstract

Objective: Clinician progress notes are an important record for care and communication, but there is a perception that electronic notes take too long to write and may not accurately reflect the patient encounter, threatening quality of care. Automatic speech recognition (ASR) has the potential to improve clinical documentation process; however, ASR inaccuracy and editing time are barriers to wider use. We hypothesized that automatic text processing technologies could decrease editing time and improve note quality. To inform the development of these technologies, we studied how physicians create clinical notes using ASR and analyzed note content that is revised or added during asynchronous editing.

Materials and Methods: We analyzed a corpus of 649 dictated clinical notes from 9 physicians. Notes were dictated during rounds to portable devices, automatically transcribed, and edited later at the physician's convenience. Comparing ASR transcripts and the final edited notes, we identified the word sequences edited by physicians and categorized the edits by length and content.

Results: We found that 40% of the words in the final notes were added by physicians while editing: 6% corresponded to short edits associated with error correction and format changes, and 34% were associated with longer edits. Short error correction edits that affect note accuracy are estimated to be less than 3% of the words in the dictated notes. Longer edits primarily involved insertion of material associated with clinical data or assessment and plans. The longer edits improve note completeness; some could be handled with verbalized commands in dictation.

Conclusion: Process interventions to reduce ASR documentation burden, whether related to technology or the dictation/editing workflow, should apply a portfolio of solutions to address all categories of required edits. Improved processes could reduce an important barrier to broader use of ASR by clinicians and improve note quality.


“…Since some neural architectures showed recently to be effective to process sequence to sequence tasks [46], it could be interesting to compare the neural approach used until now in our experiments to measure the impact of continuous representations to the use of a bidirectional LSTM architecture. Such an architecture is designed to learn how to integrate relevant long distant information, and was successfully used for the ASR error detection task in [6,7]. In our experiments, the bidirectional LSTM architecture is composed of two hidden layers of 512 hidden units each, i.e.…”

Section: Comparison To Bidirectional LSTM System (mentioning)

confidence: 99%

“…In [5], authors propose to use a neural network classifier furnished by stacked auto-encoders (SAE), that helps to learn the error word representations. In [6,7], the authors investigated three types of ASR error detection tasks, e.g. confidence estimation, out-of-vocabulary word detection and error type classification (insertion, substitution or deletion), based on deep bidirectional recurrent neural networks.…”

Section: Introduction (mentioning)

confidence: 99%

A study of continuous space word and sentence representations applied to ASR error detection

Ghannay, Estève, Camelin 2020

Speech Communication


This paper presents a study of continuous word representations applied to automatic detection of speech recognition errors. A neural network architecture is proposed, which is well suited to handle continuous word representations, like word embeddings. We explore the use of several types of word representations: simple and combined linguistic embeddings, and acoustic ones associated to prosodic features, extracted from the audio signal. To compensate certain phenomena highlighted by the analysis of the error average span, we propose to model the errors at the sentence level through the use of sentence embeddings. An approach to build continuous sentence representations dedicated to ASR error detection is also proposed and compared to the Doc2vec approach. Experiments are performed on automatic transcriptions generated by the LIUM ASR system applied to the French ETAPE corpus. They show that the combination of linguistic embeddings, acoustic embeddings, prosodic features, and sentence embeddings in addition to more classical features yields very competitive results. Particularly, these results show the complementarity of acoustic embeddings and prosodic information, and show that the proposed sentence embeddings dedicated to ASR error detection achieve better results than generic sentence embeddings.


“…The latter two methods show slightly superior performance but higher computational complexity compared to the first one. More recently [4], new features and bidirectional recurrent neural networks (RNN) have been proposed for ASR error detection. Most SLU systems reviewed in [5] generate hypotheses of semantic frame slot tags expressed in a spoken sentence analyzed by an ASR system.…”

Section: Related Work (mentioning)

confidence: 99%

ASR Error Management for Improving Spoken Language Understanding

Simonnet, Ghannay, Camelin et al. 2017

Interspeech 2017


This paper addresses the problem of automatic speech recognition (ASR) error detection and its use for improving spoken language understanding (SLU) systems. In this study, the SLU task consists of automatically extracting, from ASR transcriptions, semantic concepts and concept/value pairs in, e.g., a touristic information system. An approach is proposed for enriching the set of semantic labels with error-specific labels and for using a recently proposed neural approach based on word embeddings to compute well-calibrated ASR confidence measures. Experimental results are reported showing that it is possible to significantly decrease the Concept/Value Error Rate with a state-of-the-art system, outperforming previously published results on the same experimental data. It is also shown that, by combining an SLU approach based on conditional random fields with a neural encoder/decoder attention-based architecture, it is possible to effectively identify confidence islands and uncertain semantic output segments useful for deciding appropriate error handling actions by the dialogue manager strategy.

