In this paper, we describe the automatic reconstruction of literal transcripts of medical dictations from a non-literal transcription and an automatically recognized speech transcript by means of phonetic similarity matching and alignment. We present a customized phonetic similarity measure which is trained on a set of phonetically similar string pairs, returns interpretable alignment results, and is robust in its application. Furthermore, we introduce flexible automatic phonetic transcription with regular expressions to deal with formatted entities in written texts and alternative pronunciations in recognized texts. In an evaluation, our method reduced the word error rate of the reconstructed transcription by 12% relative.
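To illustrate the idea of phonetic transcription with regular expressions, the sketch below expands a formatted written entity (a decimal number or a dosage) into alternative spoken forms, which could then each be transcribed phonetically. The patterns, dictionary, and function names are illustrative assumptions, not the rules actually used in the paper.

```python
import re

# Hypothetical spoken forms for digits; a real system would cover far more.
SPOKEN_DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
                 "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}

def spoken_alternatives(token: str) -> list[str]:
    """Return possible verbalizations of a written token (illustrative sketch)."""
    m = re.fullmatch(r"(\d+)\.(\d+)", token)        # decimal, e.g. "12.5"
    if m:
        whole, frac = m.groups()
        frac_spoken = " ".join(SPOKEN_DIGITS[d] for d in frac)
        # alternative pronunciations of the decimal point
        return [f"{whole} point {frac_spoken}", f"{whole} dot {frac_spoken}"]
    m = re.fullmatch(r"(\d+)\s*mg", token)          # dosage, e.g. "50mg"
    if m:
        return [f"{m.group(1)} milligrams", f"{m.group(1)} milligram"]
    return [token]                                  # no formatting detected

print(spoken_alternatives("12.5"))   # alternative verbalizations of "12.5"
print(spoken_alternatives("50mg"))   # alternative verbalizations of "50mg"
```

Each alternative can then be matched phonetically against the recognized text, so that a written "50 mg" still aligns with a dictated "fifty milligrams".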
Automatic speech recognition (ASR) has become a valuable tool in large document production environments like medical dictation. While manual post-processing is still needed for correcting speech recognition errors and for creating documents which adhere to various stylistic and formatting conventions, a large part of the document production process is carried out by the ASR system. Improving the quality of the system output requires knowledge about the multi-layered relationship between the dictated texts and the final documents: with it, typical speech recognition errors can be avoided, and proper style and formatting can be anticipated in the ASR part of the document production process. Yet, while vast amounts of recognition results and manually edited final reports are constantly being produced, error-free literal transcripts of the actually dictated texts remain a scarce and costly resource, because they have to be created by manually transcribing the audio files.

To obtain large corpora of literal transcripts for medical dictation, we propose a method for automatically reconstructing them from draft speech recognition transcripts plus the corresponding final medical reports. The main innovative aspect of our method is the combination of two independent knowledge sources: phonetic information for the identification of speech recognition errors, and semantic information for detecting post-editing concerning format and style. Speech recognition results and final reports are first aligned, then matched based on semantic and phonetic similarity, and finally categorized and selectively combined into a reconstruction hypothesis. This method can be used for various applications in language technology, e.g., adaptation for ASR, document production, or more generally the development of parallel text corpora from non-literal text resources.

Preprint submitted to Elsevier, 5 July 2010

In an experimental evaluation, which also includes an assessment of the quality of the reconstructed transcripts compared to manual transcriptions, the described method results in a relative word error rate reduction of 7.74% after retraining the standard language model with reconstructed transcripts.
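The align-match-combine pipeline can be sketched in simplified form. The code below aligns the ASR draft with the final report, and wherever the two disagree, keeps the report word if it sounds similar to the recognized word (a likely recognition error, so the report preserves what was dictated) and keeps the ASR word otherwise (a likely stylistic or formatting edit). The crude phonetic key and the similarity threshold are illustrative stand-ins for the paper's trained phonetic similarity measure.

```python
import difflib

def phonetic_key(word: str) -> str:
    """Crude phonetic code (a toy stand-in for a trained similarity measure)."""
    word = word.lower()
    for a, b in (("ph", "f"), ("ck", "k"), ("c", "k"), ("z", "s")):
        word = word.replace(a, b)
    # keep the first letter, then drop vowels
    return word[:1] + "".join(ch for ch in word[1:] if ch not in "aeiou")

def phonetically_similar(a: str, b: str, threshold: float = 0.6) -> bool:
    """Compare phonetic codes; the threshold is an assumed tuning parameter."""
    return difflib.SequenceMatcher(
        None, phonetic_key(a), phonetic_key(b)).ratio() >= threshold

def reconstruct(asr: list[str], report: list[str]) -> list[str]:
    """Combine ASR draft and edited report into a literal-transcript hypothesis."""
    out = []
    sm = difflib.SequenceMatcher(None, asr, report)
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag == "equal":
            out.extend(asr[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            for a, r in zip(asr[i1:i2], report[j1:j2]):
                # similar sound -> likely ASR error, the report word was dictated;
                # dissimilar -> likely stylistic edit, keep what was recognized
                out.append(r if phonetically_similar(a, r) else a)
        else:
            out.extend(asr[i1:i2])  # unmatched spans: fall back to the ASR draft
    return out

asr = "patient reseaved fifty milligrams daily".split()
report = "patient received fifty mg daily".split()
print(" ".join(reconstruct(asr, report)))
# -> "patient received fifty milligrams daily"
```

Here the misrecognition "reseaved" is repaired from the report, while the formatting edit "mg" is rejected in favor of the dictated "milligrams", yielding a hypothesis closer to the literal dictation than either source alone.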