Computer Science & Information Technology 2018
DOI: 10.5121/csit.2018.80211
Informatized Caption Enhancement Based on IBM Watson API and Speaker Pronunciation


Cited by 3 publications (4 citation statements, from 2018 and 2020); References: 0 publications.
“…One of the most popular speech recognition technologies is the IBM Watson API [9]. Among captions in which speech is converted into characters, captions that include timing and speaker ID information are called Informatized Captions [3][4][5]. This Informatized Caption can be generated using the IBM Watson API [10].…”
Section: Introduction
Mentioning confidence: 99%
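The passages above describe producing an Informatized Caption (per-word timing plus speaker ID) from the IBM Watson Speech to Text service. A minimal sketch of such a request with the ibm-watson Python SDK is shown below; the API key, service URL, and audio file name are placeholders, and the response layout should be checked against the current service documentation.

```python
# Sketch: informatized-caption-style output (word timing + speaker ID)
# via IBM Watson Speech to Text. Credentials and file name are placeholders.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")      # placeholder credential
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("YOUR_SERVICE_URL")               # placeholder endpoint

with open("movie_clip.wav", "rb") as audio:           # hypothetical audio file
    result = stt.recognize(
        audio=audio,
        content_type="audio/wav",
        timestamps=True,        # request per-word start/end times
        speaker_labels=True,    # request per-word speaker IDs
    ).get_result()

# Word timings: each entry is [word, start_sec, end_sec].
for res in result.get("results", []):
    for word, start, end in res["alternatives"][0].get("timestamps", []):
        print(f"{word}\t{start:.2f}\t{end:.2f}")

# Speaker labels: entries with "from"/"to" times and a numeric "speaker" ID.
for label in result.get("speaker_labels", []):
    print(label["from"], label["to"], "speaker", label["speaker"])
```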
“…To solve this problem, many studies have been conducted. However, the previous studies [3][4][5] still have problems: the speaker cannot be distinguished reliably when multiple speakers pronounce the same word differently [3], and the database with speaker pronunciation time information is assumed to be prepared in advance [4,5]. In this paper, a method of modifying incorrectly recognized words using the original caption is proposed to enhance the timing performance while updating the database in real time using the Informatized Caption information.…”
Section: Introduction
Mentioning confidence: 99%
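The correction idea mentioned here, fixing incorrectly recognized words by consulting the original caption, can be illustrated with a simple word-level alignment. The sketch below uses Python's difflib to align the recognized word sequence with the original caption and substitute the caption's words where they disagree; it is only an illustration of the alignment step under that assumption, not the paper's exact algorithm.

```python
# Sketch: correct misrecognized words by aligning the recognized transcript
# against the (trusted) original caption text. Illustrative only; not the
# paper's exact correction method.
from difflib import SequenceMatcher

def correct_with_caption(recognized, caption):
    """recognized: list of (word, start, end); caption: list of words."""
    rec_words = [w.lower() for w, _, _ in recognized]
    cap_words = [w.lower() for w in caption]
    matcher = SequenceMatcher(a=rec_words, b=cap_words)
    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            corrected.extend(recognized[i1:i2])
        elif tag == "replace":
            # Keep the recognizer's timing but take the caption's wording.
            for (word, start, end), cap_word in zip(recognized[i1:i2], caption[j1:j2]):
                corrected.append((cap_word, start, end))
        # "insert"/"delete" cases would need timing interpolation; omitted here.
    return corrected

recognized = [("the", 0.10, 0.25), ("cat", 0.25, 0.60), ("sad", 0.60, 0.95)]
caption = ["the", "cat", "sat"]
print(correct_with_caption(recognized, caption))   # "sad" -> "sat", timing kept
```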
“…However, the IBM Watson API is more susceptible to clipping errors due to poor recognition results when there is noise in the voice signal. This situation arises easily with movie sounds, which include not only the speaking voice but also background music or special sound effects. To solve this noisy-voice problem, a method has been proposed that predicts the timing information of the informatized caption with a linear estimation formula proportional to the number of letters in each word [2]. However, this letter-count-based linear estimation is not accurate enough when there are silent syllables.…”
Section: Introduction
Mentioning confidence: 99%
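The linear estimation described in [2] splits a caption segment's duration across its words in proportion to each word's letter count. A minimal sketch of that proportional split is shown below; the segment boundaries and word list are hypothetical inputs, and the paper's exact formula may differ.

```python
# Sketch: estimate per-word timing by dividing a caption segment's duration
# in proportion to each word's letter count (the linear estimation idea in [2]).
def estimate_word_timings(words, seg_start, seg_end):
    total_letters = sum(len(w) for w in words)
    duration = seg_end - seg_start
    timings, t = [], seg_start
    for w in words:
        word_dur = duration * len(w) / total_letters
        timings.append((w, round(t, 3), round(t + word_dur, 3)))
        t += word_dur
    return timings

# Hypothetical segment: 2.0 s of speech starting at 10.0 s.
print(estimate_word_timings(["informatized", "caption", "is", "useful"], 10.0, 12.0))
```

As the quoted passage notes, this split ignores silent letters and syllables, which is why purely letter-count-proportional timing can drift from the actual pronunciation.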
“…One of them is natural language processing by speech recognition. Typical speech recognition technologies include speech-to-text conversion. Among captions in which speech is converted into characters, captions that include timing information and speaker ID information are referred to as informatized captions [1,2]. Such an informatized caption can be generated by using the IBM Watson API [3].…”
Section: Introduction
Mentioning confidence: 99%