Computer Science & Information Technology 2018
DOI: 10.5121/csit.2018.80211
Informatized Caption Enhancement Based on IBM Watson API and Speaker Pronunciation


Cited by 3 publications (4 citation statements, from 2018 and 2020); References: 0 publications.
“…One of the most popular speech recognition technologies is the IBM Watson API [9]. Among captions in which speech is converted into characters, captions that include timing and speaker ID information are called Informatized Captions [3][4][5]. This Informatized Caption can be generated using the IBM Watson API [10].…”
Section: Introduction
Mentioning confidence: 99%
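The passages above describe producing an Informatized Caption (per-word timing plus speaker ID) from the IBM Watson Speech to Text service. A minimal sketch of such a request with the ibm-watson Python SDK is shown below; the API key, service URL, and audio file name are placeholders, and the response layout should be checked against the current service documentation.

```python
# Sketch: informatized-caption-style output (word timing + speaker ID)
# via IBM Watson Speech to Text. Credentials and file name are placeholders.
from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

authenticator = IAMAuthenticator("YOUR_API_KEY")      # placeholder credential
stt = SpeechToTextV1(authenticator=authenticator)
stt.set_service_url("YOUR_SERVICE_URL")               # placeholder endpoint

with open("movie_clip.wav", "rb") as audio:           # hypothetical audio file
    result = stt.recognize(
        audio=audio,
        content_type="audio/wav",
        timestamps=True,        # request per-word start/end times
        speaker_labels=True,    # request per-word speaker IDs
    ).get_result()

# Word timings: each entry is [word, start_sec, end_sec].
for res in result.get("results", []):
    for word, start, end in res["alternatives"][0].get("timestamps", []):
        print(f"{word}\t{start:.2f}\t{end:.2f}")

# Speaker labels: entries with "from"/"to" times and a numeric "speaker" ID.
for label in result.get("speaker_labels", []):
    print(label["from"], label["to"], "speaker", label["speaker"])
```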
“…To solve this problem, many studies have been conducted. However, the previous studies [3][4][5] still have problems: the speaker cannot be distinguished reliably when multiple speakers pronounce the same word differently [3], and the database with speaker pronunciation time information is assumed to be prepared in advance [4,5]. In this paper, a method of modifying incorrectly recognized words using the original caption is proposed to enhance the timing performance while updating the database in real time using the Informatized Caption information.…”
Section: Introduction
Mentioning confidence: 99%
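The correction idea mentioned here, fixing incorrectly recognized words by consulting the original caption, can be illustrated with a simple word-level alignment. The sketch below uses Python's difflib to align the recognized word sequence with the original caption and substitute the caption's words where they disagree; it is only an illustration of the alignment step under that assumption, not the paper's exact algorithm.

```python
# Sketch: correct misrecognized words by aligning the recognized transcript
# against the (trusted) original caption text. Illustrative only; not the
# paper's exact correction method.
from difflib import SequenceMatcher

def correct_with_caption(recognized, caption):
    """recognized: list of (word, start, end); caption: list of words."""
    rec_words = [w.lower() for w, _, _ in recognized]
    cap_words = [w.lower() for w in caption]
    matcher = SequenceMatcher(a=rec_words, b=cap_words)
    corrected = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            corrected.extend(recognized[i1:i2])
        elif tag == "replace":
            # Keep the recognizer's timing but take the caption's wording.
            for (word, start, end), cap_word in zip(recognized[i1:i2], caption[j1:j2]):
                corrected.append((cap_word, start, end))
        # "insert"/"delete" cases would need timing interpolation; omitted here.
    return corrected

recognized = [("the", 0.10, 0.25), ("cat", 0.25, 0.60), ("sad", 0.60, 0.95)]
caption = ["the", "cat", "sat"]
print(correct_with_caption(recognized, caption))   # "sad" -> "sat", timing kept
```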
“…However, the IBM Watson API is more susceptible to clipping errors due to poor recognition results when there is noise in the voice signal. This situation arises easily with movie sounds, which include not only the speaking voice but also background music or special sound effects. To solve this noisy-voice problem, a method has been proposed that predicts the timing information of the informatized caption with a linear estimation formula proportional to the number of letters in each word [2]. However, this letter-count-based linear estimation is not accurate enough when there are silent syllables.…”
Section: Introduction
Mentioning confidence: 99%
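The linear estimation described in [2] splits a caption segment's duration across its words in proportion to each word's letter count. A minimal sketch of that proportional split is shown below; the segment boundaries and word list are hypothetical inputs, and the paper's exact formula may differ.

```python
# Sketch: estimate per-word timing by dividing a caption segment's duration
# in proportion to each word's letter count (the linear estimation idea in [2]).
def estimate_word_timings(words, seg_start, seg_end):
    total_letters = sum(len(w) for w in words)
    duration = seg_end - seg_start
    timings, t = [], seg_start
    for w in words:
        word_dur = duration * len(w) / total_letters
        timings.append((w, round(t, 3), round(t + word_dur, 3)))
        t += word_dur
    return timings

# Hypothetical segment: 2.0 s of speech starting at 10.0 s.
print(estimate_word_timings(["informatized", "caption", "is", "useful"], 10.0, 12.0))
```

As the quoted passage notes, this split ignores silent letters and syllables, which is why purely letter-count-proportional timing can drift from the actual pronunciation.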
“…One of them is natural language processing by speech recognition. Typical speech recognition technologies include speech-to-text conversion. Among captions in which speech is converted into characters, captions that include timing information and speaker ID information are referred to as informatized captions [1,2]. Such an informatized caption can be generated by using the IBM Watson API [3].…”
Section: Introduction
Mentioning confidence: 99%