2016
DOI: 10.1109/lsp.2015.2505140

Probabilistic Kernels for Improved Text-to-Speech Alignment in Long Audio Tracks

Cited by 12 publications (11 citation statements)
References 9 publications
“…In such cases, a more sophisticated analysis is required. To this purpose, in [16], the authors propose using probabilistic kernels (similarity functions) about the speaker behavior in order to improve the text alignment. Recently, in [17] in order to handle the variability of amateur reading and improve the performance of text alignment systems, the authors introduce a human in the loop approach.…”
Section: State of the Art Review (mentioning, confidence: 99%)
“…Text-to-speech alignment faces two challenges: very long audio signals and corrupted speech. While some approaches cope with the former [7,8], the latter is far from being solved for low SNRs. The method based on probabilistic kernels in [8] can align text with long audio signals but performance decreases when the speech is mixed with music.…”
Section: Related Work (mentioning, confidence: 99%)
“…While some approaches cope with the former [7,8], the latter is far from being solved for low SNRs. The method based on probabilistic kernels in [8] can align text with long audio signals but performance decreases when the speech is mixed with music. The approach in [16] applies ASR on a long speech signal and aligns a given text transcript with the recognized text.…”
Section: Related Work (mentioning, confidence: 99%)
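The statement above describes a two-stage strategy: run ASR over the long recording, then align the reference transcript with the recognized words. A minimal Python sketch of the second stage follows, using word-level Levenshtein alignment with a backtrace to copy timestamps from matched ASR words onto transcript words. This is an illustration only, not the method of [16] nor the probabilistic-kernel method of this paper; the `(word, start_seconds, end_seconds)` ASR format and the function name `align_transcript` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical ASR output format: (word, start_seconds, end_seconds).
AsrWord = Tuple[str, float, float]


def align_transcript(
    transcript: List[str], asr: List[AsrWord]
) -> List[Tuple[str, Optional[float], Optional[float]]]:
    """Attach ASR timestamps to transcript words via word-level Levenshtein alignment.

    Only exact (case-insensitive) matches receive times; substituted or
    unmatched transcript words get (None, None).
    """
    n, m = len(transcript), len(asr)

    def same(i: int, j: int) -> bool:
        return transcript[i - 1].lower() == asr[j - 1][0].lower()

    # dp[i][j] = edit distance between transcript[:i] and the first j ASR words.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (0 if same(i, j) else 1),  # match / substitution
                dp[i - 1][j] + 1,  # transcript word missing from ASR output
                dp[i][j - 1] + 1,  # extra word in ASR output
            )

    # Backtrace from the bottom-right cell, recording times of exact matches.
    times: List[Optional[Tuple[float, float]]] = [None] * n
    i, j = n, m
    while i > 0 and j > 0:
        sub = 0 if same(i, j) else 1
        if dp[i][j] == dp[i - 1][j - 1] + sub:
            if sub == 0:
                times[i - 1] = (asr[j - 1][1], asr[j - 1][2])
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            i -= 1
        else:
            j -= 1

    result = []
    for word, t in zip(transcript, times):
        start, end = t if t is not None else (None, None)
        result.append((word, start, end))
    return result


if __name__ == "__main__":
    ref = ["the", "cat", "sat", "down"]
    hyp = [("the", 0.0, 0.2), ("cat", 0.2, 0.5), ("sad", 0.5, 0.8), ("down", 0.8, 1.1)]
    for row in align_transcript(ref, hyp):
        print(row)
```

Words left without a time (here, a misrecognized word) could then be assigned times by interpolating between their matched neighbours. The full $O(nm)$ table is what makes this approach memory-hungry for recordings lasting several hours.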
“…In some uses such as bioinformatics, in which the length of the sequences is extremely long, memory consumption is prohibitive, and therefore, optimizations have been proposed such as the Hirschberg algorithm [36] which is able to reduce the space up to O(n + m) but, at the expense of a computation time increment. Other proposals include the Levenshtein distance for synchronizing the videos, minutes and text transcripts, of the Basque Parliament plenary sessions [37], for aligning text with speech audio signals with lengths of up to several hours [38], for automatic bilingual subtitle generation for lecture videos [20] or even for automatic face annotation in TV series by video/script and subtitles alignment [39].…”
Section: Literature Review (mentioning, confidence: 99%)
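The statement above cites the Hirschberg algorithm [36] as a way to compute a Levenshtein alignment without storing the full cost table. A minimal Python sketch of the standard divide-and-conquer formulation follows, assuming unit costs for insertion, deletion and substitution; the function names are our own, and this is not the alignment procedure of the indexed paper or of [38]. The cost pass keeps only two rows of O(len(b)) values at a time; for brevity the recursion slices the inputs, which a memory-tight implementation would replace with index ranges.

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Pair = Tuple[Optional[T], Optional[T]]  # (item of a or None, item of b or None)


def last_row(a: Sequence[T], b: Sequence[T]) -> List[int]:
    """Edit distances from all of `a` to every prefix of `b`, keeping two rows only."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i] + [0] * len(b)
        for j, y in enumerate(b, 1):
            cur[j] = min(
                prev[j] + 1,  # delete x
                cur[j - 1] + 1,  # insert y
                prev[j - 1] + (x != y),  # match (0) or substitute (1)
            )
        prev = cur
    return prev


def hirschberg(a: Sequence[T], b: Sequence[T]) -> List[Pair]:
    """Optimal unit-cost alignment of `a` and `b` by divide and conquer."""
    if len(a) == 0:
        return [(None, y) for y in b]
    if len(b) == 0:
        return [(x, None) for x in a]
    if len(a) == 1:
        # One item against a non-empty b: match it if possible, else substitute b[0].
        x = a[0]
        j = b.index(x) if x in b else 0
        return ([(None, y) for y in b[:j]] + [(x, b[j])]
                + [(None, y) for y in b[j + 1:]])
    mid = len(a) // 2
    left = last_row(a[:mid], b)  # costs of a[:mid] vs b[:j]
    right = last_row(a[mid:][::-1], b[::-1])  # costs of a[mid:] vs b[j:], reversed
    m = len(b)
    split = min(range(m + 1), key=lambda j: left[j] + right[m - j])
    return hirschberg(a[:mid], b[:split]) + hirschberg(a[mid:], b[split:])


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()
    for pair in hirschberg(ref, hyp):
        print(pair)
```

The extra time the quoted statement mentions comes from recomputing cost rows at each level of the recursion, roughly doubling the work of the plain table-based method in exchange for linear memory.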