Probabilistic Kernels for Improved Text-to-Speech Alignment in Long Audio Tracks

Bordel, Germán; Peñagarikano, Mikel; Rodríguez-Fuentes, Luis Javier; Álvarez, Aitor; Varona, Amparo

doi:10.1109/lsp.2015.2505140

Cited by 12 publications

(11 citation statements)

References 9 publications


“…In such cases, a more sophisticated analysis is required. To this end, in [16], the authors propose using probabilistic kernels (similarity functions) based on speaker behavior in order to improve text alignment. More recently, in [17], the authors introduce a human-in-the-loop approach to handle the variability of amateur reading and improve the performance of text alignment systems.…”

Section: State of the Art Review (mentioning)

confidence: 99%

Automatic Subtitle Synchronization and Positioning System Dedicated to Deaf and Hearing Impaired People

Mocanu

Țapu

2021

IEEE Access


In this paper, we introduce a subtitle synchronization and positioning system designed to increase the accessibility of multimedia documents for deaf and hearing-impaired people. The main contributions of the paper are: a novel synchronization algorithm able to robustly align, without any human intervention, the closed captions with the audio transcript, and a timestamp refinement technique that adjusts the duration of subtitle segments according to audiovisual recommendations. Finally, we introduce a novel method that performs a high-level understanding of the multimedia content in order to determine the optimal subtitle positions within the video frame, such that subtitles do not overlap with other relevant textual information. The experimental evaluation, performed on a large dataset of 30 videos taken from French national television, validates the approach with average accuracy scores above 90% regardless of the video genre. The subjective evaluation of the proposed subtitle synchronization and positioning system, performed with hearing-impaired people, demonstrates the effectiveness of our approach.



“…Text-to-speech alignment faces two challenges: very long audio signals and corrupted speech. While some approaches cope with the former [7,8], the latter is far from being solved for low SNRs. The method based on probabilistic kernels in [8] can align text with long audio signals but performance decreases when the speech is mixed with music.…”

Section: Related Work (mentioning)

confidence: 99%

“…While some approaches cope with the former [7,8], the latter is far from being solved for low SNRs. The method based on probabilistic kernels in [8] can align text with long audio signals but performance decreases when the speech is mixed with music. The approach in [16] applies ASR on a long speech signal and aligns a given text transcript with the recognized text.…”

Section: Related Work (mentioning)

confidence: 99%
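The step attributed to [16] above, aligning a given text transcript with the words recognized by ASR, is commonly done with a word-level Levenshtein (edit-distance) alignment. The Python sketch below illustrates only that generic idea; it is not the method of [8] or [16]. The function names `align_words` and `transfer_timestamps` are hypothetical, and the sketch assumes the ASR output is a list of (word, start time in seconds) pairs, with each transcript word inheriting the timestamp of an exactly matching ASR word.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align_words(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Levenshtein alignment of two word sequences (hypothetical helper).

    Returns (i, j) pairs where i indexes ref and j indexes hyp;
    None marks a deletion (j is None) or an insertion (i is None).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # transcript word missing from ASR
                             cost[i][j - 1] + 1)        # extra word in ASR output
    # Backtrace from the bottom-right corner to recover one optimal path
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (0 if ref[i - 1] == hyp[j - 1] else 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def transfer_timestamps(ref: List[str], asr: List[Tuple[str, float]]) -> List[Optional[float]]:
    """Give each transcript word the start time of an exactly matching ASR word, else None."""
    hyp = [word for word, _ in asr]
    times: List[Optional[float]] = [None] * len(ref)
    for i, j in align_words(ref, hyp):
        if i is not None and j is not None and ref[i] == hyp[j]:
            times[i] = asr[j][1]
    return times


if __name__ == "__main__":
    ref = ["the", "cat", "sat", "on", "the", "mat"]
    asr = [("the", 0.0), ("cat", 0.4), ("sad", 0.8), ("on", 1.1), ("a", 1.3), ("mat", 1.5)]
    print(transfer_timestamps(ref, asr))  # → [0.0, 0.4, None, 1.1, None, 1.5]
```

Transcript words that land on substitutions or deletions receive no timestamp in this sketch; a practical system would need to interpolate them from the neighboring matched anchors.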


Joint Phoneme Alignment and Text-Informed Speech Separation on Highly Corrupted Speech

Schulze-Forster

Doire

Richard

et al. 2020

ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


“…In some applications, such as bioinformatics, where sequences are extremely long, memory consumption is prohibitive; therefore, optimizations have been proposed, such as the Hirschberg algorithm [36], which reduces the space requirement to O(n + m) at the expense of increased computation time. Other proposals use the Levenshtein distance to synchronize the videos, minutes and text transcripts of the Basque Parliament plenary sessions [37], to align text with speech audio signals up to several hours long [38], to generate bilingual subtitles automatically for lecture videos [20], or even to annotate faces automatically in TV series by aligning video/script and subtitles [39].…”

Section: Literature Review (mentioning)

confidence: 99%
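The space reduction attributed above to the Hirschberg algorithm [36] comes from never storing the full (n + 1) × (m + 1) dynamic-programming table: only the last row of a forward and a backward edit-distance pass is kept, and the split point that minimizes their sum is used to recurse on two halves. The Python sketch below applies this to Levenshtein alignment of two token sequences (for example, subtitle words against ASR output). The names `last_row` and `hirschberg` are illustrative, not taken from any cited system, and list slicing is used for readability; a version meant to stay strictly within O(n + m) memory would pass index ranges instead of copying sub-lists.

```python
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def last_row(a: Sequence, b: Sequence) -> List[int]:
    """Last row of the Levenshtein DP table for a vs b, keeping only two rows in memory."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cur[j] = min(prev[j - 1] + int(a[i - 1] != b[j - 1]),  # match or substitution
                         prev[j] + 1,                               # deletion
                         cur[j - 1] + 1)                            # insertion
        prev = cur
    return prev


def hirschberg(a: Sequence, b: Sequence, ai: int = 0, bj: int = 0) -> List[Pair]:
    """Optimal Levenshtein alignment of a and b using linear working space.

    ai and bj are offsets so returned indices refer to the original sequences;
    None marks a deletion (second element) or an insertion (first element).
    """
    if len(a) == 0:
        return [(None, bj + j) for j in range(len(b))]
    if len(b) == 0:
        return [(ai + i, None) for i in range(len(a))]
    if len(a) == 1:
        # Base case: pair a[0] with its first exact match in b, else substitute it for b[0]
        k = list(b).index(a[0]) if a[0] in b else 0
        return ([(None, bj + j) for j in range(k)] + [(ai, bj + k)]
                + [(None, bj + j) for j in range(k + 1, len(b))])
    mid = len(a) // 2
    left = last_row(a[:mid], b)                # cost of a[:mid] vs every prefix of b
    right = last_row(a[mid:][::-1], b[::-1])   # cost of a[mid:] vs every suffix of b
    split = min(range(len(b) + 1), key=lambda k: left[k] + right[len(b) - k])
    return (hirschberg(a[:mid], b[:split], ai, bj)
            + hirschberg(a[mid:], b[split:], ai + mid, bj + split))


if __name__ == "__main__":
    a = "the cat sat on the mat".split()
    b = "the cat sad on a mat".split()
    print(hirschberg(a, b))  # → [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
```

The running time remains O(nm), roughly twice that of the full-table version, which is the computation-time increment mentioned in the citation statement.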

Sub-Sync: Automatic Synchronization of Subtitles in the Broadcasting of True Live programs in Spanish

González‐Carrasco

Puente

Ruíz‐Mezcua

et al. 2019

IEEE Access


Individuals with sensory impairments (hearing or visual) encounter serious communication barriers within society and the world around them. These barriers hinder the communication process and make access to information an obstacle they must overcome on a daily basis. In this context, one of the most common complaints made by television (TV) users with sensory impairments is the lack of synchronism between audio and subtitles in some types of programs. In addition, synchronization remains one of the most significant factors in audience perception of quality in live-originated TV subtitles for the deaf and hard of hearing. This paper introduces the Sub-Sync framework, intended for the automatic synchronization of audiovisual content and subtitles, which takes advantage of well-known techniques used in symbol sequence alignment. In this particular case, the symbol sequences are the subtitles produced by the broadcaster's subtitling system and the word flow generated by an automatic speech recognition process. The goal of Sub-Sync is to address the lack of synchronism that occurs in subtitles produced during the broadcast of live TV programs or other programs that have some improvised parts. Furthermore, it also aims to resolve the problematic interface between synchronized and unsynchronized parts of mixed-type programs. In addition, the framework is able to synchronize the subtitles even when they do not correspond literally to the original audio and/or the audio cannot be completely transcribed by an automatic process. Sub-Sync has been successfully tested in different live broadcasts, including mixed programs, in which the synchronized parts (recorded, scripted) are interspersed with desynchronized (improvised) ones.

Index Terms: Accessibility, TV broadcasting, algorithm design and analysis, automatic speech recognition.
