Automated semantic tagging of speech audio

Raimond, Yves; Lowis, Chris; Hodgson, R.; Tweed, Jonathan

doi:10.1145/2187980.2188060

Cited by 7 publications

(2 citation statements)

References 8 publications

“…General broadcast data is recorded in diverse environments, includes dramas with highly-emotional speech, and often has overlaid background music or sound effects: word error rates (WERs) on such data are several times higher than for broadcast news and very variable across different genres. Work in this area has included automatic transcription of podcasts and other web audio [1], automatic transcription of Youtube [2,3], the MediaEval speech retrieval evaluation which used blip.tv semi-professional user created content [4], the automatic tagging of a large radio archive [5], and automatic transcription of multi-genre media archive data [6]. Recently, systems were developed for the 2015 Multi-Genre Broadcast (MGB) challenge [7][8][9][10].…”

Section: Introduction (mentioning)

confidence: 99%

Selection of Multi-Genre Broadcast Data for the Training of Automatic Speech Recognition Systems

et al. 2016

This paper compares schemes for the selection of multi-genre broadcast data and corresponding transcriptions for speech recognition model training. Selections of the same amount of data (700 hours) from lightly supervised alignments based on the same original subtitle transcripts are compared. Data segments were selected according to a maximum phone matched error rate between the lightly supervised decoding and the original transcript. The data selected with an improved lightly supervised system yields lower word error rates (WERs). Detailed comparisons of the data selected on carefully transcribed development data show how the selected portions match the true phone error rate for each genre. From a broader perspective, it is shown that for different genres, either the original subtitles or the lightly supervised output should be used for model training and a suitable combination yields further reductions in final WER.

“…Recent work which has focused on the automatic transcription or indexing of multi-genre broadcast data has included work on the automatic transcription of podcasts and other web audio [1], automatic transcription of YouTube [2,3], the MediaEval rich speech retrieval evaluation which used blip.tv semi-professional user created content [4], and the automatic tagging of a large radio archive [5]. This paper concerns the automatic transcription of multigenre content from the BBC archive.…”

Section: Introduction (mentioning)

confidence: 99%

Transcription of multi-genre media archives using out-of-domain data

Bell, Gales, Lanchantin et al. 2012

2012 IEEE Spoken Language Technology Workshop (SLT)

We describe our work on developing a speech recognition system for multi-genre media archives. The high diversity of the data makes this a challenging recognition task, which may benefit from systems trained on a combination of in-domain and out-of-domain data. Working with tandem HMMs, we present Multi-level Adaptive Networks (MLAN), a novel technique for incorporating information from out-of-domain posterior features using deep neural networks. We show that it provides a substantial reduction in WER over other systems, with relative WER reductions of 15% over a PLP baseline, 9% over in-domain tandem features and 8% over the best out-of-domain tandem features.

Using the Past to Explain the Present: Interlinking Current Affairs with Archives via the Semantic Web

Raimond, Smethurst, McParland et al. 2013

Advanced Information Systems Engineering

Self Cite

The BBC has a very large archive of programmes, covering a wide range of topics. This archive holds a significant part of the BBC's institutional memory and is an important part of the cultural history of the United Kingdom and the rest of the world. These programmes, or parts of them, can help provide valuable context and background for current news events. However the BBC's archive catalogue is not a complete record of everything that was ever broadcast. For example, it excludes the BBC World Service, which has been broadcasting since 1932. This makes the discovery of content within these parts of the archive very difficult. In this paper we describe a system based on Semantic Web technologies which helps us to quickly locate content related to current news events within those parts of the BBC's archive with little or no pre-existing metadata. This system is driven by automated interlinking of archive content with the Semantic Web, user validations of the resulting data and topic extraction from live BBC News subtitles. The resulting interlinks between live news subtitles and the BBC's archive are used in a dynamic visualisation enabling users to quickly locate relevant content. This content can then be used by journalists and editors to provide historical context, background information and supporting content around current affairs.
