2004
DOI: 10.1023/b:ijst.0000017020.53797.a0

Multi-Scale Spoken Document Retrieval for Cantonese Broadcast News

Abstract: This paper presents the application of a multi-scale paradigm to Chinese spoken document retrieval (SDR) for improving retrieval performance. Multi-scale refers to the use of both words and subwords for retrieval. Words are basic units in a language that carry lexical meaning and subword units (such as phonemes, syllables or characters) are building components for words. Retrieval using subword indexing units is found to perform better than words because of the robustness of subword units to out-of-vocabulary …

Cited by 7 publications (3 citation statements)
References 36 publications

“…phonemes, syllables and sub-phonetic segments) have shown robustness to speech recognition errors and OOV words in spoken document retrieval (SDR) [23] tasks. Especially for Chinese, retrieval based on character or syllable indexing is superior to words due to the special features of Chinese [5,22]. We believe that subwords should also be effective in story segmentation of erroneous broadcast news transcripts through partial matching.…”
Section: Related Work (mentioning)
confidence: 99%
“…In order to process Turkish language documents recognition and indexing units are used as sub-words units by Parlak et al to reduce both the OOV rate and the index alternative recognition hypothesis to handle ASR errors [26]. Some researcher such as Lo et al [4] concentrated on the application of a multi-scale paradigm for Chinese SDR to simply improve retrieval performance. BASRAH [18] system has been designed to detect story boundaries in multilingual (English and Malay) using Confidence Measures (CMs) of the ASR.…”
Section: Related Work (mentioning)
confidence: 99%
“…A spoken document retrieval (SDR) system uses automatic speech recognition and information retrieval technologies to analyze and process multimedia documents [1]- [4]. Automatic speech recognition (ASR) systems are used to convert spoken documents (speech) into text transcription.…”
Section: Introduction (mentioning)
confidence: 99%