Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 1998
DOI: 10.1145/290941.290950

New techniques for open-vocabulary spoken document retrieval

Abstract: This paper presents four novel techniques for open-vocabulary spoken document retrieval: a method to detect slots that possibly contain a query feature; a method to estimate occurrence probabilities; a technique that we call collection-wide probability re-estimation; and a weighting scheme which takes advantage of the fact that long query features are detected more reliably. These four techniques have been evaluated using the TREC-6 spoken document retrieval test collection to determine the improvements in ret…

Citations: cited by 38 publications (32 citation statements)
References: 9 publications
“…For this collection, another 2580 words were added to the pronunciation dictionary to transform all words, including query words, to phoneme sequences. In practice, this is done via an automated process [23]. Here, four versions of the transcriptions were used for retrieval:…”
Section: SDR Collection from TREC-7 (mentioning)
confidence: 99%
“…The rationale for this is that the incorrect transcription may be able to match relevant documents, which may also contain the incorrect transcription. This method is similar in concept to using a confusion matrix based approach by Wechsler [23] on the training collection, which can be used to determine which recognised phoneme is most likely to be recognised incorrectly as another. This technique, though not 100% accurate, is the only feasible approach for a larger collection.…”
Section: Experimental Questions (mentioning)
confidence: 99%
“…Another work [24] suggests to guide word-embeddings with morphologically annotated data and shows achievement using German in a case study. Also, many papers study syllable-based n-gram methods to model language [25,26].…”
Section: Introduction (mentioning)
confidence: 99%
“…-spoken document retrieval, in which written queries are used to search speech (e.g., broadcast news audio) archives for relevant speech information [5,6,15,16,17,19,20], -speech-driven (spoken query) retrieval, in which spoken queries are used to retrieve relevant textual information [2,3].…”
Section: Introduction (mentioning)
confidence: 99%