Increasing amounts of informal spoken content are being collected. This material does not have clearly defined document forms either in terms of structure or topical content, e.g. recordings of meetings, lectures and personal data sources. Automated search of this content poses challenges beyond retrieval of defined documents, including definition of search items and location of relevant content within them. While most existing work on speech search focused on clearly defined document units, in this paper we describe our initial investigation into search of meeting content using the AMI meeting collection. Manual and automated transcripts of meetings are first automatically segmented into topical units. A known-item search task is then performed using presentation slides from the meetings as search queries to locate relevant sections of the meetings. Query slides were selected corresponding to well recognised and poorly recognised spoken content, and randomly selected slides. Experimental results show that relevant items can be located with reasonable accuracy using a standard information retrieval approach, and that there is a clear relationship between automatic transcription accuracy and retrieval effectiveness.
Abstract. We describe Dublin City University (DCU)'s participation in the VideoCLEF 2009 Linking Task. Two approaches were implemented using the Lemur information retrieval toolkit. Both approaches first extracted a search query from the transcriptions of the Dutch TV broadcasts. One method first performed search on a Dutch Wikipedia archive, then followed links to corresponding pages in the English Wikipedia. The other method first translated the extracted query using machine translation and then searched the English Wikipedia collection directly. We found that using the original Dutch transcription query for searching the Dutch Wikipedia yielded better results.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
customersupport@researchsolutions.com
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.