2017
DOI: 10.1002/asi.23831

Extracting audio summaries to support effective spoken document search

Abstract: We address the challenge of extracting query biased audio summaries from podcasts to support users in making relevance decisions in spoken document search via an audio-only communication channel. We performed a crowdsourced experiment that demonstrates that transcripts of spoken documents created using Automated Speech Recognition (ASR), even with significant errors, are effective sources of document summaries or "snippets" for supporting users in making relevance judgments against a query. In particular, the …

Cited by 32 publications (15 citation statements)
References 43 publications
“…Therefore it may be helpful for the users to hear their query words in the context of the found document. For podcasts, this may mean that users listen to a snippet extracted from the podcast audio in order to understand the context of their query word [12].…”
Section: Discussion
confidence: 99%
“…Spoken document summaries are also available for the AMI meeting corpus (Mccowan et al, 2005) and the ICSI meeting corpus (Janin et al, 2003), as well as corpora of lectures (Miller, 2019), and voicemail (Koumpis and Renals, 2005). Spina et al (2017) collect and evaluate 217 hours of podcasts for query-biased extractive summarization. In recent work, Tardy et al (2020) train a model to reproduce full-length manual reports aligned with automatic speech recognition transcripts of meetings, and Gholipour Ghalandari et al (2020) generate a corpus for multi-document summarization.…”
Section: Related Datasets
confidence: 99%
“…Similarly, the rise in popularity of spoken-text retrieval devices means that studying how searchers form queries after listening to an audio snippet will be useful. Spina et al [46] show that query-biased document summaries presented as audio are practical in conversational IR. Providing snippets through speech synthesis introduces more presentation factors, as Chuklin et al [18] note, where read-outs with prosody changes were subjectively more informative, at the expense of their aesthetic quality.…”
Section: Background and Motivation
confidence: 99%