Korbinian Riedhammer scite author profile

In this paper we present an algorithm that produces pitch and probability-of-voicing estimates for use as features in automatic speech recognition systems. These features give large performance improvements on tonal languages for ASR systems, and even substantial improvements for non-tonal languages. Our method, which we are calling the Kaldi pitch tracker (because we are adding it to the Kaldi ASR toolkit), is a highly modified version of the getf0 (RAPT) algorithm. Unlike the original getf0 we do not make a hard decision whether any given frame is voiced or unvoiced; instead, we assign a pitch even to unvoiced frames while constraining the pitch trajectory to be continuous. Our algorithm also produces a quantity that can be used as a probability of voicing measure; it is based on the normalized autocorrelation measure that our pitch extractor uses. We present results on data from various languages in the BABEL project, and show a large improvement over systems without tonal features and systems where pitch and POV information was obtained from SAcC or getf0.
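The abstract above says the probability-of-voicing quantity is based on a normalized autocorrelation measure. As a rough illustration only, the sketch below computes a plain normalized autocorrelation of one frame at a given lag and picks the lag with the highest value. It is not the Kaldi pitch tracker: the function names, the frame layout, and the lag search are assumptions made for this example, and it omits the continuity constraint on the pitch trajectory that the abstract describes.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Correlate a frame with a copy of itself shifted by `lag` samples.

    The result lies in [-1, 1]; values near 1 suggest strong periodicity
    at that lag. (Illustrative helper, not part of any toolkit.)
    """
    n = len(frame) - lag
    head = frame[:n]
    tail = frame[lag:lag + n]
    dot = sum(x * y for x, y in zip(head, tail))
    norm = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
    return dot / norm if norm > 0 else 0.0


def best_lag(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the highest correlation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_autocorrelation(frame, lag))


# Hypothetical input: a pure sinusoid with a period of 20 samples.
frame = [math.sin(2 * math.pi * k / 20) for k in range(200)]
period = best_lag(frame, 10, 30)
```

Dividing the sample rate by the selected lag would give a pitch estimate in Hz for that frame; a real tracker would combine such per-frame scores across time rather than deciding frame by frame.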


Generating exact lattices in the WFST framework

Povey

et al. 2012


We describe a lattice generation method that is exact, i.e. it satisfies all the natural properties we would want from a lattice of alternative transcriptions of an utterance. This method does not introduce substantial overhead above one-best decoding. Our method is most directly applicable when using WFST decoders where the WFST is "fully expanded", i.e. where the arcs correspond to HMM transitions. It outputs lattices that include HMM-state-level alignments as well as word labels. The general idea is to create a state-level lattice during decoding, and to do a special form of determinization that retains only the best-scoring path for each word sequence. This special determinization algorithm is a solution to the following problem: Given a WFST A, compute a WFST B that, for each input-symbol sequence of A, contains just the lowest-cost path through A.
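The problem statement at the end of this abstract can be illustrated on a toy scale. The sketch below assumes the paths of A have already been enumerated as (symbols, cost, alignment) tuples, which is a simplification made only for this example. The determinization described in the abstract works on the WFST itself, and this code does not reproduce it; it only shows the input/output contract of keeping the lowest-cost path per symbol sequence.

```python
def best_path_per_sequence(paths):
    """Keep only the lowest-cost path for each distinct symbol sequence.

    `paths` is an iterable of (symbols, cost, alignment) tuples, a
    hypothetical flat representation chosen for this sketch.
    """
    best = {}
    for symbols, cost, alignment in paths:
        key = tuple(symbols)
        if key not in best or cost < best[key][0]:
            best[key] = (cost, alignment)
    return best


# Hypothetical paths: two alignments of "hello world", one of "hello word".
paths = [
    (["hello", "world"], 12.5, [1, 1, 2, 3]),
    (["hello", "world"], 11.0, [1, 2, 2, 3]),
    (["hello", "word"], 14.2, [1, 2, 4, 4]),
]
lattice = best_path_per_sequence(paths)
```

Each surviving entry still carries its alignment, mirroring the abstract's point that the output keeps state-level alignments alongside word labels.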


Long story short – Global unsupervised models for keyphrase based meeting summarization

Riedhammer

Favre

Hakkani‐Tür

2010

Speech Communication


We analyze and compare two different methods for unsupervised extractive spontaneous speech summarization in the meeting domain. Based on utterance comparison, we introduce an optimal formulation for the widely used greedy maximum marginal relevance (MMR) algorithm. Following the idea that information is spread over the utterances in the form of concepts, we describe a system which finds an optimal selection of utterances covering as many unique important concepts as possible. Both optimization problems are formulated as an integer linear program (ILP) and solved using public domain software. We analyze and discuss the performance of both approaches using various evaluation setups on two well-studied meeting corpora. We conclude on the benefits and drawbacks of the presented models and give an outlook on future aspects to improve extractive meeting summarization.
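For readers unfamiliar with the greedy MMR baseline mentioned in this abstract, a minimal sketch follows. The Jaccard word-overlap similarity, the trade-off weight `lam`, and the function names are assumptions chosen for illustration; the abstract does not specify them, and the paper's optimal ILP formulation is not shown here.

```python
def jaccard(a, b):
    """Word-overlap similarity between two token lists (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_mmr(utterances, query, k, lam=0.7):
    """Greedily pick up to k utterance indices, trading relevance to the
    query against redundancy with utterances already selected."""
    selected = []
    candidates = list(range(len(utterances)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = jaccard(utterances[i], query)
            redundancy = max(
                (jaccard(utterances[i], utterances[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical meeting utterances; the first two are exact duplicates.
utterances = [
    ["budget", "meeting", "plan"],
    ["budget", "meeting", "plan"],
    ["hire", "engineer"],
]
summary = greedy_mmr(utterances, ["budget", "plan", "hire"], k=2)
# summary == [0, 2]: the duplicate at index 1 is penalized as redundant.
```

Because each pick is made without revisiting earlier choices, the greedy procedure can miss the best overall selection, which is the gap an optimal formulation is meant to close.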


A global optimization framework for meeting summarization

Gillick

Riedhammer

Favre

et al. 2009


We introduce a model for extractive meeting summarization based on the hypothesis that utterances convey bits of information, or concepts. Using keyphrases as concepts weighted by frequency, and an integer linear program to determine the best set of utterances, that is, covering as many concepts as possible while satisfying a length constraint, we achieve ROUGE scores at least as good as a ROUGE-based oracle derived from human summaries. This brings us to a critical discussion of ROUGE and the future of extractive meeting summarization.
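The objective described in this abstract, covering as much weighted concept mass as possible within a length limit, can be sketched on a toy scale. The code below swaps the integer linear program for exhaustive search over subsets, which is only feasible for a handful of utterances. The data layout, names, and weights are hypothetical and serve only to make the objective concrete.

```python
from itertools import combinations


def best_summary(utterances, weights, max_length):
    """Exhaustively search for the utterance subset that maximizes the
    total weight of distinct concepts covered, subject to a length limit.

    `utterances` is a list of (length, set_of_concepts) pairs. Exhaustive
    search stands in for the ILP solver here (an assumption of this sketch).
    """
    best_score, best_subset = 0.0, ()
    for size in range(1, len(utterances) + 1):
        for subset in combinations(range(len(utterances)), size):
            length = sum(utterances[i][0] for i in subset)
            if length > max_length:
                continue
            covered = set().union(*(utterances[i][1] for i in subset))
            # Each concept counts once, however many utterances mention it.
            score = sum(weights[c] for c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score


# Hypothetical utterances as (length in words, concepts mentioned).
utterances = [
    (5, {"budget", "deadline"}),
    (4, {"budget"}),
    (4, {"hiring"}),
]
weights = {"budget": 3, "deadline": 2, "hiring": 2}
subset, score = best_summary(utterances, weights, max_length=9)
# subset == (0, 2), score == 7
```

Counting each concept once is what discourages redundant utterances: adding utterance 1 to utterance 0 uses up length without raising the score.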


The CALO Meeting Assistant System

Tür

Stolcke

Voss

et al. 2010

IEEE Trans. Audio Speech Lang. Process.


The CALO Meeting Assistant (MA) provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper presents the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, topic identification and segmentation, question-answer pair identification, action item recognition, decision extraction, and summarization.


The CALO meeting speech recognition and understanding system

Tür

Stolcke

Voss

et al. 2008


The CALO Meeting Assistant provides for distributed meeting capture, annotation, automatic transcription and semantic analysis of multiparty meetings, and is part of the larger CALO personal assistant system. This paper summarizes the CALO-MA architecture and its speech recognition and understanding components, which include real-time and offline speech transcription, dialog act segmentation and tagging, question-answer pair identification, action item recognition, decision extraction, and summarization.


VoiceGuard: Secure and Private Speech Processing

Brasser

Frassetto

Riedhammer

et al. 2018


With the advent of smart-home devices providing voice-based interfaces, such as Amazon Alexa or Apple Siri, voice data is constantly transferred to cloud services for automated speech recognition or speaker verification. While this development enables intriguing new applications, it also poses significant risks: Voice data is highly sensitive since it contains biometric information of the speaker as well as the spoken words. This data may be abused if not protected properly, thus the security and privacy of billions of end-users are at stake. We tackle this challenge by proposing an architecture, dubbed VoiceGuard, that efficiently protects the speech processing task inside a trusted execution environment (TEE). Our solution preserves the privacy of users while at the same time it does not require the service provider to reveal model parameters. Our architecture can be extended to enable user-specific models, such as feature transformations (including fMLLR), i-vectors, or model transformations (e.g., custom output layers). It also generalizes to secure on-premise solutions, allowing vendors to securely ship their models to customers. We provide a proof-of-concept implementation and evaluate it on the Resource Management and WSJ speech recognition tasks isolated with Intel SGX, a widely available TEE implementation, demonstrating even real-time processing capabilities.


Automatic Intelligibility Assessment of Speakers After Laryngeal Cancer by Means of Acoustic Modeling

et al. 2012

