Supervector Compression Strategies to Speed up I-Vector System Development

Vestman, Ville; Kinnunen, Tomi

doi:10.21437/odyssey.2018-50

Cited by 4 publications

(3 citation statements)

References 19 publications


“…The UBM is a 1024-component Gaussian mixture model (GMM) [10], which is used to compute sufficient statistics for i-vector extraction. We compute 800-dimensional i-vectors by compressing mean supervectors of maximum a posteriori (MAP) adapted GMMs using probabilistic principal component analysis (PPCA) as described in [5]. This is a (speed-wise) high-performing alternative to the standard i-vector extraction that is traditionally done via front-end factor analysis [11,12].…”

Section: Speaker Identification System Description (mentioning)

confidence: 99%

“…The i-vector extraction using PPCA is simply a matter of compressing a 61440-dimensional GMM supervector to an 800-dimensional space using a precomputed projection matrix. Note that the traditional approach for i-vector extraction would, in addition, require inverting an 800 × 800 posterior covariance matrix [14,5].…”

Section: Without Replay Channel (mentioning)

confidence: 99%
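
The projection step described in the statement above can be sketched in a few lines. This is only an illustration under stated assumptions, not the cited authors' implementation: the function names, the toy dimensions, the random values, and the zero centering offset are all hypothetical. In the quoted system the supervector has 61440 dimensions, which with 1024 components implies 60-dimensional features (1024 × 60 = 61440), and the i-vector has 800 dimensions.

```python
import random


def make_supervector(component_means):
    """Stack per-component mean vectors into one flat supervector."""
    return [value for mean in component_means for value in mean]


def project(projection, supervector, offset):
    """Apply a precomputed (k x D) projection to a centered supervector.

    No matrix inversion happens here: the projection is assumed to have
    been computed once, offline, when the PPCA model was trained.
    """
    centered = [s - m for s, m in zip(supervector, offset)]
    return [sum(p * c for p, c in zip(row, centered)) for row in projection]


# Toy sizes (hypothetical); the quoted system would use 1024, 60 and 800.
NUM_COMPONENTS, FEATURE_DIM, IVECTOR_DIM = 4, 3, 2
SUPERVECTOR_DIM = NUM_COMPONENTS * FEATURE_DIM

rng = random.Random(0)

# Stand-ins for MAP-adapted GMM means, one row per mixture component.
adapted_means = [
    [rng.gauss(0, 1) for _ in range(FEATURE_DIM)]
    for _ in range(NUM_COMPONENTS)
]

# Hypothetical centering offset and precomputed projection matrix.
offset = [0.0] * SUPERVECTOR_DIM
projection = [
    [rng.gauss(0, 1) for _ in range(SUPERVECTOR_DIM)]
    for _ in range(IVECTOR_DIM)
]

supervector = make_supervector(adapted_means)
ivector = project(projection, supervector, offset)
```

At run time the cost is a single matrix-vector product per utterance, which is the contrast the statement draws with the traditional approach and its per-utterance 800 × 800 inversion.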

“…We run our demo on a web platform that can be used on PCs and mobile phones with an internet connection to ensure good accessibility of the demo. The web platform communicates with a computation server that runs the speaker recognition back end based on our recent work on computationally efficient i-vector extraction [5]. The back end provides the results to the web platform that displays them by using embedded YouTube video players.…”

Section: Introduction (mentioning)

confidence: 99%


Who Do I Sound like? Showcasing Speaker Recognition Technology by Youtube Voice Search

Vestman, Soomro, Kanervisto, et al. 2019

ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Self Cite


The popularization of science is often disregarded by scientists, as it may be challenging to put highly sophisticated research into words that the general public can understand. This work aims to help present speaker recognition research to the public by proposing a publicly appealing concept for showcasing recognition systems. We leverage data from YouTube and use it in a large-scale voice search web application that finds the celebrity voices that best match the user's voice. The concept was tested in a public event as well as "in the wild", and the received feedback was mostly positive. The i-vector based speaker identification back end was found to be fast (665 ms per request) and had a high identification accuracy (93%) for the YouTube target speakers. To help other researchers develop the idea further, we share the source code of the web platform used for the demo at https://github.com/bilalsoomro/speech-demo-platform.
