Shogo Nagasaka scite author profile

Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. The main aim of this paper is to develop a computational model that can estimate language and acoustic models, and discover words directly from continuous human speech signals in an unsupervised manner. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we develop a novel machine learning method called the nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. By assuming the HDP-HLM as a generative model of observed time series data, and by inferring the latent variables of the model, the method can analyze the latent double articulation structure of the data, i.e., hierarchically organized latent words and phonemes, in an unsupervised manner. We also carried out two evaluation experiments, one using synthetic data and one using actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.
The main contributions of this paper are as follows: (1) We develop a probabilistic generative model that integrates language and acoustic models, i.e., the HDP-HLM. (2) We derive an inference method for this model, and propose the NPB-DAA. (3) We show that the NPB-DAA can discover words directly from continuous human speech signals in an unsupervised manner.
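To make the "double articulation" idea in the abstract concrete, here is a minimal toy sketch of a two-level generative process: words are drawn from a bigram language model, each word expands into phonemes, and each phoneme emits a run of noisy frames with an explicit duration. This is not the HDP-HLM itself. The lexicon, bigram table, emission means, duration range, and the function name `generate` are all hypothetical, and everything that the real model infers from data (the lexicon, the number of words and phonemes, the nonparametric priors) is simply fixed by hand here.

```python
import random

# Hypothetical toy lexicon: each latent word is a fixed sequence of latent phonemes.
LEXICON = {"aioi": ["a", "i", "o", "i"], "ue": ["u", "e"], "ao": ["a", "o"]}

# Hypothetical 1-D emission mean per phoneme (stands in for the acoustic model).
PHONEME_MEAN = {"a": 0.0, "i": 2.0, "u": 4.0, "e": 6.0, "o": 8.0}

# Hypothetical word-bigram probabilities (stands in for the language model).
BIGRAM = {
    "aioi": {"aioi": 0.1, "ue": 0.6, "ao": 0.3},
    "ue": {"aioi": 0.5, "ue": 0.1, "ao": 0.4},
    "ao": {"aioi": 0.7, "ue": 0.2, "ao": 0.1},
}


def generate(num_words, seed=0):
    """Sample words, then phonemes with durations, then frame-level observations."""
    rng = random.Random(seed)
    word = rng.choice(list(LEXICON))
    word_seq, frame_labels, observations = [], [], []
    for _ in range(num_words):
        word_seq.append(word)
        for phoneme in LEXICON[word]:
            # Explicit duration per phoneme, in the spirit of a semi-Markov model.
            duration = rng.randint(2, 5)
            for _frame in range(duration):
                frame_labels.append(phoneme)
                observations.append(rng.gauss(PHONEME_MEAN[phoneme], 0.3))
        # Pick the next word from the bigram distribution of the current word.
        successors = BIGRAM[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
    return word_seq, frame_labels, observations


words, labels, obs = generate(num_words=3, seed=1)
print(words)
print(len(labels), "frames")
```

The unsupervised problem described in the abstract runs in the opposite direction: given only a sequence like `obs`, recover both the frame-level phoneme labels and the word sequence, without being told the lexicon.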

Online learning of concepts and words using multimodal LDA and hierarchical Pitman-Yor Language Model

Araki, Nagaoka, Nagai, et al. 2012

In this paper, we propose an online algorithm for multimodal categorization based on autonomously acquired multimodal information and partial words given by human users. For multimodal concept formation, multimodal latent Dirichlet allocation (MLDA) using Gibbs sampling is extended to an online version. We introduce a particle filter, which significantly improves the performance of the online MLDA, to keep track of good models among various models with different parameters. We also introduce an unsupervised word segmentation method based on the hierarchical Pitman-Yor Language Model (HPYLM). Since the HPYLM requires no predefined lexicon, we can build a robot system that learns concepts and words in a completely unsupervised manner. The proposed algorithms are implemented on a real robot and tested using real everyday objects to show the validity of the proposed system.

Double articulation analyzer for unsegmented human motion using Pitman-Yor language model and infinite hidden Markov model

Taniguchi, Nagasaka 2011

Unsupervised drive topic finding from driving behavioral data

Bando, Takenaka, Nagasaka, et al. 2013

Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals

et al. 2016

Nonparametric Bayesian Double Articulation Analyzer for Direct Language Acquisition From Continuous Speech Signals
