Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. The main problem addressed in this paper is to develop a computational model that can estimate language and acoustic models, and discover words directly from continuous human speech signals in an unsupervised manner. For this purpose, we propose an integrative generative model that combines a language model and an acoustic model into a single generative model called the "hierarchical Dirichlet process hidden language model" (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of language and acoustic models from continuous speech signals. Based on the HDP-HLM and its inference procedure, we develop a novel machine learning method called the nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire language and acoustic models from observed continuous speech signals. By assuming the HDP-HLM as a generative model of observed time series data, and by inferring the latent variables of the model, the method can analyze the latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer (DAA) and a baseline automatic speech recognition system whose acoustic model was trained in a supervised manner.
The main contributions of this paper are as follows: (1) We develop a probabilistic generative model that integrates language and acoustic models, i.e., the HDP-HLM. (2) We derive an inference method for this model and propose the NPB-DAA. (3) We show that the NPB-DAA can discover words directly from continuous human speech signals in an unsupervised manner.
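The double articulation structure described above (latent words made of latent phonemes, each phoneme lasting several frames) can be illustrated with a toy generator. This is a minimal sketch, not the HDP-HLM itself: it uses a fixed finite lexicon and a fixed bigram table in place of the hierarchical Dirichlet process priors, and the Gaussian emissions, Poisson durations, parameter values, and all names (`PHONEMES`, `LEXICON`, `BIGRAM`, `generate`) are assumptions made for illustration.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


# Hypothetical phoneme inventory: (emission mean, Poisson duration rate).
PHONEMES = {"a": (0.0, 4.0), "i": (3.0, 6.0), "u": (-3.0, 5.0)}

# Hypothetical lexicon: each latent word is a fixed phoneme sequence.
LEXICON = {"ai": ["a", "i"], "iu": ["i", "u"], "aua": ["a", "u", "a"]}

# Hypothetical bigram language model over words (each row sums to 1).
BIGRAM = {
    "ai": {"ai": 0.1, "iu": 0.6, "aua": 0.3},
    "iu": {"ai": 0.5, "iu": 0.1, "aua": 0.4},
    "aua": {"ai": 0.4, "iu": 0.4, "aua": 0.2},
}


def generate(num_words, seed=0):
    """Generate frames together with their latent phoneme and word labels."""
    rng = random.Random(seed)
    word = rng.choice(sorted(LEXICON))
    observations, phoneme_labels, word_labels = [], [], []
    for _ in range(num_words):
        # Lower layer: expand the word into phonemes with explicit durations.
        for phoneme in LEXICON[word]:
            mean, rate = PHONEMES[phoneme]
            duration = 1 + sample_poisson(rate, rng)  # at least one frame
            for _frame in range(duration):
                observations.append(rng.gauss(mean, 0.5))
                phoneme_labels.append(phoneme)
                word_labels.append(word)
        # Upper layer: choose the next word from the bigram language model.
        successors = BIGRAM[word]
        word = rng.choices(list(successors), weights=list(successors.values()))[0]
    return observations, phoneme_labels, word_labels


if __name__ == "__main__":
    obs, phonemes, words = generate(num_words=3, seed=1)
    print(len(obs), "frames generated")
    print("".join(phonemes))
```

An unsupervised learner in this setting only sees `observations`; the phoneme and word labels returned here are the latent variables it would have to infer.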
In this paper, we propose an online algorithm for multimodal categorization based on autonomously acquired multimodal information and partial words given by human users. For multimodal concept formation, multimodal latent Dirichlet allocation (MLDA) using Gibbs sampling is extended to an online version. We introduce a particle filter, which significantly improves the performance of the online MLDA, to keep track of good models among various models with different parameters. We also introduce an unsupervised word segmentation method based on the hierarchical Pitman-Yor language model (HPYLM). Since the HPYLM requires no predefined lexicon, we can build a robot system that learns concepts and words in a completely unsupervised manner. The proposed algorithms are implemented on a real robot and tested using real everyday objects to show the validity of the proposed system.
An unsupervised learning method, called double articulation analyzer with temporal prediction (DAA-TP), is proposed on the basis of the original DAA model. The method will enable future advanced driving assistance systems to determine driving context and predict possible scenarios of driving behavior by segmenting and modeling incoming driving-behavior time series data. In previous studies, we applied the DAA model to driving-behavior data and argued that contextual changing points can be estimated as changing points of chunks. A sequence prediction method, which predicts the next hidden state sequence, was also proposed in a previous study. However, the original DAA model does not model the duration of chunks of driving behavior and cannot perform temporal prediction of the scenarios. Our DAA-TP method explicitly models the duration of chunks of driving behavior on the assumption that driving-behavior data have a two-layered hierarchical structure, i.e., a double articulation structure. For this purpose, the hierarchical Dirichlet process hidden semi-Markov model is used to explicitly model the duration of segments of driving-behavior data. A Poisson distribution is used to model the duration distribution of driving-behavior segments. The duration distribution of chunks of driving-behavior data is then calculated theoretically using the reproductive property of the Poisson distribution. We also propose a calculation method for obtaining the probability distribution of the remaining duration of current driving words as a mixture of Poisson distributions with a theoretical approximation for unobserved driving words. This method can calculate the posterior probability distribution of the next termination time of chunks by explicitly modeling all probable chunking results for observed data. The DAA-TP was applied to a synthetic data set having a double articulation structure to evaluate its model consistency.
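The reproductive property mentioned above states that a sum of independent Poisson variables with rates λ_1, ..., λ_n is itself Poisson with rate λ_1 + ... + λ_n, so a chunk made of Poisson-distributed segments has a closed-form duration distribution. The following minimal sketch checks this numerically and evaluates a mixture of Poisson distributions. The function names, rates, and mixture weights are hypothetical, and the mixture shown is only one plausible reading of the remaining-duration calculation, not the exact formulation of DAA-TP.

```python
import math


def poisson_pmf(k, rate):
    """Probability that a Poisson(rate) variable equals k."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def convolve(p, q):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def chunk_duration_pmf(segment_rates, max_k):
    """Closed form from the reproductive property: Poisson with the summed rate."""
    total_rate = sum(segment_rates)
    return [poisson_pmf(k, total_rate) for k in range(max_k + 1)]


def chunk_duration_pmf_bruteforce(segment_rates, max_k):
    """Same distribution obtained by convolving the per-segment distributions."""
    pmf = [1.0]  # a sum of zero segments has duration 0 with probability 1
    for rate in segment_rates:
        segment_pmf = [poisson_pmf(k, rate) for k in range(max_k + 1)]
        pmf = convolve(pmf, segment_pmf)[: max_k + 1]
    return pmf


def mixture_pmf(weights, rates, max_k):
    """Mixture of Poisson distributions, one component per candidate chunk."""
    return [
        sum(w * poisson_pmf(k, r) for w, r in zip(weights, rates))
        for k in range(max_k + 1)
    ]


if __name__ == "__main__":
    rates = [2.0, 3.5, 1.5]  # hypothetical segment duration rates
    closed = chunk_duration_pmf(rates, 30)
    brute = chunk_duration_pmf_bruteforce(rates, 30)
    print(max(abs(a - b) for a, b in zip(closed, brute)))
```

Truncating each convolution at `max_k` does not affect the first `max_k + 1` probabilities, because a sum of non-negative durations can only reach `k` through terms no larger than `k`.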
To evaluate the effectiveness of DAA-TP, we applied it to a driving-behavior data set recorded at actual factory circuits. The DAA-TP could predict the next termination time of chunks more accurately than the compared methods. We also report qualitative results that illustrate the potential capability of DAA-TP.
In this paper, we propose a novel semiotic prediction method for driving behavior based on double articulation structure. It has been reported that predicting driving behavior from multivariate time series behavior data by using machine learning methods, e.g., hybrid dynamical systems, hidden Markov models, and Gaussian mixture models, is difficult because a driver's behavior is affected by various contextual information. To overcome this problem, we assume that contextual information has a double articulation structure and develop a novel semiotic prediction method by extending a nonparametric Bayesian unsupervised morphological analyzer. The effectiveness of our prediction method was evaluated using synthetic data and real driving data. In these experiments, the proposed method achieved long-term prediction 2-6 times longer than some conventional methods.