Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling

In this paper, we propose a new Bayesian model for fully unsupervised word segmentation and an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model, in which a Pitman-Yor spelling model is embedded in the word model. We confirmed that it significantly outperforms previously reported results on both phonetic transcripts and standard datasets for Chinese and Japanese word segmentation. Our model can also be viewed as a way to construct an accurate word n-gram language model directly from the characters of an arbitrary language, without any "word" indications.
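
The dynamic-programming step mentioned above can be illustrated with forward filtering-backward sampling over a single string. This is a simplified sketch under stated assumptions, not the paper's sampler: it uses a fixed, hypothetical unigram word model (`word_prob`, built from a toy `LEXICON`) instead of the nested Pitman-Yor n-gram model, does not remove and re-add the sentence's counts as a blocked Gibbs sampler would, and works in plain probabilities rather than log space, so it is only suitable for short strings.

```python
import random
from typing import Callable, List, Optional

def sample_segmentation(
    s: str,
    word_prob: Callable[[str], float],
    max_len: int = 4,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Draw one segmentation of s with probability proportional to the product
    of its word probabilities (unigram simplification; word_prob must be > 0)."""
    if rng is None:
        rng = random.Random()
    n = len(s)
    # Forward filtering: alpha[t] = summed probability of every segmentation of s[:t]
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            alpha[t] += word_prob(s[t - k:t]) * alpha[t - k]
    # Backward sampling: choose the length of the last word, then repeat on the prefix
    words: List[str] = []
    t = n
    while t > 0:
        lengths = list(range(1, min(max_len, t) + 1))
        weights = [word_prob(s[t - k:t]) * alpha[t - k] for k in lengths]
        k = rng.choices(lengths, weights=weights)[0]
        words.append(s[t - k:t])
        t -= k
    return words[::-1]

# Hypothetical toy word model: a small lexicon plus a tiny character-level
# fallback so that every substring has nonzero probability.
LEXICON = {"the": 0.3, "cat": 0.3, "sat": 0.3}

def word_prob(w: str) -> float:
    return LEXICON.get(w, 0.0) + 0.01 * (1 / 26) ** len(w)

if __name__ == "__main__":
    print(sample_segmentation("thecatsat", word_prob, rng=random.Random(0)))
```

Each call draws one segmentation in proportion to its probability under `word_prob`; in a full blocked sampler, the word model would be updated from the current corpus state after the sentence's previous words are removed.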

Survey on frontiers of language and robotics

Taniguchi, Mochihashi, Nagai, et al. 2019

Advanced Robotics

Gibbs sampling based Multi-scale Mixture Model for speaker clustering

Watanabe, Mochihashi, Hori, et al. 2011

The aim of this work is to apply a sampling approach to speech modeling; to this end, we propose a Gibbs sampling based Multi-scale Mixture Model (M³). The proposed approach focuses on the multi-scale property of speech dynamics, i.e., dynamics in speech can be observed on, for instance, short-time acoustical, linguistic-segmental, and utterance-wise temporal scales. M³ is an extension of the Gaussian mixture model and is considered a hierarchical mixture model, in which the mixture components in each time scale change at intervals of the corresponding time unit. We derive a fully Bayesian treatment of the multi-scale mixture model based on Gibbs sampling. The advantage of the proposed model is that each speaker cluster can be precisely modeled by a Gaussian mixture model, unlike conventional single-Gaussian based speaker clustering (e.g., using the Bayesian Information Criterion (BIC)). In addition, Gibbs sampling offers the potential to avoid a serious local optimum problem. Speaker clustering experiments confirmed these advantages and showed a significant improvement over conventional BIC-based approaches.
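
To make the Gibbs sampling idea concrete, here is a minimal sketch of a Gibbs sampler for an ordinary one-dimensional Gaussian mixture, not the multi-scale M³ model itself. It assumes a known, shared component variance `sigma2`, a zero-mean normal prior (variance `tau2`) on each component mean, and a symmetric Dirichlet prior (`alpha`) on the mixture weights; the function name and all parameter values are illustrative.

```python
import math
import random
from typing import List, Tuple

def gibbs_gmm(
    x: List[float],
    k: int,
    iters: int = 200,
    sigma2: float = 1.0,   # known component variance (assumption)
    tau2: float = 100.0,   # prior variance of each component mean (prior mean 0)
    alpha: float = 1.0,    # symmetric Dirichlet concentration for the weights
    seed: int = 0,
) -> Tuple[List[float], List[int]]:
    """Return the final sampled component means and assignments."""
    rng = random.Random(seed)
    z = [rng.randrange(k) for _ in x]   # random initial assignments
    mu = [0.0] * k
    for _ in range(iters):
        # 1) weights | z ~ Dirichlet(alpha + counts), drawn via normalized gammas
        counts = [0] * k
        for zi in z:
            counts[zi] += 1
        g = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(g)
        pi = [gi / total for gi in g]
        # 2) means | z, x from the conjugate normal posterior
        for j in range(k):
            s = sum(xi for xi, zi in zip(x, z) if zi == j)
            prec = 1.0 / tau2 + counts[j] / sigma2
            mu[j] = rng.gauss((s / sigma2) / prec, math.sqrt(1.0 / prec))
        # 3) each assignment | weights, means (normalized in log space)
        for i, xi in enumerate(x):
            logw = [math.log(pi[j]) - (xi - mu[j]) ** 2 / (2.0 * sigma2)
                    for j in range(k)]
            m = max(logw)
            w = [math.exp(lw - m) for lw in logw]
            z[i] = rng.choices(range(k), weights=w)[0]
    return mu, z
```

Because the sampler draws from conditional distributions instead of only climbing the likelihood, it can leave poor assignments that trap hill-climbing procedures; note that the component labels it returns are arbitrary (label switching).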

Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models

Uchiumi, Tsukahara, Mochihashi, 2015

We propose a nonparametric Bayesian model for joint unsupervised word segmentation and part-of-speech tagging from raw strings. Our model, which extends a previous model for word segmentation, is called a Pitman-Yor Hidden Semi-Markov Model (PYHSMM) and can be considered a method to build a class n-gram language model directly from strings, while integrating character- and word-level information. Experimental results on standard Japanese, Chinese and Thai datasets showed that it outperforms previous results, achieving state-of-the-art accuracies. This model will also serve to analyze the structure of a language whose words are not identified a priori.
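
The PYHSMM builds on Pitman-Yor language models. As a hedged illustration, the function below computes the predictive probability of a word in a single Pitman-Yor "restaurant" from its seating arrangement, using the standard Chinese-restaurant formula with discount d and strength theta: p(w) = (c_w - d t_w)/(theta + c) + (theta + d t)/(theta + c) G0(w). A hierarchical model applies this recursively, with the lower-order distribution serving as `base`. The function name and the count dictionaries are hypothetical inputs, not the paper's data structures.

```python
from typing import Callable, Dict

def py_predictive(
    w: str,
    customers: Dict[str, int],     # c_w: customers eating dish w
    tables: Dict[str, int],        # t_w: tables serving dish w
    d: float,                      # discount, 0 <= d < 1
    theta: float,                  # strength, theta > -d
    base: Callable[[str], float],  # base distribution G0 (e.g. a lower-order model)
) -> float:
    c = sum(customers.values())    # total customers
    t = sum(tables.values())       # total tables
    cw = customers.get(w, 0)
    tw = tables.get(w, 0)
    # Probability of joining an existing table that serves w
    existing = (cw - d * tw) / (theta + c)
    # Probability of opening a new table and drawing w from the base
    new_table = (theta + d * t) / (theta + c) * base(w)
    return existing + new_table

if __name__ == "__main__":
    uniform = lambda word: 0.25    # hypothetical uniform base over 4 words
    print(py_predictive("a", {"a": 3, "b": 1}, {"a": 2, "b": 1}, 0.5, 1.0, uniform))
```

With these toy counts, word "a" receives 2/5 + (2.5/5)(0.25) = 0.525, and the probabilities of the four words sum to one.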

Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models

et al. 2016

Sequence Pattern Extraction by Segmenting Time Series Data Using GP-HSMM with Hierarchical Dirichlet Process

Nagano, Nagaoka, Nagai, et al. 2018

Segmenting Continuous Motions with Hidden Semi-markov Models and Gaussian Processes

et al. 2017

Humans divide perceived continuous information into segments to facilitate recognition. For example, humans can segment speech waves into recognizable morphemes. Analogously, continuous motions are segmented into recognizable unit actions. People can divide continuous information into segments without using explicit segment points. This capacity for unsupervised segmentation is also useful for robots, because it enables them to flexibly learn languages, gestures, and actions. In this paper, we propose a Gaussian process-hidden semi-Markov model (GP-HSMM) that can divide continuous time series data into segments in an unsupervised manner. Our proposed method consists of a generative model based on the hidden semi-Markov model (HSMM), the emission distributions of which are Gaussian processes (GPs). Continuous time series data is generated by connecting segments generated by the GP. Segmentation can be achieved by using forward filtering-backward sampling to estimate the model's parameters, including the lengths and classes of the segments. In an experiment using the CMU motion capture dataset, we tested GP-HSMM with motion capture data containing simple exercise motions; the results of this experiment showed that the proposed GP-HSMM was comparable with other methods. We also conducted an experiment using karate motion capture data, which is more complex than exercise motion capture data; in this experiment, the segmentation accuracy of GP-HSMM was 0.92, which outperformed other methods.
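
In GP-HSMM, each candidate segment is scored by a Gaussian process. The sketch below is a simplified stand-in: it computes the marginal log-likelihood of a one-dimensional segment under a zero-mean GP with an RBF kernel over time indices, whereas the model described above uses each class's GP conditioned on the segments already assigned to that class. The kernel hyperparameters (`length`, `var`, `noise`) and the helper names are assumptions for illustration.

```python
import math
from typing import List

def rbf(a: float, b: float, length: float = 2.0, var: float = 1.0) -> float:
    """Squared-exponential kernel between two time indices."""
    return var * math.exp(-0.5 * (a - b) ** 2 / length ** 2)

def cholesky(K: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T = K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L

def gp_segment_loglik(y: List[float], noise: float = 0.1) -> float:
    """log N(y | 0, K + noise*I), with K built from time indices 0..n-1."""
    n = len(y)
    K = [[rbf(i, j) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L a = y, so y^T K^{-1} y = ||a||^2
    a = [0.0] * n
    for i in range(n):
        a[i] = (y[i] - sum(L[i][k] * a[k] for k in range(i))) / L[i][i]
    quad = sum(v * v for v in a)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * logdet - 0.5 * n * math.log(2.0 * math.pi)
```

In a forward filtering-backward sampling pass, a score like this would be evaluated for every candidate segment length and class; smooth segments score higher than jagged ones under this kernel.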

HVGH: Unsupervised Segmentation for High-Dimensional Time Series Using Deep Neural Compression and Statistical Generative Model

et al. 2019

Humans perceive continuous high-dimensional information by dividing it into meaningful segments, such as words and units of motion. We believe that such unsupervised segmentation is also important for robots to learn topics such as language and motion. To this end, we previously proposed a hierarchical Dirichlet process-Gaussian process-hidden semi-Markov model (HDP-GP-HSMM). However, an important drawback of this model is that it cannot divide high-dimensional time-series data; consequently, low-dimensional features must be extracted in advance. Segmentation largely depends on the design of these features, and it is difficult to design effective features, especially for high-dimensional data. To overcome this problem, this study proposes a hierarchical Dirichlet process-variational autoencoder-Gaussian process-hidden semi-Markov model (HVGH). The parameters of the proposed HVGH are estimated through a mutual learning loop of the variational autoencoder and our previously proposed HDP-GP-HSMM. Hence, HVGH can extract features from high-dimensional time-series data while simultaneously dividing it into segments in an unsupervised manner. In an experiment using various motion-capture data, we demonstrate that our proposed model estimates the correct number of classes and produces more accurate segments than baseline methods. Moreover, we show that the proposed method can learn a latent space suitable for segmentation.

Daichi Mochihashi

Bayesian unsupervised word segmentation with nested Pitman-Yor language modeling

Survey on frontiers of language and robotics

Gibbs sampling based Multi-scale Mixture Model for speaker clustering

Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models

Learning word meanings and grammar for verbalization of daily life activities using multilayered multimodal latent Dirichlet allocation and Bayesian hidden Markov models

Sequence Pattern Extraction by Segmenting Time Series Data Using GP-HSMM with Hierarchical Dirichlet Process

Segmenting Continuous Motions with Hidden Semi-markov Models and Gaussian Processes

HVGH: Unsupervised Segmentation for High-Dimensional Time Series Using Deep Neural Compression and Statistical Generative Model
