Jouni Helske scite author profile

Sequence analysis is being more and more widely used for the analysis of social sequences and other multivariate categorical time series data. However, it is often complex to describe, visualize, and compare large sequence data, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariates. The seqHMM package in R is designed for the efficient modeling of sequences and other categorical time series data containing one or multiple subjects with one or multiple interdependent sequences using HMMs and MHMMs. Also other restricted variants of the MHMM can be fitted, e.g., latent class models, Markov models, mixture Markov models, or even ordinary multinomial regression models with suitable parameterization of the HMM. Good graphical presentations of data and models are useful during the whole analysis process from the first glimpse at the data to model fitting and presentation of results. The package provides easy options for plotting parallel sequence data, and proposes visualizing HMMs as directed graphs.
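
As a rough illustration of the kind of model the abstract describes, the likelihood of one categorical sequence under a plain HMM can be computed with the scaled forward algorithm. The sketch below is plain Python, not the seqHMM API; the function name `hmm_log_likelihood` and the example probabilities are assumptions made purely for illustration, and the mixture and covariate extensions are not covered.

```python
import math


def hmm_log_likelihood(obs, initial, transition, emission):
    """Scaled forward algorithm: log P(obs) for a discrete-output HMM.

    obs        -- list of observed symbol indices
    initial    -- initial[i]       = P(first hidden state = i)
    transition -- transition[i][j] = P(next state = j | current state = i)
    emission   -- emission[i][k]   = P(symbol k | hidden state = i)
    """
    n_states = len(initial)
    # Joint probability of the first observation and each hidden state.
    alpha = [initial[i] * emission[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by emission.
            alpha = [
                sum(alpha[i] * transition[i][j] for i in range(n_states))
                * emission[j][obs[t]]
                for j in range(n_states)
            ]
        # Rescale to avoid underflow; the log of the scales sums to log P(obs).
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Hypothetical two-state model with two observable symbols.
example_initial = [0.6, 0.4]
example_transition = [[0.7, 0.3], [0.4, 0.6]]
example_emission = [[0.9, 0.1], [0.2, 0.8]]
example_log_lik = hmm_log_likelihood(
    [0, 1], example_initial, example_transition, example_emission
)
```

Rescaling at every step keeps the recursion stable for long sequences, which matters when many subjects with long sequences are modeled.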

KFAS: Exponential Family State Space Models in R

Helske

2017

J. Stat. Soft.

State space modeling is an efficient and flexible method for statistical inference of a broad class of time series and other data. This paper describes the R package KFAS for state space modeling with the observations from an exponential family, namely Gaussian, Poisson, binomial, negative binomial and gamma distributions. After introducing the basic theory behind Gaussian and non-Gaussian state space models, an illustrative example of Poisson time series forecasting is provided. Finally, a comparison to alternative R packages suitable for non-Gaussian time series modeling is presented.
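
To make the Gaussian case mentioned in the abstract concrete, here is a minimal Kalman filter for the simplest state space model, the local level model y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t. It is a plain-Python sketch and not the KFAS interface; the function name `local_level_filter`, its parameters, and the large initial variance used to approximate a diffuse prior are all assumptions for illustration. The non-Gaussian (Poisson, binomial, and so on) models covered by the paper need additional approximation steps that are not shown.

```python
def local_level_filter(y, obs_var, level_var, a1=0.0, p1=1e7):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    y         -- observations; None marks a missing value
    obs_var   -- variance of the observation noise eps_t
    level_var -- variance of the level disturbance eta_t
    a1, p1    -- prior mean and variance of mu_1 (large p1: vague prior)

    Returns the filtered means and variances of mu_t given y_1..y_t.
    """
    a, p = a1, p1  # predicted mean and variance of the current level
    means, variances = [], []
    for obs in y:
        if obs is not None:
            f = p + obs_var          # prediction-error variance
            k = p / f                # Kalman gain
            a = a + k * (obs - a)    # update the mean with the new observation
            p = p * obs_var / f      # updated variance, equal to p * (1 - k)
        means.append(a)
        variances.append(p)
        p = p + level_var            # predict: the level follows a random walk
    return means, variances


# Hypothetical usage: with no level noise the filter tracks the running mean.
example_means, example_vars = local_level_filter(
    [1.0, 2.0, 3.0], obs_var=1.0, level_var=0.0
)
```

Missing observations are handled by skipping the update step, so the uncertainty about the level simply grows by `level_var` until the next observed value.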

Introducing libeemd: a program package for performing the ensemble empirical mode decomposition

2015

The ensemble empirical mode decomposition (EEMD) and its complete variant (CEEMDAN) are adaptive, noise-assisted data analysis methods that improve on the ordinary empirical mode decomposition (EMD). All these methods decompose possibly nonlinear and/or nonstationary time series data into a finite amount of components separated by instantaneous frequencies. This decomposition provides a powerful method to look into the different processes behind a given time series data, and provides a way to separate short time-scale events from a general trend. We present a free software implementation of EMD, EEMD and CEEMDAN and give an overview of the EMD methodology and the algorithms used in the decomposition. We release our implementation, libeemd, with the aim of providing a user-friendly, fast, stable, well-documented and easily extensible EEMD library for anyone interested in using (E)EMD in the analysis of time series data. While written in C for numerical efficiency, our implementation includes interfaces to the Python and R languages, and interfaces to other languages are straightforward.

Combining Sequence Analysis and Hidden Markov Models in the Analysis of Complex Life Sequence Data

Helske

Eerola

2018

Longitudinal data often consists of multiple parallel sequences that ought to be analyzed jointly. For example, life course data may contain sequences of employment, family formation, and residence. Such data is often referred to as multichannel or multidimensional sequence data. A multichannel approach often gives a simpler representation of the data as opposed to combining states across life domains (the extended alphabet approach); the latter approach rapidly grows the state space as the number of channels and/or states grows. If some data is only partially observed, the multichannel approach also allows for handling data as it is instead of having to make difficult decisions on how to combine observed and unobserved states (Helske and Helske 2018). Joint analysis of complex multidimensional data poses several challenges. Multichannel sequence analysis (Gauthier et al. 2010) has been the standard tool for the analysis of multichannel sequence data (for empirical applications see, e.g.,

Importance sampling type estimators based on approximate marginal Markov chain Monte Carlo

Vihola

Helske

Franks

2020

Scandinavian J Statistics

We consider importance sampling (IS) type weighted estimators based on Markov chain Monte Carlo (MCMC) targeting an approximate marginal of the target distribution. In the context of Bayesian latent variable models, the MCMC typically operates on the hyperparameters, and the subsequent weighting may be based on IS or sequential Monte Carlo (SMC), but allows for multilevel techniques as well. The IS approach provides a natural alternative to delayed acceptance (DA) pseudo-marginal/particle MCMC, and has many advantages over DA, including a straightforward parallelization and additional flexibility in MCMC implementation. We detail minimal conditions which ensure strong consistency of the suggested estimators, and provide central limit theorems with expressions for asymptotic variances. We demonstrate how our method can make use of SMC in the state space models context, using Laplace approximations and time-discretized diffusions. Our experimental results are promising and show that the IS-type approach can provide substantial

Can Visualization Alleviate Dichotomous Thinking? Effects of Visual Representations on the Cliff Effect

Helske

Cooper

et al.

2021

IEEE Trans. Visual. Comput. Graphics

A Bayesian Reconstruction of a Historical Population in Finland, 1647–1850

Voutilainen

Helske

Högmander

2020

This article provides a novel method for estimating historical population development. We review the previous literature on historical population time-series estimates and propose a general outline to address the well-known methodological problems. We use a Bayesian hierarchical time-series model that allows us to integrate the parish-level data set and prior population information in a coherent manner. The procedure provides us with model-based posterior intervals for the final population estimates. We demonstrate its applicability by estimating the long-term development of Finland's population from 1647 onward and simultaneously place the country among the very few to have an annual population series of such length available.

Estimation of Causal Effects with Small Data in the Presence of Trapdoor Variables

Helske

Tikka

Karvanen

2021

Jouni Helske

Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R
