In this work, we deal with correlated under-reported data through INAR(1)-hidden Markov chain models. These models are very flexible and can be identified through their autocorrelation function, which has a very simple form. A naïve method of parameter estimation is proposed, together with the maximum likelihood method based on a revised version of the forward algorithm. The most-probable unobserved time series is reconstructed by means of the Viterbi algorithm. Several examples of application in the field of public health are discussed, illustrating the utility of the models. Copyright © 2016 John Wiley & Sons, Ltd.
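As an illustration of the kind of data such a model describes, the following minimal sketch simulates a hidden Poisson INAR(1) series and an under-reported observed series. It assumes one common formulation: the hidden count is a binomially thinned copy of the previous count plus a Poisson innovation, and at each time point the observed count is, with some probability, a binomially thinned copy of the hidden one. The function and parameter names (`simulate_underreported_inar1`, `alpha`, `lam`, `omega`, `q`) are hypothetical and are not taken from the paper.

```python
import math
import random


def poisson(rng, lam):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def thin(rng, count, prob):
    """Binomial thinning: keep each of `count` units independently with probability `prob`."""
    return sum(1 for _ in range(count) if rng.random() < prob)


def simulate_underreported_inar1(n, alpha, lam, omega, q, seed=0):
    """Simulate n steps of a hidden INAR(1) series and its under-reported version.

    alpha: thinning (autoregressive) parameter of the hidden process (assumed name)
    lam:   mean of the Poisson innovations (assumed name)
    omega: probability that a time point is under-reported (assumed name)
    q:     fraction of hidden cases retained when under-reporting occurs (assumed name)
    """
    rng = random.Random(seed)
    # Start from a draw with the stationary mean lam / (1 - alpha).
    x = poisson(rng, lam / (1.0 - alpha))
    hidden, observed = [], []
    for _ in range(n):
        x = thin(rng, x, alpha) + poisson(rng, lam)  # hidden INAR(1) update
        under = rng.random() < omega                 # under-reporting state
        y = thin(rng, x, q) if under else x          # observed count
        hidden.append(x)
        observed.append(y)
    return hidden, observed


hidden, observed = simulate_underreported_inar1(200, alpha=0.5, lam=3.0, omega=0.4, q=0.3, seed=1)
print(sum(hidden), sum(observed))
```

By construction the observed count never exceeds the hidden one, so the gap between the two totals gives a rough sense of how much a given `omega` and `q` hide.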
Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure not as clear as could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or to indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, movements respondents make with their mouse, as an additional detailed indicator for the respondent–survey interaction. This article uses machine learning techniques to explore the predictive value of mouse-tracking data regarding a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents have answered the easy or difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We have also developed a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
Underreporting in gender-based violence data is a worldwide problem leading to the underestimation of the magnitude of this social and public health concern. This problem deteriorates the data quality, providing poor and biased results that lead society to misunderstand the actual scope of this domestic violence issue. The present work proposes time series models for underreported counts based on a latent integer autoregressive of order 1 time series with Poisson distributed innovations and a latent underreporting binary state, that is, a first-order Markov chain. Relevant theoretical properties of the models are derived, and the moment-based and maximum-based methods are presented for parameter estimation. The new time series models are applied to the quarterly complaints of domestic violence against women recorded in some judicial districts of Galicia (Spain) between 2007 and 2017. The models allow quantifying the degree of underreporting. A comprehensive discussion is presented, studying how the frequency and intensity of underreporting in this public health concern are related to some interesting socioeconomic and health indicators of the provinces of Galicia (Spain). Keywords: integer autoregressive models, intimate partner violence, public health, state-dependent underreporting, underrecorded data
In this article we present a new INteger-valued AutoRegressive (INAR) model with the aim of extracting baseline patterns of cattle fallen stock registered over a 5-year period at a local scale. We introduce HINAR as a generalization of the classical Poisson-based INAR models whose innovations follow a Hermite distribution. In order to assess trends and seasonality in these time series, we fit different models with time-dependent parameters by specifying proper functions. Using real-world examples, we illustrate how to estimate parameters by maximum likelihood and validate the fitted models. We also show a detailed method to forecast. Our proposed model provides a good solution for studying discrete time series when the counts have many zeros, low counts and moderate overdispersion. This model has been applied to the analysis of fallen cattle registered at a local scale as part of the development of a veterinary syndromic surveillance system.
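To make the role of the Hermite innovations more concrete, the sketch below evaluates the probability mass function of a standard Hermite distribution, taken here as the law of Y1 + 2*Y2 with independent Y1 ~ Poisson(a1) and Y2 ~ Poisson(a2). The parameter names `a1` and `a2` and the function `hermite_pmf` are assumptions made for illustration, and the parameterization used in the paper may differ. Under this definition the mean is a1 + 2*a2 and the variance is a1 + 4*a2, so the variance exceeds the mean whenever a2 > 0.

```python
import math


def hermite_pmf(k, a1, a2):
    """P(X = k) for X = Y1 + 2*Y2, with Y1 ~ Poisson(a1) and Y2 ~ Poisson(a2)."""
    total = 0.0
    # j is the value taken by Y2; Y1 must then equal k - 2*j.
    for j in range(k // 2 + 1):
        total += a1 ** (k - 2 * j) * a2 ** j / (math.factorial(k - 2 * j) * math.factorial(j))
    return math.exp(-(a1 + a2)) * total


a1, a2 = 1.0, 0.5
probs = [hermite_pmf(k, a1, a2) for k in range(60)]
mean = sum(k * p for k, p in enumerate(probs))
var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
print(round(mean, 6), round(var, 6))  # close to a1 + 2*a2 and a1 + 4*a2
```

Setting `a2 = 0` recovers the Poisson case, which is one way to see HINAR as a generalization of Poisson-based INAR models.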
The present paper introduces a new model used to study and analyse the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2) epidemic-reported-data from Spain. This is a Hidden Markov Model whose hidden layer is a regeneration process with Poisson immigration, Po-INAR(1), together with a mechanism that allows the estimation of the under-reporting in non-stationary count time series. A novelty of the model is that the expectation of the unobserved process’s innovations is a time-dependent function defined in such a way that information about the spread of an epidemic, as modelled through a Susceptible-Infectious-Removed dynamical system, is incorporated into the model. In addition, the parameter controlling the intensity of the under-reporting is also made to vary with time to adjust to possible seasonality or trend in the data. Maximum likelihood methods are used to estimate the parameters of the model.
Middle East respiratory syndrome coronavirus (MERS-CoV) remains a notable disease and poses a significant threat to global public health. The Arabian Peninsula is considered a major global epicentre for the disease and the virus has crossed regional and continental boundaries since 2012. In this study, we focused on exploring the temporal dynamics of MERS-CoV in human populations in the Arabian Peninsula between 2012 and 2017, using publicly available data on case counts and combining two analytical methods. Disease progression was assessed by quantifying the time-dependent reproductive number (TD-Rs), while the temporal pattern of the case series was modelled using the AutoRegressive Integrated Moving Average (ARIMA). We accounted for geographical variability between three major affected regions in Saudi Arabia: Eastern Province, Riyadh and Makkah. In Saudi Arabia, the epidemic size was large with TD-Rs > 1, indicating significant spread until 2017. In both Makkah and Riyadh regions, the epidemic progression reached its peak in April 2014 (TD-Rs > 7), during the highest incidence period of MERS-CoV cases. In Eastern Province, one unique super-spreading event (TD-R > 10) was identified in May 2013, which comprised the most notable cases of human-to-human transmission. The best-fitting ARIMA model indicated statistically significant biannual seasonality in Riyadh region, a region characterised by heavy seasonal camel-related activities. However, no statistical evidence of seasonality was identified in Eastern Province and Makkah. Instead, both areas were marked by an endemic pattern of cases with sporadic outbreaks. Our study suggested new insights into the epidemiology of the virus, including inferences about epidemic progression and evidence for seasonality.
Despite the inherent limitations of the available data, our conclusions provide further guidance for implementing risk-based surveillance in high-risk populations and, subsequently, improving related intervention strategies against the epidemic at country and regional levels.
The goal in biological dosimetry is to estimate the dose of radiation that a suspected irradiated individual has received. For that, the analysis of aberrations (most commonly dicentric chromosome aberrations) in scored cells is performed and dose response calibration curves are built. In whole body irradiation (WBI) with X- and gamma-rays, the number of aberrations in samples is properly described by the Poisson distribution, although in partial body irradiation (PBI) the excess of zeros provided by the non-irradiated cells leads, for instance, to the Zero-Inflated Poisson distribution. Different methods are used to analyse the dosimetry data taking into account the distribution of the sample. In order to test the Poisson distribution against the Zero-Inflated Poisson distribution, several asymptotic and exact methods have been proposed which are focused on the dispersion of the data. In this work, we suggest an exact test for the Poisson distribution focused on the zero-inflation of the data developed by Rao and Chakravarti (Some small sample tests of significance for a Poisson distribution. Biometrics 1956; 12 : 264-82.), derived from the problems of occupancy. An approximation based on the standard Normal distribution is proposed in those cases where the computation of the exact test can be tedious. A Monte Carlo Simulation study was performed in order to estimate empirical confidence levels and powers of the exact test and other tests proposed in the literature. Different examples of applications based on in vitro data and also data recorded in several radiation accidents are presented and discussed. A Shiny application which computes the exact test and other interesting goodness-of-fit tests for the Poisson distribution is presented in order to provide them to all interested researchers.
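The link to the occupancy problem can be sketched as follows. Conditional on the total number of aberrations, an independent Poisson sample distributes those aberrations uniformly over the scored cells, so the number of cells with zero aberrations follows the classical empty-cell distribution. The code below computes that distribution exactly by dynamic programming and returns an upper-tail p-value for the observed number of zeros. It is an illustrative sketch under that assumption, not necessarily the exact procedure of the paper, and the function names are hypothetical.

```python
def empty_cell_distribution(n_cells, n_balls):
    """Exact distribution of the number of empty cells after throwing
    n_balls uniformly at random into n_cells (classical occupancy problem)."""
    # occ[m] = probability that exactly m cells are occupied so far
    occ = [0.0] * (n_cells + 1)
    occ[0] = 1.0
    for _ in range(n_balls):
        new = [0.0] * (n_cells + 1)
        for m, p in enumerate(occ):
            if p == 0.0:
                continue
            new[m] += p * m / n_cells  # ball lands in an already occupied cell
            if m < n_cells:
                new[m + 1] += p * (n_cells - m) / n_cells  # ball occupies a new cell
        occ = new
    # k empty cells corresponds to n_cells - k occupied cells
    return [occ[n_cells - k] for k in range(n_cells + 1)]


def zero_inflation_p_value(counts):
    """Upper-tail probability of seeing at least as many zeros as observed,
    conditional on the sample size and the total count (Poisson null)."""
    n = len(counts)
    total = sum(counts)
    zeros = sum(1 for c in counts if c == 0)
    dist = empty_cell_distribution(n, total)
    return sum(dist[zeros:])


# Hypothetical toy sample: four cells, all five aberrations in a single cell.
print(zero_inflation_p_value([0, 0, 0, 5]))
```

A small p-value indicates more zeros than a Poisson sample with the same total would typically produce, which is the pattern expected under partial body irradiation. The cost grows with the product of the number of cells and the total count, so very large samples may call for an approximation instead.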
Background: Genital warts are a common and highly contagious sexually transmitted disease. They carry a large economic burden and affect several aspects of quality of life. Incidence data underestimate the real occurrence of genital warts because this infection is often under-reported, mostly due to specific characteristics such as its asymptomatic course. Methods: Genital warts cases for the analysis were obtained from the Catalan public health system database (SIDIAP) for the period 2009-2016. People under 15 and over 94 years old were excluded from the analysis as the incidence of genital warts in this population is negligible. This work introduces a time series model based on a mixture of two distributions, capable of detecting the presence of under-reporting in the data. In order to identify potential differences in the magnitude of the under-reporting issue depending on sex and age, these covariates were included in the model. Results: This work shows that only about 80% on average of genital warts incidence in Catalunya in the period 2009-2016 was registered, although the frequency of under-reporting has been decreasing over the study period. It can also be seen that this issue has a deeper impact on women over 30 years old. Conclusions: Although this study shows that the quality of the registered data has improved over the considered period of time, the Catalan public health system is underestimating the real burden of genital warts by almost 10,000 cases, around 23% of the registered cases. The total annual cost is underestimated by about 10 million euros relative to the 54 million euros annually devoted to genital warts in Catalunya, representing 0.4% of the total budget.