Abstract: Motivated by a longitudinal oral health study, we evaluate the performance of binary Markov models in which the response variable is subject to an unconstrained misclassification process and follows a monotone or progressive behavior. Theoretical and empirical arguments show that the simple version of the model can be used to estimate the prevalence, incidences, and misclassification parameters without the need of external information and that the incidence estimators associated with the model outperformed app…
“…It is important to stress that in a longitudinal setting, unlike cross sectional studies, the model parameters might be estimated without the use of external information about the misclassification parameters. For instance, García-Zattera et al (2010) showed that under simple restrictions on the parameter space, the model parameters associated with an inhomogeneous HMM for monotone responses are identified by the available data. They also proposed a univariate model to account for predictors allowing for irregularly spaced time intervals and different classifiers.…”
Section: Introduction (mentioning)
confidence: 99%
“…Hidden Markov models (HMM) for the analysis of misclassified alternating longitudinal responses has been considered in the literature by Cook, Ng, and Meade (2000), Rosychuk and Thompson (2001), Rosychuk and Thompson (2003), Nagelkerke, Chunge, and Kinot (1990), and Rosychuk and Islam (2009), whereas Espeland, Murphy, and Leverett (1988), Espeland, Platt, and Gallagher (1989), Schmid, Segal, and Rosner (1994), Singh and Rao (1995), Albert, Hunsberger, and Biro (1997), and García-Zattera et al (2010) addressed the problem of misclassified monotone longitudinal responses. It is important to stress that in a longitudinal setting, unlike cross sectional studies, the model parameters might be estimated without the use of external information about the misclassification parameters.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the context of longitudinal univariate categorical data, generalized linear mixed models (see, e.g., Neuhaus 2002), generalized estimating equation (GEE)-based approaches (see, e.g., Neuhaus 2002), and transition models (see, e.g., García-Zattera et al 2010) have been proposed for correcting for misclassification. Due to the monotone nature of our motivating problem and because the main scientific objective here is the incidence estimation, we restrict ourselves to the latter class of models, where the parameters have a direct interpretation in terms of the conditional probabilities of developing CE in a given time interval.…”
Motivated by a longitudinal oral health study, the Signal-Tandmobiel® study, we propose a multivariate binary inhomogeneous Markov model in which unobserved correlated response variables are subject to an unconstrained misclassification process and have a monotone behavior. The multivariate baseline distributions and Markov transition matrices of the unobserved processes are defined as a function of covariates through the specification of compatible full conditional distributions. Distinct misclassification models are discussed. In all cases, the possibility that different examiners were involved in the scoring of the responses of a given subject across time is taken into account. A full Bayesian implementation of the model is described and its performance is evaluated using simulated data. We provide theoretical and empirical evidence that the parameters can be estimated without any external information about the misclassification parameters. Finally, the analyses of the motivating study are presented. Appendices 1-7 are available in the online supplementary materials.
“…Label uncertainty has commonly been found in clinical judgments due to expert subjectivity and inadequate information [104]. Often, it is handled as noise, so the task has been to detect and correct such mislabeling [87,107,45]. However, in the case of multiple, non-exclusive medical conditions [82], such as comorbidity, it makes more sense to treat labels with degrees of certainty rather than forcing them to belong to one "true" class, because there is no such thing as a single true class in this kind of scenario.…”
Section: Label Characteristics (mentioning)
confidence: 99%
“…Such noise is usually regarded as mislabeling to be detected and corrected [87,107,45]. For example, García-Zattera et al employed binary Markov models to estimate misclassification parameters for dental research [45].…”
Section: Classification With Label Uncertainty (mentioning)
The "big data" challenge is changing the way we acquire, store, analyse, and draw conclusions from data. How we effectively and efficiently "mine" the data from possibly multiple sources and extract useful information is a critical question. Increasing research attention has been drawn to healthcare data mining, with an ultimate goal to improve the quality of care. The human body is complex and so too the data collected in treating it. Data noise that is often introduced via the collection process makes building Data Mining models a challenging task. This thesis focuses on the classification tasks of mining healthcare data, with the goal of improving the effectiveness of health risk prediction. In particular, we developed algorithms to address issues identified from real healthcare data, such as feature extraction, heterogeneity, label uncertainty, and large unlabeled data. The three main contributions of this research are as follows. First, we developed a new health index called Personal Health Index (PHI) that scores a person's health status based on the examination records of a given population. Second, we identified the key characteristics of the real datasets and issues that were associated with the data. Third, we developed classification algorithms to cope with those issues, particularly, the label uncertainty and large unlabeled data issues. This research takes one step forward towards scoring personal health based on mining increasingly large health records. Particularly, it pioneers exploring the mining of GHE data and tackles the associated challenges. It is our anticipation that in the near future, more robust data-mining-based health scoring systems will be available for healthcare professionals to understand people's health status and thus improve the quality of care.