2015
DOI: 10.1186/s12916-015-0521-2

Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths

Abstract: Background: Verbal autopsies (VA) are increasingly used in low- and middle-income countries where most causes of death (COD) occur at home without medical attention, and home deaths differ substantially from hospital deaths. Hence, there is no plausible “standard” against which VAs for home deaths may be validated. Previous studies have shown contradictory performance of automated methods compared to physician-based classification of CODs. We sought to compare the performance of the classic naive Bayes classifie…

Cited by 47 publications (65 citation statements)
References 25 publications
“…tuberculosis, which likely led to the exclusion of individuals with disseminated TB and limited or no respiratory symptoms. To date, all comparisons of VA to the PHMRC dataset, including those conducted by the PHMRC team, have combined the ‘AIDS’ and ‘AIDS with TB’ categories, and have therefore not attempted to assess VA’s ability to detect HIV-associated TB [19,20,30,60–64]. The PHMRC gold standard dataset nevertheless remains a valuable resource; we would suggest that any future validation exercises use the differentiated ‘AIDS with TB’ and ‘AIDS’ categories, rather than the combined ‘AIDS’ category, for comparison to VA.…”
Section: Discussion (mentioning)
confidence: 99%
“…The openVA R-package (Li et al, 2018) has made many of these algorithms publicly available. Generic classifiers like random forests (Breiman, 2001), naive Bayes classifiers (Minsky, 1961), and support vector machines (Cortes and Vapnik, 1995) have also been used (Miasnikof et al, 2015; Koopman et al, 2015). The estimated COD labels for each VA record in a nationally representative VA database are aggregated to obtain national cause-specific mortality fractions (CSMF), the population-level class membership probabilities, which are often the main quantities of interest for epidemiologists, local governments, and global health organizations.…”
Section: Motivating Dataset (mentioning)
confidence: 99%
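
The aggregation step described in this excerpt, from individual predicted COD labels to cause-specific mortality fractions (CSMF), amounts to a normalized count over causes. A minimal Python sketch, with invented cause labels that are not drawn from any real VA database:

```python
from collections import Counter

def csmf(predicted_causes):
    """Aggregate individual predicted causes of death into
    cause-specific mortality fractions: the share of deaths
    assigned to each cause."""
    counts = Counter(predicted_causes)
    total = sum(counts.values())
    return {cause: n / total for cause, n in counts.items()}

# Invented example labels, purely illustrative.
labels = ["pneumonia", "stroke", "pneumonia", "road injury", "pneumonia", "stroke"]
print(csmf(labels))  # pneumonia 0.5, stroke ~0.33, road injury ~0.17
```
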
“…This extension allows us to complete our model to simultaneously estimate the latent correlation matrix and assign causes of death using VA data. Before we describe our model, it is worth noting that for many existing automated VA methods such as InSilicoVA (McCormick et al, 2016), InterVA (Byass et al, 2003), and the Naive Bayes Classifier (Miasnikof et al, 2015), the classification rule is closely related to the naive Bayes classifier under the assumption of (conditional) independence between symptoms, i.e.…”
Section: Full Posterior Sampling Steps (mentioning)
confidence: 99%
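
For reference, the generic naive Bayes classification rule under conditional independence of symptoms given the cause can be written as follows (the notation here is generic and not taken from the cited papers):

\hat{c} = \arg\max_{c} \; P(c) \prod_{j=1}^{J} P(s_j \mid c)

where $s_j$ is the reported indicator for symptom $j$ and $c$ ranges over the candidate causes of death.
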
“…The majority of the existing statistical or algorithmic methods to assign cause of death using VA surveys make the assumption that VA symptoms are independent of one another conditional on cause of death (Byass et al, 2003; James et al, 2011; Miasnikof et al, 2015; McCormick et al, 2016). This assumption simplifies computation and is efficient in settings with limited training data.…”
Section: Introduction (mentioning)
confidence: 99%
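
To make the conditional-independence assumption concrete, below is a minimal, self-contained Python sketch of a naive Bayes classifier over binary symptom indicators. It is a generic illustration of the assumption, not the implementation from the paper or from any of the cited methods; the toy symptoms, causes, and class name are invented.

```python
import numpy as np

class BinarySymptomNaiveBayes:
    """Generic naive Bayes over binary symptom indicators: assumes
    symptoms are conditionally independent given the cause of death."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing for symptom probabilities

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)  # shape (n_deaths, n_symptoms), entries 0/1
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        n_deaths, n_symptoms = X.shape
        self.log_prior_ = np.empty(len(self.classes_))
        self.log_p1_ = np.empty((len(self.classes_), n_symptoms))  # log P(s_j = 1 | c)
        self.log_p0_ = np.empty((len(self.classes_), n_symptoms))  # log P(s_j = 0 | c)
        for k, c in enumerate(self.classes_):
            Xc = X[y == c]
            self.log_prior_[k] = np.log(len(Xc) / n_deaths)
            p1 = (Xc.sum(axis=0) + self.alpha) / (len(Xc) + 2 * self.alpha)
            self.log_p1_[k] = np.log(p1)
            self.log_p0_[k] = np.log1p(-p1)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # score(c) = log P(c) + sum_j [ s_j*log P(s_j=1|c) + (1-s_j)*log P(s_j=0|c) ]
        scores = self.log_prior_ + X @ self.log_p1_.T + (1 - X) @ self.log_p0_.T
        return self.classes_[scores.argmax(axis=1)]

# Invented toy data: rows are deaths, columns are yes/no symptom answers.
X_train = [[1, 0, 1], [1, 1, 1], [0, 0, 1], [0, 1, 0]]
y_train = ["cause_A", "cause_A", "cause_B", "cause_B"]
model = BinarySymptomNaiveBayes().fit(X_train, y_train)
print(model.predict([[1, 0, 1], [0, 1, 0]]))  # expected: ['cause_A' 'cause_B']
```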