Naive Bayes classifiers for verbal autopsies: comparison to physician-based classification for 21,000 child and adult deaths

Miasnikof, Pierre; Giannakeas, Vasily; Gomes, Mireille; Aleksandrowicz, Lukasz; Shestopaloff, Alexander Y.; Alam, Dewan S.; Tollman, Stephen; Samarikhalaj, Akram; Jha, Prabhat

doi:10.32920/14639652.v1

Cited by 8 publications (18 citation statements)

References 15 publications (25 reference statements)


“…The number of observations within India, Mexico, Philippines, and Tanzania are 2973, 1586, 1259, and 2023, respectively. To address country-specific dataset shift, for each country, we used the three remaining countries as training data for four methods commonly used for cause of death predictions: InterVA (Byass et al, 2012), InSilicoVA (McCormick et al, 2016), NBC (Miasnikof et al, 2015), and Tariff (Serina et al, 2015). The first three methods are probabilistic, while Tariff produces a score for each cause that needed to be normalized to be in [0, 1].…”

Section: PHMRC Dataset Analysis (mentioning)

confidence: 99%

“…However, in some applications, the objective is not individual level predictions, but rather to learn about population-level distributions of a given outcome. Examples include sentiment analysis for Twitter users (Giachanou and Crestani, 2016), estimating the prevalence of chronic fatigue syndrome (Valdez et al, 2018), and cause of death distribution estimation from verbal autopsies (King et al, 2008; McCormick et al, 2016; Serina et al, 2015; Byass et al, 2012; Miasnikof et al, 2015).…”

Section: Introduction (mentioning)

confidence: 99%


Generalized Bayes Quantification Learning under Dataset Shift

Fiksel, Datta, Amouzou et al. 2020

Preprint


Quantification learning is the task of prevalence estimation for a test population using predictions from a classifier trained on a different population. Commonly used quantification methods either assume perfect sensitivity and specificity of the classifier, or use the training data to both train the classifier and also estimate its misclassification rates. These methods are inappropriate in the presence of dataset shift, when the misclassification rates in the training population are not representative of those for the test population. A recent Bayesian quantification model addresses dataset shift, but only allows for single-class (categorical) predictions, and assumes perfect knowledge of the true labels on a small number of instances from the test population. We propose a generalized Bayesian quantification learning (GBQL) approach that uses the entire compositional predictions from probabilistic classifiers and allows for uncertainty in true class labels for the limited labeled test data. Instead of positing a full model, we use a model-free Bayesian estimating equation approach to compositional data using Kullback-Leibler loss-functions based only on a first-moment assumption. The idea will be useful in Bayesian compositional data analysis in general as it is robust to different generating mechanisms for compositional data and allows 0's and 1's in the compositional outputs thereby including categorical outputs as a special case. For the quantification problem, this estimating equation approach coherently links the loss-functions for labeled and unlabeled test cases. We show how our method yields existing quantification approaches as special cases through different prior choices thereby providing an inferential framework around these approaches. This observation also enables using shrinkage towards these approaches via priors which stabilizes estimation in data-scarce settings.
Extension to an ensemble GBQL that uses predictions from multiple classifiers yielding inference robust to inclusion of a poor classifier is discussed. We outline a fast and efficient Gibbs sampler using a rounding and coarsening approximation to the loss functions. For large sample settings, we establish posterior consistency of GBQL, which to our knowledge is the first result on consistency of a quantification approach in presence of local labeled data. Empirical performance of GBQL is demonstrated through simulations and analysis of real data with evident dataset shift.
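
As a minimal sketch of the two baseline approaches the abstract contrasts, the Python below implements classify-and-count (which implicitly assumes perfect sensitivity and specificity) and a binary adjusted classify-and-count correction that plugs in misclassification rates estimated elsewhere. The function names, the binary setup, and all numbers are illustrative assumptions and are not taken from the paper; this is not the GBQL method itself.

```python
def classify_and_count(predictions):
    """Prevalence estimate that trusts the classifier: share of positive predictions."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, sensitivity, specificity):
    """Correct the raw positive rate q for known misclassification rates.

    Uses q = sensitivity * p + (1 - specificity) * (1 - p), solved for p.
    """
    q = classify_and_count(predictions)
    false_positive_rate = 1.0 - specificity
    denom = sensitivity - false_positive_rate
    if denom == 0:
        raise ValueError("uninformative classifier: sensitivity equals 1 - specificity")
    p = (q - false_positive_rate) / denom
    return min(1.0, max(0.0, p))  # clip to the valid prevalence range


# Hypothetical test-population predictions (1 = cause present, 0 = absent).
preds = [1] * 40 + [0] * 60
raw = classify_and_count(preds)
# Sensitivity and specificity here are assumed values, e.g. estimated on training data.
adjusted = adjusted_classify_and_count(preds, sensitivity=0.8, specificity=0.9)
```

If the sensitivity and specificity estimated on the training population do not hold for the test population, the adjusted estimate inherits that error, which is the dataset-shift failure mode the abstract describes.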



“…[15] Physician-coding of VA (PCVA) is subject to physician-specific bias, occupies valuable physician time, is expensive, and often leads to long delays between the VA interview and assigning a cause of death. Over the past decade computational algorithms [16,17,18,19,20,21,22,23] have been developed and are increasingly used to interpret VA data and assign causes to VA deaths. Algorithms are consistent, essentially cost-free (compared to the traditional use of physician time to assign cause of death to VA), and can be run on large numbers of deaths quickly.…”

Section: Verbal Autopsy in Civil Registration and Vital Statistics (mentioning)

confidence: 99%

“…In some cases SCI take the form of deaths with symptoms and causes assigned through another mechanism, and in other cases, the relationships are solicited directly from experts. There are six VA-coding algorithms that have been proposed and/or used widely: (i) InterVA, [26,20,27,28,29,30] (ii) Tariff, [21] (iii) a derivative of Tariff called SmartVA-Analyze, [18,19] (iv) InSilicoVA, [16,31] (v) a naive Bayes classifier called NBC, [17] and (vi) the King-Lu algorithm. [22,23] The list of causes that each algorithm assigns varies slightly.…”

Section: Verbal Autopsy in Civil Registration and Vital Statistics (mentioning)

confidence: 99%

Verbal Autopsy in Civil Registration and Vital Statistics: The Symptom-Cause Information Archive

Clark, Bratschi, Setel et al. 2019

Preprint


“…Existing computer-coded VA algorithms include those for which the relationship between symptoms and cause of death is encoded by experts (InterVA [Byass et al, 2012] and InSilicoVA [McCormick et al, 2016]) and those for which it is learned by relying on a labeled subset of the data having known COD (the King and Lu method [King et al, 2008], the Tariff method [James et al, 2011], the Simplified Symptom Pattern method [Murray et al, 2011a], the naive Bayes classifier [Miasnikof et al, 2015], the Bayesian factor model [Kunihama et al, 2018], and latent Gaussian graphical model [Li et al, 2018b]).…”

Section: Introduction (mentioning)

confidence: 99%

Bayesian Hierarchical Factor Regression Models to Infer Cause of Death From Verbal Autopsy Data

Moran, Turner, Dunson et al. 2019

Preprint


In low-resource settings where vital registration of death is not routine it is often of critical interest to determine and study the cause of death (COD) for individuals and the cause-specific mortality fraction (CSMF) for populations. Post-mortem autopsies, considered the gold standard for COD assignment, are often difficult or impossible to implement due to deaths occurring outside the hospital, expense, and/or cultural norms. For this reason, Verbal Autopsies (VAs) are commonly conducted, consisting of a questionnaire administered to next of kin recording demographic information, known medical conditions, symptoms, and other factors for the decedent. This article proposes a novel class of hierarchical factor regression models that avoid restrictive assumptions of standard methods, allow both the mean and covariance to vary with COD category, and can include covariate information on the decedent, region, or events surrounding death. Taking a Bayesian approach to inference, this work develops an MCMC algorithm and validates the FActor Regression for Verbal Autopsy (FARVA) model in simulation experiments. An application of FARVA to real VA data shows improved goodness-of-fit and better predictive performance in inferring COD and CSMF over competing methods. Code and a user manual are made available at https://github.com/kelrenmor/farva.
