Abstract. Sparked by the need to inform the response to the spread of HIV/AIDS in drug injecting populations in the 1980s and the desire to base local, national and international responses to tackling drug use in the 1990s on solid epidemiological data, the mark-recapture method has increasingly been used to estimate the prevalence of drug use. Richard Cormack provided support and advice to some of the first United Kingdom and European studies to estimate drug use prevalence in this way. The approach he outlin…
“…PSE based on the method known by the monikers "capture-recapture" and the "multiplier method" is a statistically principled approach which has been widely used to estimate the sizes of KPs [10][11][12][13][14][15][16][17][18][19][20]. Such multiple-list PSE is commonly based on only two lists, but three-or-more-list estimation [21][22][23][24] is becoming increasingly common.…”
Estimates of the sizes of key populations (KPs) affected by HIV, including men who have sex with men, female sex workers and people who inject drugs, are required for targeting epidemic control efforts where they are most needed. Unfortunately, different estimators often produce discrepant results, and an objective basis for choice is lacking. This simulation study provides the first comparison of information-theoretic selection of loglinear models (LLM-AIC), Bayesian model averaging of loglinear models (LLM-BMA) and Bayesian nonparametric latent-class modeling (BLCM) for estimation of population size from multiple lists. Four hundred random samples from populations of size 1,000, 10,000 and 20,000, each including five encounter opportunities, were independently simulated using each of 30 data-generating models obtained from combinations of six patterns of variation in encounter probabilities and five expected per-list encounter probabilities, producing a total of 36,000 samples. Population size was estimated for each combination of sample and sequentially cumulative sets of 2–5 lists using LLM-AIC, LLM-BMA and BLCM. LLM-BMA and BLCM were quite robust and performed comparably in terms of root mean-squared error and bias, and outperformed LLM-AIC. All estimation methods produced uncertainty intervals which failed to achieve the nominal coverage, but LLM-BMA, as implemented in the dga R package, produced the best balance of accuracy and interval coverage. The results also indicate that two-list estimation is unnecessarily vulnerable, and it is better to estimate the sizes of KPs based on at least three lists.
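For readers unfamiliar with two-list estimation, the following is a minimal Python sketch of the classical Lincoln-Petersen estimator and Chapman's bias-corrected variant. It is not one of the methods compared in the study above (LLM-AIC, LLM-BMA, BLCM); it assumes the two lists are independent, and the function names and the example counts are hypothetical.

```python
def lincoln_petersen(n1: int, n2: int, m: int) -> float:
    """Two-list population size estimate, assuming independent lists.

    n1, n2: number of individuals on list 1 and list 2
    m:      number of individuals appearing on both lists
    """
    if m <= 0:
        raise ValueError("the estimate is undefined when the lists do not overlap")
    return n1 * n2 / m


def chapman(n1: int, n2: int, m: int) -> float:
    """Chapman's bias-corrected form; defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


# Hypothetical counts: 200 on list 1, 150 on list 2, 30 on both.
lp_estimate = lincoln_petersen(200, 150, 30)
chapman_estimate = chapman(200, 150, 30)
```

With only two lists, the overlap count is the sole source of information about the unobserved individuals, so the independence assumption cannot be checked against the data; this is one way to read the abstract's remark that two-list estimation is vulnerable.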
“…Log-linear models are routinely used with independence assumptions in CRC estimation [25,38,39], and are accepted as one of the most useful representations of count data [40]. This is the most frequently used method for CRC in social sciences [41][42][43][44]. For each subset i, let $m_i = E[N_i]$ where expectation is defined with respect to the sampling design for the k samples.…”
Section: Log-linear Model; citation type: mentioning (confidence: 99%)
“…Because injection drug use is often stigmatized or legally criminalized, it can be difficult to conduct a systematic survey of PWID [57,58]. Instead, indirect estimation techniques like CRC are recommended [41,58,59]. The purpose of the original study Figure 2: Illustration of data from three semi-overlapping samples of people who inject drugs in Brussels, Belgium [56].…”
Capture-recapture (CRC) surveys are widely used to estimate the size of a population whose members cannot be enumerated directly. When k capture samples are obtained, counts of unit captures in subsets of samples are represented naturally by a $2^k$ contingency table in which one element, the number of individuals appearing in none of the samples, remains unobserved. In the absence of additional assumptions, the population size is not point-identified. Assumptions about independence between samples are often used to achieve point-identification. However, real-world CRC surveys often use convenience samples in which independence cannot be guaranteed, and population size estimates under independence assumptions may lack empirical credibility. In this work, we apply the theory of partial identification to show that weak assumptions or qualitative knowledge about the nature of dependence between samples can be used to characterize a non-trivial set in which the true population size lies with high probability. We construct confidence sets for the population size under bounds on pairwise capture probabilities, and bounds on the highest order interaction term in a log-linear model using two methods: test inversion bootstrap confidence intervals, and profile likelihood confidence intervals. We apply these methods to recent survey data to estimate the number of people who inject drugs in Brussels, Belgium.
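The contingency-table representation described in this abstract can be illustrated with a short Python sketch. It tabulates capture histories for k lists into the 2^k cells and marks the all-zero cell (individuals appearing in no sample) as unobserved. The function name and the toy identifiers are hypothetical and are not taken from the cited paper.

```python
from collections import Counter
from itertools import product


def capture_history_table(lists):
    """Count individuals by capture history across k lists.

    lists: sequence of sets of individual identifiers, one set per sample.
    Returns a dict mapping each k-tuple of 0/1 indicators to a count.
    The all-zero cell is set to None because it cannot be observed.
    """
    k = len(lists)
    observed = set().union(*lists)  # everyone captured at least once
    counts = Counter(
        tuple(int(unit in sample) for sample in lists) for unit in observed
    )
    table = {history: counts.get(history, 0)
             for history in product((0, 1), repeat=k)}
    table[(0,) * k] = None  # the unobserved cell
    return table


# Toy example with three partially overlapping samples.
samples = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}]
table = capture_history_table(samples)
```

Estimating the population size then amounts to estimating the missing cell, which requires some assumption about dependence between samples, as the abstract notes.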
“…extent of drug abuse, see Overstall et al (2014), Farcomeni and Scacciatelli (2013), Huggins et al (2016), Hay and Richardson (2016), and references therein. Our final estimate for the number of drug dealers can be of interest to both law and public health authorities.…”
We introduce a time-interaction point process where the occurrence of an event can increase (self-excitement) or reduce (self-correction) the probability of future events. Self-excitement and self-correction are allowed to be triggered by the same event, at different timescales; other effects such as those of covariates, unobserved heterogeneity, and temporal dependence are also allowed in the model. We focus on capture-recapture data, as our work is motivated by an original example about the estimation of the total number of drug dealers in Italy. To do so, we derive a conditional likelihood formulation where only subjects with at least one capture are involved in the inference process. The result is a novel and flexible continuous-time population size estimator. A simulation study and the analysis of our motivating example illustrate the validity of our approach in several scenarios.