Respondent-Driven Sampling (RDS) is an approach to sampling design and inference in hard-to-reach human populations. It is often used in situations where the target population is rare and/or stigmatized in the larger population, so that it is prohibitively expensive to contact them through the available frames. Common examples include injecting drug users, men who have sex with men, and female sex workers. Most analysis of RDS data has focused on estimating aggregate characteristics, such as disease prevalence. However, RDS is often conducted in settings where the population size is unknown and of great independent interest. This paper presents an approach to estimating the size of a target population based on data collected through RDS. The proposed approach uses a successive sampling approximation to RDS to leverage information in the ordered sequence of observed personal network sizes. The inference uses the Bayesian framework, allowing for the incorporation of prior knowledge. A flexible class of priors for the population size is used that aids elicitation. An extensive simulation study provides insight into the performance of the method for estimating population size under a broad range of conditions. A further study shows the approach also improves estimation of aggregate characteristics. Finally, the method demonstrates sensible results when used to estimate the size of known networked populations from the National Longitudinal Study of Adolescent Health, and when used to estimate the size of a hard-to-reach population at high risk for HIV.
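The successive sampling idea mentioned in this abstract can be illustrated with a small simulation. The sketch below is not the paper's estimator; it only shows, under assumed inputs, why the order of observed network sizes may carry information about population size. Units are drawn one at a time without replacement, with probability proportional to their network size (degree), so high-degree units tend to be drawn early and the observed degrees tend to decline as the sample exhausts a larger share of the population. The function name `successive_sample`, the population of 1000, and the uniform degree distribution are all hypothetical choices made for illustration.

```python
import random


def successive_sample(degrees, n, rng):
    """Draw n distinct unit indices, one at a time, each with probability
    proportional to its degree among the units not yet sampled."""
    remaining = list(range(len(degrees)))
    weights = [degrees[i] for i in remaining]
    order = []
    for _ in range(n):
        # Pick a position in the remaining list, weighted by degree.
        pos = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        order.append(remaining.pop(pos))
        weights.pop(pos)
    return order


rng = random.Random(42)
# Hypothetical population: 1000 people with network sizes from 1 to 30.
degrees = [rng.randint(1, 30) for _ in range(1000)]
sample = successive_sample(degrees, 500, rng)

# Compare mean observed degree early and late in the sampling order.
mean_first = sum(degrees[i] for i in sample[:250]) / 250
mean_second = sum(degrees[i] for i in sample[250:]) / 250
print(f"mean degree, first 250 draws: {mean_first:.2f}")
print(f"mean degree, last 250 draws:  {mean_second:.2f}")
```

When the sample is a large fraction of the population, the early draws would typically show a higher mean degree than the late draws; when the population is much larger than the sample, the decline would be much weaker. That contrast is the kind of signal a size estimator could exploit.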
Alternative strategies for two-sample cross-validation of covariance structure models are described and investigated. The strategies vary according to whether all (tight strategy) or some (partial strategy) of the model parameters are held constant when a calibration sample solution is re-fit to a validation sample covariance matrix. Justification is provided for three partial strategies. Conventional and alternative strategies for cross-validation are discussed as methods for evaluating overall discrepancy of a model fit to a particular sample, where overall discrepancy arises from the combined influences of discrepancy of approximation and discrepancy of estimation (Cudeck & Henly, 1991). Results of a sampling study using empirical data show that for tighter strategies simpler models are preferred in smaller samples. However, when partial cross-validation is employed, a more complex model may be supported even in a small sample. Implications for model comparison and evaluation, as well as the issues of model complexity and sample size, are discussed.
Moderated regression analysis is commonly used to test for multiplicative influences of independent variables in regression models. D. Lubinski and L. G. Humphreys (1990) have shown that significant moderator effects can exist even when stronger quadratic effects are present. They recommend comparing effect sizes associated with both effect types and selecting the model that yields the strongest effect. The authors show that this procedure of comparing effect sizes is biased in favor of the moderated model when multicollinearity is high because of the differential reliability of the quadratic and multiplicative terms in the regression models. Fortunately, levels of multicollinearity under which this bias is most problematic may be outside the range encountered in many empirical studies. The authors discuss causes and implications of this phenomenon as well as alternative procedures for evaluating structural relationships among variables.
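The comparison described in this abstract can be sketched in code. The example below is an illustration under stated assumptions, not the authors' procedure: it takes incremental R² over a main-effects model as the effect size, uses simulated data with correlated predictors, and relies on NumPy. The function names `r_squared` and `incremental_r2`, the sample size, the predictor correlation of 0.5, and the 0.3 interaction coefficient are all hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """R^2 of an ordinary least squares fit of y on X (X includes an intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def incremental_r2(x1, x2, y):
    """Gain in R^2 over the main-effects model from adding either the
    product term (moderated model) or the squared terms (quadratic model)."""
    base = np.column_stack([np.ones_like(x1), x1, x2])
    moderated = np.column_stack([base, x1 * x2])
    quadratic = np.column_stack([base, x1**2, x2**2])
    r2_base = r_squared(base, y)
    return {
        "moderated": r_squared(moderated, y) - r2_base,
        "quadratic": r_squared(quadratic, y) - r2_base,
    }


rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
# Assumed predictor correlation of 0.5; a higher value means more multicollinearity.
x2 = 0.5 * x1 + np.sqrt(1 - 0.5**2) * rng.normal(size=n)
# Simulated outcome with a true multiplicative effect (hypothetical coefficients).
y = x1 + x2 + 0.3 * x1 * x2 + rng.normal(size=n)

gains = incremental_r2(x1, x2, y)
print(gains)
```

Because the product and squared terms become more alike as the predictors become more correlated, rerunning the sketch with a higher correlation is one way to explore how hard the two models are to tell apart from effect sizes alone.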
Summary: The study of hard-to-reach populations presents significant challenges. Typically, a sampling frame is not available, and population members are difficult to identify or recruit from broader sampling frames. This is especially true of populations at high risk for HIV/AIDS. Respondent-driven sampling (RDS) is often used in such settings with the primary goal of estimating the prevalence of infection. In such populations, the number of people at risk for infection and the number of people infected are of fundamental importance. This article presents a case-study of the estimation of the size of the hard-to-reach population based on data collected through RDS. We study two populations of female sex workers and men-who-have-sex-with-men in El Salvador. The approach is Bayesian and we consider different forms of prior information, including using the UNAIDS population size guidelines for this region. We show that the method is able to quantify the amount of information on population size available in RDS samples. As separate validation, we compare our results to those estimated by extrapolating from a capture–recapture study of El Salvadorian cities. The results of our case-study are largely comparable to those of the capture–recapture study when they differ from the UNAIDS guidelines. Our method is widely applicable to data from RDS studies and we provide a software package to facilitate this.