Summary

When the sensitivities and specificities of multiple diagnostic tests are compared, particularly in biomedical research, the different test kits under study are applied to groups of subjects with the same disease status for the disease or medical condition under consideration. Although this process gives rise to clustered or correlated test outcomes, the associated inference issues are well recognized and have been widely discussed in the literature. In mental health and psychosocial research, sensitivity and specificity have also been widely used to study the reliability of instruments for diagnosing mental health and psychiatric conditions and for assessing certain behavioral patterns. However, unlike in biomedical applications, outcomes are often obtained under varying reference standards or different diagnostic criteria, which precludes applying existing methods for comparing multiple diagnostic tests in such research settings. In this paper, we develop a new approach to address these problems (including that of missing data) by extending recent work on inference using inverse probability weighted estimates. The approach is illustrated with data from two studies in sexual abuse and health research, and its performance is examined in a limited simulation study.