In this study, we evaluated how to estimate diagnostic test accuracy (DTA) correctly in the presence of longitudinal patient data (i.e., repeated test applications per patient). We used a nonparametric approach to estimate the sensitivity and specificity of diagnostic tests for three use cases with different characteristics (i.e., episode length and intervals between episodes): 1) systemic inflammatory response syndrome, 2) depression, and 3) epilepsy. For each diagnosis, DTA was estimated at the levels ‘time’, ‘event’, and ‘patient-time’, each of which represents a different research question. Comparing DTA at these levels within and across use cases showed that the estimates varied with the estimation level, the time unit (i.e., per minute/hour/day), the resulting number of observations per patient, and the diagnosis-specific characteristics. Researchers need to predefine their choices (i.e., estimation levels and time units) based on their individual research aims, including the estimand definitions. They should also give an appropriate rationale that considers the diagnosis-specific characteristics of the target outcomes and the number of observations per patient, to ensure that unbiased and clinically relevant measures are communicated. Researchers may nonetheless report the DTA of a test using more than one estimation level and/or time unit, provided this remains consistent with the research aim.
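The dependence of the estimates on the time unit and on the number of observations per patient can be illustrated with a minimal sketch on made-up data. It is not the estimation procedure used in the study, and it does not implement the study's definitions of the ‘time’, ‘event’, and ‘patient-time’ levels. All names (`sens_spec`, `coarsen`, `pooled`, `patient_averaged`) and the toy records are hypothetical. The rule that a coarser time block counts as positive if any time step within it is positive is an assumption made for illustration only.

```python
from collections import defaultdict

def sens_spec(pairs):
    """Return (sensitivity, specificity) from (reference, test) boolean pairs."""
    tp = sum(1 for ref, test in pairs if ref and test)
    fn = sum(1 for ref, test in pairs if ref and not test)
    tn = sum(1 for ref, test in pairs if not ref and not test)
    fp = sum(1 for ref, test in pairs if not ref and test)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec

def coarsen(records, unit):
    """Collapse per-step records into blocks of `unit` steps.

    Assumption (illustrative only): a block is positive for the reference
    or the test if any time step inside the block is positive.
    """
    blocks = defaultdict(lambda: [False, False])
    for patient, t, ref, test in records:
        key = (patient, t // unit)
        blocks[key][0] |= bool(ref)
        blocks[key][1] |= bool(test)
    return [(p, b, ref, test) for (p, b), (ref, test) in sorted(blocks.items())]

def pooled(records):
    """Pool all time steps of all patients into one 2x2 table."""
    return sens_spec([(ref, test) for _, _, ref, test in records])

def patient_averaged(records):
    """Estimate per patient first, then average, so each patient weighs equally."""
    by_patient = defaultdict(list)
    for patient, _, ref, test in records:
        by_patient[patient].append((ref, test))
    per_patient = [sens_spec(pairs) for pairs in by_patient.values()]

    def mean(values):
        values = [v for v in values if v is not None]
        return sum(values) / len(values) if values else None

    return (mean([s for s, _ in per_patient]),
            mean([sp for _, sp in per_patient]))

# Hypothetical toy data: (patient, time step, reference positive, test positive).
# Patient "A" contributes 8 observations, patient "B" only 2.
ref_a = [True, True, True, True, False, False, False, False]
test_a = [True, False, False, False, False, False, False, True]
records = [("A", t, ref_a[t], test_a[t]) for t in range(8)]
records += [("B", 0, True, True), ("B", 1, False, False)]

print(pooled(records))              # (0.4, 0.8)
print(patient_averaged(records))    # (0.625, 0.875)
print(pooled(coarsen(records, 4)))  # (1.0, 0.0)
```

In this toy example, the pooled estimate is dominated by the patient with more observations, while the coarser time unit changes both sensitivity and specificity. This is the kind of variation that makes it necessary to predefine the level and the time unit.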