The revised guidance on the risk assessment of plant protection products for bees proposes the use of equivalence tests instead of difference tests. This is a reasonable approach when an adverse effect has been observed in the lower tier studies: taking "there is a risk" as the null hypothesis shifts the burden of proof to demonstrating the absence of risk. However, the application of equivalence tests to field studies involves some uncertainties, which are discussed in the present study. Here, we compare equivalence and difference testing methods using a control dataset from a honey bee field effect study conducted in northern Germany in 2014. Half of the 48 colonies were assigned to a hypothetical test item group, and the colony strength data were analyzed using t‐tests, a generalized linear mixed model (GLMM), and the corresponding equivalence tests. The data reflected the natural variability of honey bee colonies, which initially contained approximately 12 000 adult bees per colony. Although the t‐test and GLMM confirmed that 24 + 24 colonies are sufficient to show “no adverse effect,” the equivalence versions of the t‐test and GLMM were unable to reject the null hypothesis and classified at least some of the assessments as “high risk,” indicating insufficient statistical power. Based on this, different options for reducing the variability are discussed. One possible option, which may provide a more realistic application of equivalence testing and avoid false “high risk” classifications, is to use the lower limit of the confidence interval of the control as the baseline and to use GLMMs. With this option, we demonstrate that a relatively acceptable probability of showing no high risk can be achieved for initially similar groups. Further studies with different numbers of colonies are still needed to develop and validate the suggested approach. Integr Environ Assess Manag 2024;00:1–8. © 2024 SETAC
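
The contrast between difference and equivalence testing described above can be illustrated with a minimal Python sketch. It is not the analysis used in the study: it replaces the t‐tests and GLMM with a simple normal approximation, the function names (`difference_p_value`, `equivalence_p_value`) and the 10% margin (`margin_fraction`) are assumptions introduced only for illustration, and the colony numbers are made up rather than taken from the field dataset.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def standard_error(test, control):
    """Standard error of the difference in group means (unpooled variances)."""
    return sqrt(variance(test) / len(test) + variance(control) / len(control))


def difference_p_value(test, control):
    """Two-sided p-value for H0: both groups have the same mean."""
    z = (mean(test) - mean(control)) / standard_error(test, control)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def equivalence_p_value(test, control, margin_fraction=0.10):
    """One-sided p-value for H0: the test group mean is lower than the control
    mean by at least margin_fraction of the control mean (i.e. there is a risk).

    margin_fraction is a hypothetical threshold chosen for illustration; the
    margin is treated as a fixed constant, which is a simplification.
    """
    margin = margin_fraction * mean(control)
    z = (mean(test) - mean(control) + margin) / standard_error(test, control)
    return 1 - NormalDist().cdf(z)


# Made-up colony strengths (adult bees per colony), not data from the study.
variable_group = [8000, 16000, 12000, 10000, 14000, 12000]
uniform_group = [11900, 12100, 12000, 11950, 12050, 12000]

for label, group in [("high variability", variable_group),
                     ("low variability", uniform_group)]:
    # Compare each group with an identical copy of itself: no true effect.
    p_diff = difference_p_value(group, group)
    p_equiv = equivalence_p_value(group, group)
    print(f"{label}: difference p = {p_diff:.3f}, equivalence p = {p_equiv:.3f}")
```

In this sketch both comparisons involve identical groups, so the difference test finds no effect in either case. The equivalence test, however, rejects the "risk" null hypothesis only for the low-variability group; with high variability its p-value stays above 0.05 even though no effect exists, which mirrors the power problem described in the abstract.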