To assess the meaningfulness of an intervention or policy effect on students’ achievement, researchers may compare it against empirical benchmarks. Examples include normative expectations for students’ academic growth and performance gaps between weak and average schools or between policy-relevant groups (e.g., male and female students, students from socioeconomically advantaged or disadvantaged families, students with or without a migration background). Previous research has derived these benchmarks from student samples in the United States, and how well these results generalize to student populations in other countries is an open question. We therefore provide novel meta-analytic evidence on these benchmarks for students attending elementary and secondary schools in Germany, covering a broad variety of achievement outcomes (e.g., mathematics, science, information and communication technology, first and second language skills). Drawing on results from large, representative probability samples, we found that each kind of benchmark varied across countries as well as across domains and student subpopulations within Germany. This pattern underscores that the very same intervention effect may be judged differently depending on the target population and the outcome of the intervention. We conclude by illustrating and discussing the strengths and limitations of empirical benchmarks for assessing the magnitude of intervention effects.
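The core comparison behind such benchmarks is a simple ratio: an intervention's standardized effect size divided by a benchmark effect size (for example, a year of normative growth or a between-group gap). The sketch below illustrates only that arithmetic. The function name, the grade labels, and all effect-size values are hypothetical placeholders chosen for illustration; they are not estimates reported in this study.

```python
def relative_to_benchmark(effect_size: float, benchmark: float) -> float:
    """Express an intervention effect size as a proportion of a benchmark effect size.

    Hypothetical helper: the benchmark might be annual growth or a group gap,
    both expressed in the same standardized metric as the intervention effect.
    """
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero")
    return effect_size / benchmark


# Hypothetical standardized intervention effect (Cohen's d).
intervention_d = 0.10

# Hypothetical benchmarks; real values would come from empirical estimates
# for the specific population, grade level, and achievement domain.
benchmarks = {
    "annual growth, grade A (hypothetical)": 0.40,
    "annual growth, grade B (hypothetical)": 0.20,
}

for label, value in benchmarks.items():
    share = relative_to_benchmark(intervention_d, value)
    # The same effect maps onto a different share depending on the benchmark.
    print(f"{label}: {share:.0%} of the benchmark")
```

With these placeholder numbers, the same effect of d = 0.10 amounts to a quarter of the first benchmark but half of the second, which mirrors the point that the assessment of an effect depends on the target population and outcome.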