“…By surveying experts and using the weighted mean of their evaluations, we limit the threat to external validity posed by evaluating how difficult the elementary tasks are. Moreover, the elementary tasks were obtained by decomposing repair methods from multiple papers [7,14,10,5,16,8,4,2,11,18,1,13]. To generate errors, we placed them randomly in the datasets using a uniform distribution and repeated the process 30 times to reduce bias.…”
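
The two procedures mentioned in the excerpt, a weighted mean of expert ratings and uniform random error generation repeated 30 times, could be sketched roughly as follows. This is only an illustration under stated assumptions: the excerpt does not say how the expert weights were chosen or what exactly the uniform distribution ranges over, so the function names (`weighted_mean`, `inject_errors`, `repeated_injections`), the idea of sampling distinct cell indices, and the fixed seed are all hypothetical.

```python
import random


def weighted_mean(scores, weights):
    """Weighted mean of expert difficulty scores.

    Assumption: each expert has one non-negative weight; the excerpt
    does not specify how weights were assigned.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total


def inject_errors(n_cells, n_errors, rng):
    """Choose n_errors distinct cell indices uniformly at random.

    Assumption: 'uniform distribution in datasets' is read here as
    every cell being equally likely to receive an error.
    """
    return sorted(rng.sample(range(n_cells), n_errors))


def repeated_injections(n_cells, n_errors, runs=30, seed=0):
    """Repeat the random injection `runs` times (30 in the excerpt)."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    return [inject_errors(n_cells, n_errors, rng) for _ in range(runs)]
```

For example, `weighted_mean([2, 4], [1, 3])` gives 3.5, and `repeated_injections(100, 5)` returns 30 lists of five distinct cell indices each, one per repetition.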