“…For linear models, promoting interpretability essentially corresponds to reducing the number of features [43,53,54,63]. For decision trees and decision rules, besides reducing the number of features, existing approaches restrict model size, prune unnecessary parts [5,26], aggregate local models into a hierarchy [48], or trade off accuracy against complexity by means of loss functions [30,52] or prior distributions [33,60,61]. Regarding GP (and close relatives like grammatical evolution), perhaps the simplest and most popular strategy to favor interpretability is to constrain the number of model components [16,31,57], sometimes in more elaborate ways or in particular settings [6,32,40,49,56].…”
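As a concrete illustration of the accuracy–complexity trade-off via loss functions mentioned above, the following is a minimal sketch of parsimony pressure: a size-penalized loss under which, among equally accurate candidates, the smaller model wins. All names (`Candidate`, `penalized_loss`, `lam`) are hypothetical and not taken from the cited works.

```python
# Sketch (not from the cited works): a size-penalized loss implementing
# parsimony pressure. `lam` controls how strongly complexity is punished.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """A candidate model: a prediction function plus a size measure,
    e.g. the number of nodes in a GP tree or of non-zero coefficients."""
    predict: Callable[[float], float]
    size: int


def penalized_loss(model: Candidate,
                   xs: Sequence[float],
                   ys: Sequence[float],
                   lam: float = 0.01) -> float:
    """Mean squared error plus a linear complexity penalty."""
    mse = sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + lam * model.size


if __name__ == "__main__":
    xs, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 2.0, 4.0, 6.0]
    small = Candidate(predict=lambda x: 2.0 * x, size=3)            # "2 * x"
    large = Candidate(predict=lambda x: 2.0 * x + 0.0 * x * x, size=9)
    # Both fit the data perfectly; the penalty breaks the tie toward `small`.
    assert penalized_loss(small, xs, ys) < penalized_loss(large, xs, ys)
```

The same scheme generalizes directly to the other settings surveyed: for linear models, `size` would count non-zero features; for trees and rules, nodes or conditions.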