“…With the goal of creating general models that generalize compositionally across a wide range of tasks, in this paper we explore the design space of Transformer models, showing that several design decisions, such as position encodings, decoder type, weight sharing, model hyper-parameters, and the formulation of the target task, result in different inductive biases, with significant impact on compositional generalization. In order to evaluate these design decisions, we use a collection of twelve datasets designed to measure compositional generalization. In addition to six standard datasets commonly used in the literature (such as SCAN (Lake and Baroni, 2018), PCFG (Hupkes et al., 2020), CFQ (Keysers et al., 2019), and COGS (Kim and Linzen, 2020)), we also use a set of basic algorithmic tasks (such as addition, duplication, or set intersection) that, although not directly involving natural language, are useful for gaining insight into what can and cannot be learned with different Transformer models.…”
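To give a concrete sense of what these algorithmic tasks look like as sequence-to-sequence problems, here is a minimal sketch of example generators for addition, duplication, and set intersection. The function names and token formats are assumptions for illustration only; the excerpt does not specify the exact encodings used in the paper.

```python
import random

# Hypothetical generators for the three algorithmic tasks named above.
# Each returns a (source, target) string pair; the exact token formats
# in the paper may differ.

def addition_example(max_digits=4):
    # e.g. input "12+345", target "357"
    a = random.randint(0, 10**max_digits - 1)
    b = random.randint(0, 10**max_digits - 1)
    return f"{a}+{b}", str(a + b)

def duplication_example(alphabet="abcd", max_len=8):
    # e.g. input "abca", target "abcaabca" (copy the input twice)
    s = "".join(random.choice(alphabet) for _ in range(random.randint(1, max_len)))
    return s, s + s

def intersection_example(universe="abcdefgh", max_len=5):
    # e.g. input "abc|bcd", target "bc" (symbols common to both sets)
    x = set(random.sample(universe, random.randint(1, max_len)))
    y = set(random.sample(universe, random.randint(1, max_len)))
    return "".join(sorted(x)) + "|" + "".join(sorted(y)), "".join(sorted(x & y))

if __name__ == "__main__":
    for gen in (addition_example, duplication_example, intersection_example):
        src, tgt = gen()
        print(f"{gen.__name__}: {src!r} -> {tgt!r}")
```

A typical compositional-generalization split for such tasks trains on short inputs (e.g., few digits or short sequences) and evaluates on longer ones, so that success requires learning the underlying rule rather than memorizing surface patterns.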