In a similar trend, 'expert' modules have been added to (non-modular) pre-trained LMs post-hoc, predominantly referred to as adapters (Rebuffi et al., 2017, 2018; Houlsby et al., 2019). Besides being extremely parameter-efficient (Houlsby et al., 2019; Mahabadi et al., 2021a; He et al., 2022) and training-efficient (Pfeiffer et al., 2020a), these modular approaches allow models to be extended to new data settings (Chen et al., 2019), where newly learned knowledge can be combined (Stickland and Murray, 2019; Wang et al., 2021a; Pfeiffer et al., 2021a; Lauscher et al., 2020a; Mahabadi et al., 2021b; Poth et al., 2021), or stacked for combinatorial cross-lingual transfer (Pfeiffer et al., 2020b; Üstün et al., 2020; Vidoni et al., 2020; Ansell et al., 2021a,b; Wang et al., 2021b) as well as NMT scenarios (Bapna and Firat, 2019; Philip et al., 2020; Chronopoulou et al., 2020; Le et al., 2021; Üstün et al., 2021; Stickland et al., 2021; Garcia et al., 2021).
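To make the adapter idea concrete, below is a minimal sketch of the bottleneck design popularized by Houlsby et al. (2019): a small down-projection, nonlinearity, and up-projection with a residual connection, inserted into an otherwise frozen pre-trained model. The class name, bottleneck size, and choice of GELU are illustrative assumptions, not any specific paper's exact configuration.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Minimal bottleneck adapter sketch (after Houlsby et al., 2019).

    Inserted into a frozen pre-trained transformer layer; only the
    adapter's parameters are trained, which is why such modules are
    parameter- and training-efficient.
    """

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        # Project down to a small bottleneck, then back up.
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection: the adapter is near-identity at
        # initialization, so the frozen backbone's behavior is preserved.
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```

Because only the down/up projections are updated during fine-tuning, the number of trainable parameters is on the order of 2 × hidden_dim × bottleneck_dim per layer, a small fraction of the backbone.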