“…Some progress has already been made in this transfer learning setup: for example, GenBERT (Geva et al., 2020), finetuned on a synthetic dataset of arithmetic problems, is found to score higher on DROP QA. Similarly, DICE (Sundararaman et al., 2020), optimized for numeration, improves the score on the Numeracy600K order-of-magnitude prediction task. Going forward, we need several such studies, ideally one for each pair of tasks, to see whether some numeracy skills help models generalize to others.…”