…Besides RAG (Lewis et al., 2020b), we consider several competitive prior methods that incorporate knowledge, including: a) GPT2 + Knowledge (Guan et al., 2020), which post-trains GPT2 on augmentation data converted from knowledge triples (e.g., ⟨helium, is, gas⟩ → "helium is gas."); b) GPT2 + COMET Embeddings, which fuses knowledge-aware embeddings (e.g., from COMET (Bosselut et al., 2019)); c) FiD-T5 (Izacard & Grave, 2021a;b), which concatenates retrieved grounding documents with the context to form a new input; d) QA-GNN (Yasunaga et al., 2021), which reasons over a knowledge graph with a graph neural network; and e) Reflective Decoding (West et al., 2021), which relies on forward and backward LMs to encode bi-directional context for generation. As shown in Table 2 (columns with 100% training data), KID outperforms all other methods on ELI5 (a 5.2-point ROUGE-L improvement over the second-best method, RAG) and achieves competitive results while requiring neither a task-specific model architecture nor additional training to infuse knowledge.…
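To make the triple-to-sentence augmentation in a) concrete, the following is a minimal Python sketch of that preprocessing step under our assumptions; the function name `triple_to_sentence` and the second example triple are illustrative, not the authors' actual code.

```python
# Minimal sketch (our assumption of the GPT2 + Knowledge preprocessing,
# not the authors' script): render knowledge triples as short natural-language
# sentences that serve as post-training data for GPT2.

def triple_to_sentence(head: str, relation: str, tail: str) -> str:
    """Render a <head, relation, tail> triple as a plain sentence."""
    return f"{head} {relation} {tail}."

triples = [
    ("helium", "is", "gas"),          # example given in the paper
    ("water", "boils at", "100 C"),   # hypothetical additional triple
]

# Each rendered sentence becomes one line of augmentation text.
augmentation_data = [triple_to_sentence(*t) for t in triples]
print(augmentation_data)  # ['helium is gas.', 'water boils at 100 C.']
```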