“…Chest X-ray Report Generation Inspired by the success of deep learning models on image captioning, a lot of encoder-decoder based frameworks have been proposed (Jing et al, 2018(Jing et al, , 2019Liu et al, 2021bLiu et al, ,a, 2019cYuan et al, 2019;Xue et al, 2018;Li et al, 2018Zhang et al, 2020a;Kurisinkel et al, 2021;Ni et al, 2020;Nishino et al, 2020;Chen et al, 2020c;Wang et al, 2021;Boag et al, 2019;Syeda-Mahmood et al, 2020;Yang et al, 2020;Lovelace and Mortazavi, 2020;Zhang et al, 2020b;Miura et al, 2021). Specifically, Jing et al (2018) proposed a hierarchical LSTM with the attention mechanism (Bahdanau et al, 2015b;You et al, 2016).…”