“…A compositor plays a fundamental role in integrating the textual information with the imagery modality. TGR compositors have been proposed based on various techniques, such as gating mechanisms [49], hierarchical attention [7,23,12,20], graph neural networks [54,44], joint learning [6,27,44,52,55], ensemble learning [50], style-content modification [29,5], and vision & language pre-training [32].…”