“…Video recognition methods that use an attention mechanism [2,5,24,29,38,40,52,53,56,58,61,62,65] have also been proposed [6,10,18,46,56,59,70]. Non-local neural networks [56], which are commonly used to introduce an attention mechanism, improve video recognition accuracy by capturing long-range temporal dependencies with a non-local operation that provides global information.…”
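
To make the non-local operation concrete, the following is a minimal NumPy sketch of one common form, the embedded-Gaussian variant, in which each output position is a softmax-weighted sum over every space-time position, followed by a residual connection. It illustrates the idea rather than reproducing the implementation in [56]: the function name `non_local_block`, the weight matrices `w_theta`, `w_phi`, `w_g`, `w_z`, and the tensor sizes are all assumptions chosen for the example, and plain matrix products stand in for the 1x1x1 convolutions a real network would use.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along the given axis."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def non_local_block(x, w_theta, w_phi, w_g, w_z):
    """Embedded-Gaussian non-local operation on video features (a sketch).

    x: array of shape (T, H, W, C), i.e. frames x height x width x channels.
    w_theta, w_phi, w_g: (C, C_inner) projections (hypothetical names).
    w_z: (C_inner, C) output projection back to C channels.
    Returns the output (same shape as x) and the (N, N) attention matrix,
    where N = T * H * W.
    """
    t, h, w, c = x.shape
    flat = x.reshape(t * h * w, c)      # every space-time position is a row

    theta = flat @ w_theta              # query embedding, (N, C_inner)
    phi = flat @ w_phi                  # key embedding,   (N, C_inner)
    g = flat @ w_g                      # value embedding, (N, C_inner)

    # Pairwise similarity between ALL positions, including distant frames,
    # normalized so that each row sums to 1.
    attn = softmax(theta @ phi.T, axis=-1)

    y = attn @ g                        # weighted sum over all positions
    z = y @ w_z + flat                  # residual connection
    return z.reshape(t, h, w, c), attn


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
T, H, W, C, C_inner = 4, 3, 3, 8, 4
x = rng.standard_normal((T, H, W, C))
w_theta = rng.standard_normal((C, C_inner)) * 0.1
w_phi = rng.standard_normal((C, C_inner)) * 0.1
w_g = rng.standard_normal((C, C_inner)) * 0.1
w_z = rng.standard_normal((C_inner, C)) * 0.1

out, attn = non_local_block(x, w_theta, w_phi, w_g, w_z)
print(out.shape, attn.shape)
```

Because the attention matrix spans all `T * H * W` positions, a position in the first frame can draw directly on features from the last frame, which is the long-range temporal dependency the passage refers to. The cost is quadratic in the number of positions, so in practice such blocks are usually applied to downsampled feature maps.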