“…Some methods [84,72,82,83] adopt the idea of global matching from few-shot image classification [50,52] to carry out few-shot matching, which results in relatively poor performance because long-term temporal alignment information is ignored in the measurement process. To exploit temporal cues, subsequent approaches [3,76,42,29,64,53,61,38,19,62,77] focus on local frame-level (or segment-level) alignment between query and support videos. Among them, OTAM [3] proposes a variant of the dynamic time warping technique [37] to explicitly utilize the temporal ordering information in support-query video pairs.…”
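
The contrast drawn in the passage, between global matching and temporally ordered frame-level alignment, can be illustrated with a small sketch. The code below is only an illustration under stated assumptions: it implements textbook dynamic time warping over cosine distances, not the specific variant proposed in OTAM [3], and the function names (`cosine_distance`, `global_distance`, `dtw_distance`) and the toy per-frame feature vectors are hypothetical choices made for this example.

```python
import math


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def global_distance(query, support):
    """Global matching: average the frames of each video, then compare.

    Averaging discards frame order, so a video and its reversed copy
    are indistinguishable under this measure.
    """
    def mean_frame(frames):
        return [sum(column) / len(frames) for column in zip(*frames)]

    return cosine_distance(mean_frame(query), mean_frame(support))


def dtw_distance(query, support):
    """Classic dynamic time warping over frame-level cosine distances.

    acc[i][j] holds the minimum cumulative cost of aligning the first i
    query frames with the first j support frames, subject to a monotonic
    (order-preserving) alignment path.
    """
    n, m = len(query), len(support)
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(query[i - 1], support[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j],      # advance the query only
                acc[i][j - 1],      # advance the support only
                acc[i - 1][j - 1],  # advance both
            )
    return acc[n][m]


# Toy per-frame features (assumed for illustration): two frames, two dimensions.
forward = [[1.0, 0.0], [0.0, 1.0]]
backward = [[0.0, 1.0], [1.0, 0.0]]  # same frames in reversed temporal order

global_gap = global_distance(forward, backward)  # close to 0: order is ignored
aligned_gap = dtw_distance(forward, backward)    # clearly positive: order matters
```

In this toy case the global measure treats the forward and reversed clips as essentially identical, while the order-preserving alignment assigns them a large distance, which is the kind of temporal cue the passage says global matching ignores.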