“…• Temporal modeling • Insufficient detail Field [111], [90], [46], [104], [112], [88], [113], [18], global relationship [102], [94], [95], [47], [18], [49], [91], [96] • Intra-video diversity representation Inter-video [114], [115] • Representative • Complicated relationship category features training End-to-End [1], [34], [105], [36], [114], [116],…”