Improving Action Segmentation via Graph-Based Temporal Reasoning

Huang, Yifei; Sugano, Yusuke; Sato, Yoichi

doi:10.1109/cvpr42600.2020.01404

Cited by 97 publications

(51 citation statements)

References 55 publications

“…Recently, models that use graph convolutional networks have shown promise. Examples include those proposed by Zeng et al (2019) [44] and Huang et al (2020) [244].…”

Section: Temporal Action Localization/Detection Models (mentioning)

confidence: 99%

Video Action Understanding

Hutchinson

Gadepally

2021

IEEE Access

Many believe that the successes of deep learning on image understanding problems can be replicated in the realm of video understanding. However, due to the scale and temporal nature of video, the span of video understanding problems and the set of proposed deep learning solutions is arguably wider and more diverse than those of their 2D image siblings. Finding, identifying, and predicting actions are a few of the most salient tasks in this emerging and rapidly evolving field. With a pedagogical emphasis, this tutorial introduces and systematizes fundamental topics, basic concepts, and notable examples in supervised video action understanding. Specifically, we clarify a taxonomy of action problems, catalog and highlight video datasets, describe common video data preparation methods, present the building blocks of state-of-the-art deep learning model architectures, and formalize domain-specific metrics to baseline proposed solutions. This tutorial is intended to be accessible to a general computer science audience and assumes a conceptual understanding of supervised learning.

“…In addition, there exist several approaches to improve the performance of action segmentation models such as MS-TCN [3,26,9,10]. Chen et al [3] proposed to apply self-supervised domain adaptation techniques when training a model such as MS-TCN, and it exploits unlabeled videos to boost the performance of action segmentation.…”

Section: Related Work (mentioning)

confidence: 99%

“…Chen et al [3] proposed to apply self-supervised domain adaptation techniques when training a model such as MS-TCN, and it exploits unlabeled videos to boost the performance of action segmentation. Wang et al [26] suggested a framework named boundary-aware cascade network (BCN), and Yifei et al [9] suggested a graph-based temporal reasoning module (GTRM). These [26,9] can be easily attached to various action segmentation models to improve performance.…”

Section: Related Work (mentioning)

confidence: 99%

Refining Action Segmentation with Hierarchical Video Representations

Ahn

Lee

2021

2021 IEEE/CVF International Conference on Computer Vision (ICCV)

In this paper, we propose Hierarchical Action Segmentation Refiner (HASR), which can refine temporal action segmentation results from various models by understanding the overall context of a given video in a hierarchical way. When a backbone model for action segmentation estimates how the given video can be segmented, our model extracts segment-level representations based on frame-level features, and extracts a video-level representation based on the segment-level representations. Based on these hierarchical representations, our model can refer to the overall context of the entire video, and predict how the segment labels that are out of context should be corrected. Our HASR can be plugged into various action segmentation models (MS-TCN, SSTDA, ASRF), and improve the performance of state-of-the-art models based on three challenging datasets (GTEA, 50Salads, and Breakfast). For example, in 50Salads dataset, the segmental edit score improves from 67.9% to 77.4% (MS-TCN), from 75.8% to 77.3% (SSTDA), from 79.3% to 81.0% (ASRF). In addition, our model can refine the segmentation result from the unseen backbone model, which was not referred to when training HASR. This generalization performance would make HASR be an effective tool for boosting up the existing approaches for temporal action segmentation. Our code is available at https://github.com/cotton-ahn/HASR_iccv2021.

“…NS-CL builds an object-based scene representation and translates sentences into symbolic programs, allowing question and answering about the elements of the scene. Another work used a graph-based network module to detect action in videos, by reasoning over the temporal relations present in each video [16].…”

Section: Neuro-symbolic Models (mentioning)

confidence: 99%

Towards Neural-Symbolic AI for Media Understanding

Costa

Marques

Serra

et al. 2020

Anais Estendidos Do XXVI Simpósio Brasileiro De Sistemas Multimídia E Web (WebMedia 2020)

Methods based on Machine Learning have become state-of-the-art in various segments of computing, especially in the fields of computer vision, speech recognition, and natural language processing. Such methods, however, generally work best when applied to specific tasks in specific domains where large training datasets are available. This paper presents an overview of the state-of-the-art in the area of Deep Learning for Multimedia Content Analysis (image, audio, and video), and describes recent works that propose the integration of deep learning with symbolic AI reasoning. We draw a picture of the future by discussing envisaged use cases that address media understanding gaps which can be solved by the integration of machine learning and symbolic AI, the so-called Neuro-Symbolic integration.
