Detection of Important Scenes in Baseball Videos via a Time-Lag-Aware Multimodal Variational Autoencoder

Hirasawa, Kaito; Maeda, Keisuke; Ogawa, Takahiro; Haseyama, Miki

doi:10.3390/s21062045

Cited by 2 publications

(7 citation statements)

References 20 publications

“…The encoder, decoder, and important scene predictor consisted of multiple fully connected layers, respectively. Unlike the conventional method [14] which trains a part of the encoder considering time lags, Tl-LVM can train the entire network considering time lags based on the new loss function. Therefore, this figure does not represent a time-lag aware transformation in the encoder as in the figure of the conventional method.…”

Section: Figure (mentioning)

confidence: 99%

“…The method in [22] uses a multimodal variational autoencoder (MVAE) [24] using tweets and visual information as the latent variable model (LVM) to detect fake news. The effectiveness of using LVM to consider both videos and tweets has also been reported in [14, 16]. The MVAE can discover correlations between modalities by learning shared representations.…”

Section: Introduction (mentioning)

confidence: 98%

“…Twitter allows users to receive and post short text messages called tweets. Thus, using features extracted from both videos and tweets posted by viewers has improved the prediction and detection performance of important scenes in sports videos [10, 11, 12, 13, 14, 15, 16, 17].…”

Section: Introduction (mentioning)

confidence: 99%

“…For constructing MVAE-based methods using tweets and videos, the following problem is considered: viewers watching baseball videos post tweets inspired by several previous events, such as an RBI hit and home run. Thus, since tweets and multiple corresponding events have time lags, the conventional methods [14, 16] detect important scenes through the consideration of tweets and events. Specifically, the conventional method [14] detects important scenes via a time-lag-aware multimodal variational autoencoder that has the encoder considering the time lags between tweets and events, and the conventional method [16] detects them based on a generative adversarial network-based approach using features transformed via bidirectional time-lag aware deep multiset canonical correlation analysis.…”

Section: Introduction (mentioning)

confidence: 99%

“…Thus, since tweets and multiple corresponding events have time lags, the conventional methods [14, 16] detect important scenes through the consideration of tweets and events. Specifically, the conventional method [14] detects important scenes via a time-lag-aware multimodal variational autoencoder that has the encoder considering the time lags between tweets and events, and the conventional method [16] detects them based on a generative adversarial network-based approach using features transformed via bidirectional time-lag aware deep multiset canonical correlation analysis. However, these methods cannot simultaneously train a time-lag aware feature transformation network and network for the detection of important scenes.…”

Section: Introduction (mentioning)

confidence: 99%

Time-Lag Aware Latent Variable Model for Prediction of Important Scenes Using Baseball Videos and Tweets

Hirasawa, Maeda, Ogawa, et al. 2022, Sensors (self-citation)

In this study, a novel prediction method for predicting important scenes in baseball videos using a time-lag aware latent variable model (Tl-LVM) is proposed. Tl-LVM adopts a multimodal variational autoencoder using tweets and videos as the latent variable model. It calculates the latent features from these tweets and videos and predicts important scenes using these latent features. Since time lags exist between posted tweets and events, Tl-LVM introduces the loss considering time lags by correlating the feature into the loss function of the multimodal variational autoencoder. Furthermore, Tl-LVM can train the encoder, decoder, and important scene predictor, simultaneously, using this loss function. This is the novelty of Tl-LVM, and this work is the first end-to-end prediction model of important scenes that considers time lags to the best of our knowledge. It is the contribution of Tl-LVM to realize high-quality prediction using latent features that consider time lags between tweets and multiple corresponding previous events. Experimental results using actual tweets and baseball videos show the effectiveness of Tl-LVM.

Cross-Modality Interaction-Based Traffic Accident Classification

Oh, Ban 2024, Applied Sciences

Traffic accidents on the road lead to serious personal and material damage. Furthermore, preventing secondary accidents caused by traffic accidents is crucial. As various technologies for detecting traffic accidents in videos using deep learning are being researched, this paper proposes a method to classify accident videos based on a video highlight detection network. To utilize video highlight detection for traffic accident classification, we generate information using the existing traffic accident videos. Moreover, we introduce the Car Crash Highlights Dataset (CCHD). This dataset contains a variety of weather conditions, such as snow, rain, and clear skies, as well as multiple types of traffic accidents. We compare and analyze the performance of various video highlight detection networks in traffic accident detection, thereby presenting an efficient video feature extraction method according to the accident and the optimal video highlight detection network. For the first time, we have applied video highlight detection networks to the task of traffic accident classification. In the task, the most superior video highlight detection network achieves a classification performance of up to 79.26% when using video, audio, and text as inputs, compared to using video and text alone. Moreover, we elaborated the analysis of our approach in the aspects of cross-modality interaction, self-attention and cross-attention, feature extraction, and negative loss.
