Making a long story short: A multi-importance fast-forwarding egocentric videos with the emphasis on relevant objects

Silva, Michel M.; Ramos, Washington; Cadar, Felipe; Ferreira, João Pedro Klock; Campos, Mário F. M.; Nascimento, Erickson R.

doi:10.1016/j.jvcir.2018.02.013

Cited by 14 publications

(35 citation statements)

References 22 publications

“…In this section, we present the quantitative results of the experimental evaluation of the proposed method. We compare it with the methods: EgoSampling (ES) [22], Stabilized Semantic Fast-Forward (SSFF) [27], Microsoft Hyperlapse (MSH) [7] the state-of-the-art method in terms of visual smoothness, and Multi-Importance Fast-Forward (MIFF) [26] the state-of-the-art method in terms of the amount of semantics retained in the final video. Figure 4-a shows the results of the Semantic evaluation performed using the sequences in the Semantic Dataset.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 99%

“…The rates are calculated such that the semantic segments are played slower than the nonsemantic ones, and the whole video achieves the desired speed-up. We refer the reader to [26] for a more detailed description of the multi-importance semantic segmentation.…”

Section: Temporal Semantic Profile Segmentation (mentioning)

confidence: 99%
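
The rate constraint in the statement above can be illustrated with a small sketch. It is not the procedure from [26]; it only assumes two segment classes and a semantic rate chosen by hand, then solves for the non-semantic rate so that the output length matches the desired overall speed-up. The function name and all parameter names are hypothetical.

```python
def nonsemantic_rate(n_semantic, n_nonsemantic, target_speedup, semantic_rate):
    """Solve n_s/F_s + n_ns/F_ns = (n_s + n_ns)/F for F_ns.

    Assumption: one rate per class; the cited method may use more
    importance levels and a different search over candidate rates.
    """
    if not 0 < semantic_rate <= target_speedup:
        raise ValueError("semantic_rate must be in (0, target_speedup]")
    # Number of output frames allowed by the overall speed-up.
    budget = (n_semantic + n_nonsemantic) / target_speedup
    # Output frames left after playing the semantic segments slowly.
    remaining = budget - n_semantic / semantic_rate
    if remaining <= 0:
        raise ValueError("semantic segments alone exceed the frame budget")
    return n_nonsemantic / remaining


# 1,000 semantic and 9,000 non-semantic frames, 10x overall, 5x on semantic parts.
rate = nonsemantic_rate(1000, 9000, target_speedup=10, semantic_rate=5)
print(rate)  # 11.25
```

With these numbers the semantic segments contribute 200 output frames and the non-semantic ones 800, which together give the 1,000 frames that a 10x speed-up allows.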

“…The quantitative analysis presented in this work is based on three aspects: instability, speed-up, and amount of semantic information retained in the fast-forward video. The Instability index is measured by using the cumulative sum over the standard deviation of pixels in a sliding window over the video [26]. The Speed-up metric is given by the difference of the achieved speed-up rate from the required value.…”

Section: Datasets and Evaluation Criterion (mentioning)

confidence: 99%
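
One way to read the Instability index described in the statement above is sketched below. The statement does not give the window size, the pixel aggregation, or any normalization, so those choices (a window of 3 frames, a mean over pixels, no normalization) are assumptions, as are the function and parameter names.

```python
from statistics import pstdev


def instability_index(frames, window=3):
    """Sum, over sliding windows, of the mean per-pixel standard deviation.

    frames: list of equally sized flat lists of grayscale pixel values.
    Assumption: the exact definition in [26] may differ in window size,
    aggregation over pixels, and normalization.
    """
    if window < 1 or len(frames) < window:
        raise ValueError("need window >= 1 and at least `window` frames")
    total = 0.0
    for start in range(len(frames) - window + 1):
        chunk = frames[start:start + window]
        # Standard deviation of each pixel across the frames in the window.
        per_pixel = [pstdev(values) for values in zip(*chunk)]
        # Average over pixels, then accumulate over window positions.
        total += sum(per_pixel) / len(per_pixel)
    return total


static = [[10, 20, 30]] * 5          # identical frames: no instability
shaky = [[0, 0], [10, 10], [0, 0]]   # intensity flips between frames
print(instability_index(static))           # 0.0
print(instability_index(shaky, window=2))  # 10.0
```

Under this reading a perfectly static clip scores zero and the score grows with frame-to-frame intensity changes, so lower values indicate a smoother video.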

A Weighted Sparse Sampling and Smoothing Frame Transition Approach for Semantic Fast-Forward First-Person Videos

Silva, Ramos, Ferreira, et al. 2018

2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition

Self Cite

Thanks to the advances in the technology of low-cost digital cameras and the popularity of the self-recording culture, the amount of visual data on the Internet is going to the opposite side of the available time and patience of the users. Thus, most of the uploaded videos are doomed to be forgotten and unwatched in a computer folder or website. In this work, we address the problem of creating smooth fast-forward videos without losing the relevant content. We present a new adaptive frame selection formulated as a weighted minimum reconstruction problem, which combined with a smoothing frame transition method accelerates first-person videos emphasizing the relevant segments and avoids visual discontinuities. The experiments show that our method is able to fast-forward videos to retain as much relevant information and smoothness as the state-of-the-art techniques in less time. We also present a new 80-hour multimodal (RGB-D, IMU, and GPS) dataset of first-person videos with annotations for recorder profile, frame scene, activities, interaction, and attention.

“…In this work, we proposed the CoolNet, a Convolutional Neural Network that learns the preference of the user from visual data of frames of YouTube videos in the YouTube8M dataset [29] and their statistics (number of views, likes, and dislikes). The reader is referred to our work [30] for details about the dataset creation, training routines, and model accuracy. The created semantic profile is used for segmenting the input video into sequences of different semantic levels, and to compute speed-up rates such that the video portions are slowed down according to their semantic load.…”

Section: A Temporal Semantic Profile Segmentation (mentioning)

confidence: 99%

Semantic Hyperlapse: a Sparse Coding-based and Multi-Importance Approach for First-Person Videos

Silva, Campos 2019

Anais Estendidos Do XXXII Conference on Graphics, Patterns and Images (SIBGRAPI Estendido 2019)

Self Cite

The availability of low-cost, high-quality personal wearable cameras combined with the unlimited storage capacity of video-sharing websites has evoked a growing interest in First-Person Videos (FPVs). Such videos are usually composed of long-running unedited streams captured by a device attached to the user's body, which makes them tedious and visually unpleasant to watch. Consequently, there is a growing need to provide quick access to the information therein. To address this need, efforts have been applied to the development of techniques such as Hyperlapse and Semantic Hyperlapse, which aim to create visually pleasant shorter videos and to emphasize semantic portions of the video, respectively. The state-of-the-art Semantic Hyperlapse method, SSFF, neglects the level of importance of the relevant information by only evaluating whether it is significant or not. Other limitations of SSFF are the number of input parameters, the scalability in the number of visual features used to describe the frames, and the abrupt change in the speed-up rate of consecutive video segments. In this dissertation, we propose a parameter-free Sparse Coding based methodology to adaptively fast-forward First-Person Videos that emphasizes the semantic portions by applying a multi-importance approach. Experimental evaluations show that the proposed method creates shorter videos that retain more semantic information, with fewer abrupt transitions of speed-up rates, and more stable final videos than the output of SSFF. Visual results and a graphical explanation of the methodology can be viewed at: https://youtu.be/8uStih8P5-Y.

“…The training process has used only restricted amount of task-specific training data. Jain et al [46,47] used CNN features for visual detection tasks, for example, object localization, scene identification, and classification. Alom et al [40] used cellular simultaneous recurrent networks (CSRNs) for feature extraction.…”

Section: Introduction (mentioning)

confidence: 99%

Egocentric Video Summarization Based on People Interaction Using Deep Learning

Ghafoor, Javed, Irtaza, et al. 2018

Mathematical Problems in Engineering

The availability of wearable cameras in the consumer market has motivated users to record their daily life activities and post them on social media. This exponential growth of egocentric videos demands automated techniques that effectively summarize first-person video data. Egocentric videos are commonly used to record lifelogs these days due to the availability of low-cost wearable cameras. However, egocentric videos are challenging to process because the placement of the camera results in a video with a great deal of variation in object appearance, illumination conditions, and movement. This paper presents an egocentric video summarization framework based on detecting important people in the video. The proposed method generates a compact summary of egocentric videos that contains information about the people with whom the camera wearer interacts. Our proposed approach focuses on identifying the interaction of the camera wearer with important people. We have used the AlexNet convolutional neural network to filter the key-frames (frames where the camera wearer interacts closely with people). We used five convolutional layers, two fully connected hidden layers, and an output layer. Dropout regularization is used to reduce overfitting in the fully connected layers. Performance of the proposed method is evaluated on the UT Ego standard dataset. Experimental results signify the effectiveness of the proposed method in terms of summarizing egocentric videos.
