This paper introduces a video dataset of spatiotemporally localized Atomic Visual Actions (AVA). The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) the use of movies to gather a varied set of action representations. This departs from existing datasets for spatio-temporal action recognition, which typically provide sparse annotations for composite actions in short video clips. AVA, with its realistic scene and action complexity, exposes the intrinsic difficulty of action recognition. To benchmark this, we present a novel approach for action localization that builds upon current state-of-the-art methods and demonstrates better performance on JHMDB and UCF101-24 categories. While setting a new state of the art on existing datasets, the overall results on AVA are low at 15.6% mAP, underscoring the need for new approaches to video understanding.
Abstract. Spectral domain optical coherence tomography (SD-OCT) is an important tool for the diagnosis of various retinal diseases. The measurements available from SD-OCT volumes can be used to detect structural changes in glaucoma patients before the resulting vision loss becomes noticeable. Eye movement during the imaging process corrupts the data, making measurements unreliable. We propose a method to correct for transverse motion artifacts in SD-OCT volumes after scan acquisition by registering the volume to an instantaneous, and therefore artifact-free, reference image. Our procedure corrects for smooth deformations resulting from ocular tremor and drift, as well as the abrupt vessel discontinuities resulting from microsaccades. We evaluate our method on 48 scans of healthy eyes and 116 scans of glaucomatous eyes, improving scan quality in 96% of healthy and 73% of glaucomatous eyes.
We couple occlusion modeling and multi-frame motion estimation to compute dense, temporally extended point trajectories in video with significant occlusions. Our approach combines robust spatial regularization with spatially and temporally global occlusion labeling in a variational, Lagrangian framework with subspace constraints. We track points even through ephemeral occlusions. Experiments demonstrate accuracy superior to the state of the art while tracking more points through more frames.
Abstract. We propose a new principle for recognizing fingerspelling sequences from American Sign Language (ASL). Instead of training a system to recognize the static posture for each letter from an isolated frame, we recognize the dynamic gestures corresponding to transitions between letters. This eliminates the need for an explicit temporal segmentation step, which we show is error-prone at the speeds used by native signers. We present results from our system recognizing 82 different words signed by a single signer, using more than an hour of training and test video. We demonstrate that recognizing letter-to-letter transitions without temporal segmentation is feasible and improves recognition performance.