Jordi Sanchez-Riera scite author profile

A simple seed growing algorithm for estimating scene flow in a stereo setup is presented. Two calibrated and synchronized cameras observe a scene and output a sequence of image pairs. The algorithm simultaneously computes a disparity map between the image pairs and optical flow maps between consecutive images. This, together with calibration data, is an equivalent representation of the 3D scene flow, i.e., a 3D velocity vector is associated with each reconstructed point. The proposed method starts from correspondence seeds and propagates these correspondences to their neighborhood. It is accurate for complex scenes with large motions and produces temporally coherent stereo disparity and optical flow results. The algorithm is fast due to inherent search space reduction. An explicit comparison with recent methods of spatiotemporal stereo and variational optical and scene flow is provided.
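
The equivalence between disparity plus optical flow and 3D scene flow can be illustrated with a minimal sketch. It assumes a rectified pinhole stereo pair, which the abstract does not state. The focal length `f`, `baseline`, principal point `(cx, cy)`, and the function names are hypothetical and chosen only for illustration. This is not the seed growing algorithm itself, only the conversion from image-space measurements to a 3D velocity vector.

```python
def backproject(x, y, disparity, f, baseline, cx, cy):
    """Reconstruct a 3D point from a pixel and its disparity (rectified stereo assumed)."""
    z = f * baseline / disparity          # depth is inversely proportional to disparity
    return ((x - cx) * z / f, (y - cy) * z / f, z)


def scene_flow(x, y, d0, u, v, d1, f, baseline, cx, cy):
    """3D displacement of one point between two consecutive frames.

    (x, y) and d0 are the pixel and disparity at time t,
    (u, v) is the optical flow, and d1 is the disparity at time t + 1.
    """
    p0 = backproject(x, y, d0, f, baseline, cx, cy)
    p1 = backproject(x + u, y + v, d1, f, baseline, cx, cy)
    return tuple(b - a for a, b in zip(p0, p1))


# Hypothetical calibration values, for illustration only.
flow = scene_flow(320.0, 240.0, 50.0, 10.0, 0.0, 50.0,
                  f=500.0, baseline=0.1, cx=320.0, cy=240.0)
print(flow)
```

Dividing the displacement by the frame interval gives a velocity. A change in disparity between frames shows up as motion along the depth axis.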

Simultaneous pose, correspondence and non-rigid shape

Sanchez-Riera, Östlund, Fua, et al. (2010)

Robust RGB-D Hand Tracking Using Deep Learning Priors

Sanchez-Riera, Srinivasan, Hua, et al. (2018)

IEEE Trans. Circuits Syst. Video Technol.

Active-speaker detection and localization with microphones and cameras embedded into a robotic head

Čech, Mittal, Deleforge, et al. (2013)

Abstract: In this paper we present a method for detecting and localizing an active speaker, i.e., a speaker that emits a sound, through the fusion of visual reconstruction with a stereoscopic camera pair and sound-source localization with several microphones. Both the cameras and the microphones are embedded into the head of a humanoid robot. The proposed statistical fusion model associates 3D faces of potential speakers with 2D sound directions. The paper has two contributions: (i) a method that discretizes the two-dimensional space of all possible sound directions and accumulates evidence for each direction by estimating the time difference of arrival (TDOA) over all the microphone pairs, such that all the microphones are used simultaneously and symmetrically, and (ii) an audio-visual alignment method that maps 3D visual features onto 2D sound directions and onto TDOAs between microphone pairs. This makes it possible to implicitly represent both sensing modalities in a common audio-visual coordinate frame. Using simulated as well as real data, we quantitatively assess the robustness of the method against noise and reverberations, and we compare it with several other methods. Finally, we describe a real-time implementation that uses the proposed technique with a humanoid head embedding four microphones and two cameras; this enables natural human-robot interactive behavior.
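
The idea of scoring a discretized grid of sound directions against TDOAs from all microphone pairs can be sketched as follows. This is a simplified stand-in, not the paper's method: it assumes a far-field (plane wave) source, a fixed speed of sound, and a plain least-squares score in place of the paper's evidence accumulation and statistical fusion model. The microphone layout and all function names are hypothetical.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed constant


def unit_direction(azimuth_deg, elevation_deg):
    """Unit vector pointing from the array toward the source."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def predicted_tdoa(mic_a, mic_b, direction):
    """Far-field arrival time at mic_a minus arrival time at mic_b, in seconds."""
    return sum((b - a) * d for a, b, d in zip(mic_a, mic_b, direction)) / SPEED_OF_SOUND


def localize(mics, observed_tdoas, step_deg=5):
    """Grid search over (azimuth, elevation) using every microphone pair symmetrically."""
    best_error, best_direction = None, None
    for azimuth in range(-180, 180, step_deg):
        for elevation in range(-90, 91, step_deg):
            direction = unit_direction(azimuth, elevation)
            error = sum(
                (predicted_tdoa(mics[i], mics[j], direction) - tdoa) ** 2
                for (i, j), tdoa in observed_tdoas.items()
            )
            if best_error is None or error < best_error:
                best_error, best_direction = error, (azimuth, elevation)
    return best_direction


# Hypothetical four-microphone layout (metres) and noise-free synthetic TDOAs.
mics = [(0.1, 0.0, 0.0), (-0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.0, 0.0, 0.1)]
source = unit_direction(30, 20)
observed = {
    (i, j): predicted_tdoa(mics[i], mics[j], source)
    for i, j in itertools.combinations(range(len(mics)), 2)
}
print(localize(mics, observed))
```

With real recordings the observed TDOAs would come from a cross-correlation estimate per microphone pair and would be noisy, so the grid resolution and the scoring function would matter far more than in this synthetic case.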

A comparative study of data fusion for RGB-D based visual recognition

Sanchez-Riera, Hua, Hsiao, et al. (2016)

Pattern Recognition Letters

Action Recognition Robust to Background Clutter by Using Stereo Vision

Sanchez-Riera, Čech, Horaud (2012)

Abstract: An action recognition algorithm that works with binocular videos is presented. The proposed method uses a standard bag-of-words approach, where each action clip is represented as a histogram of visual words. However, instead of using classical monocular HoG/HoF features, we construct features from the scene flow computed by a matching algorithm on the sequence of stereo images. The resulting algorithm has comparable or slightly better recognition accuracy than the standard monocular solution in a controlled setup with a single actor present in the scene. However, we show significantly improved performance in the case of strong background clutter caused by other people freely moving behind the actor.
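
The bag-of-words representation mentioned in this abstract can be sketched in a few lines. The sketch assumes that a codebook of visual words already exists (for example from clustering training descriptors) and that each clip yields a list of feature descriptors. The two-dimensional toy descriptors and the function names are hypothetical and do not reproduce the scene-flow features of the paper.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])),
    )


def bow_histogram(descriptors, codebook):
    """Normalized histogram of visual-word occurrences for one action clip."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(codebook)


# Toy codebook with two visual words and four descriptors from one clip.
codebook = [(0.0, 0.0), (1.0, 1.0)]
clip_descriptors = [(0.1, 0.0), (0.9, 1.0), (1.0, 1.2), (0.0, 0.2)]
print(bow_histogram(clip_descriptors, codebook))
```

The resulting fixed-length histogram is what a classifier would consume, regardless of how many descriptors a clip produces.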

Online multimodal speaker detection for humanoid robots

Sanchez-Riera, Alameda-Pineda, Wienke, et al. (2012)

Abstract: In this paper we address the problem of audio-visual speaker detection. We introduce an online system working on the humanoid robot NAO. The scene is perceived with two cameras and two microphones. A multimodal Gaussian mixture model (mGMM) fuses the information extracted from the auditory and visual sensors and detects the most probable audio-visual object, e.g., a person emitting a sound, in 3D space. The system is implemented on top of a platform-independent middleware and is able to process the information online (17 Hz). A detailed description of the system and its implementation is provided, with special emphasis on the online processing issues and the proposed solutions. Experimental validation, performed with five different scenarios, shows that the proposed method opens the door to robust human-robot interaction scenarios.

Feature distribution modelling techniques for 3D face verification

McCool, Sanchez-Riera, Marcel (2010)

Pattern Recognition Letters
