Tze Ho Elden Tse scite author profile

This paper is about speaker verification and horizontal localisation in the presence of conspicuous noise. Specifically, we are interested in enabling a mobile robot to robustly and accurately spot the presence of a target speaker and estimate his/her position in challenging acoustic scenarios. While several solutions to both tasks have been proposed in the literature, little attention has been devoted to developing systems able to function in harsh noisy conditions. To address these shortcomings, we follow a purely data-driven approach based on deep learning architectures which, because it requires no knowledge of either the nature of the masking noise or the structure and acoustics of the operating environment, can reliably operate in previously unexplored acoustic scenes. Our experimental evaluation, relying on data collected in real environments with a robotic platform, demonstrates that our framework achieves high performance in both the verification and localisation tasks, despite the presence of copious noise.

S²Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-supervised Learning

Tse, Zhang, Kim, et al. 2022

TP-AE: Temporally Primed 6D Object Pose Tracking with Auto-Encoders

Zheng, Leonardis, Tse, et al. 2022

Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video

Feng, Gao, Ma, et al. 2023

Preprint

Temporal modeling is crucial for multi-frame human pose estimation. Most existing methods directly employ optical flow or deformable convolution to predict full-spectrum motion fields, which may introduce numerous irrelevant cues, such as a nearby person or the background. Without further effort to extract meaningful motion priors, their results are suboptimal, especially in complicated spatiotemporal interactions. On the other hand, temporal differences can encode representative motion information that is potentially valuable for pose estimation but has not been fully exploited. In this paper, we present a novel multi-frame human pose estimation framework that employs temporal differences across frames to model dynamic contexts and uses a mutual information objective to facilitate the disentanglement of useful motion information. Specifically, we design a multi-stage Temporal Difference Encoder that performs incremental cascaded learning conditioned on multi-stage feature difference sequences to derive an informative motion representation. We further propose a Representation Disentanglement module, grounded in mutual information, that captures discriminative task-relevant motion signals by explicitly defining the useful and noisy constituents of the raw motion features and minimizing their mutual information. With these components, our method ranks No. 1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark dataset and achieves state-of-the-art performance on three benchmarks: PoseTrack2017, PoseTrack2018, and PoseTrack21.
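The abstract does not say exactly how the feature difference sequences are formed. As a minimal sketch, assuming each stage produces one feature map per frame stacked along a leading time axis, the frame-to-frame differences could be computed as below. The function names and the list-of-stages layout are hypothetical illustrations, not the authors' code, and the encoder and mutual information module are not shown.

```python
import numpy as np


def temporal_differences(features: np.ndarray) -> np.ndarray:
    """Return frame-to-frame differences F[t+1] - F[t] along axis 0.

    features: array of shape (T, ...) with one feature map per frame.
    The result has shape (T - 1, ...).
    """
    if features.shape[0] < 2:
        raise ValueError("need at least two frames to take a difference")
    # np.diff along the time axis computes features[1:] - features[:-1]
    return np.diff(features, axis=0)


def multi_stage_differences(stage_features):
    """Hypothetical multi-stage setup: one (T, ...) array per backbone stage.

    Returns one difference sequence per stage, in the same order.
    """
    return [temporal_differences(f) for f in stage_features]


if __name__ == "__main__":
    # Three frames of a 2x2 feature map whose values grow by 4 per frame
    frames = np.arange(12, dtype=float).reshape(3, 2, 2)
    diffs = temporal_differences(frames)
    print(diffs.shape)  # (2, 2, 2)
```

Differencing only highlights what changes between frames, so static background responses cancel out; any further filtering of irrelevant motion would be the job of the learned modules the abstract describes.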

Tze Ho Elden Tse

Collaborative Learning for Hand and Object Reconstruction with Attention-guided Graph Convolution

No Need to Scream: Robust Sound-Based Speaker Localisation in Challenging Scenarios

S²Contact: Graph-Based Network for 3D Hand-Object Contact Estimation with Semi-supervised Learning

TP-AE: Temporally Primed 6D Object Pose Tracking with Auto-Encoders

Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video