Abstract: Automatic facial expression recognition (FER) is an interesting and important problem in computer vision and machine intelligence. While previous FER systems often focus on learning a classifier in a controlled environment, this paper considers a more practical and robust scenario. More specifically, traditional FER requires collecting as many facial photos as possible so that expressions can be recognized accurately regardless of whether a particular person wears sunglasses, hats, or other accessories. Such a requirement is, however, inconvenient and imposes practical difficulties on users. To alleviate this problem, this paper proposes a robust one-shot FER system that requires only a single facial photo for each expression of each user. When taking the photo, the user is free to choose whether or not to wear sunglasses; the sunglasses may even differ in shape and luminous transmittance. Such one-shot recognition improves the user-friendliness of the FER system. Importantly, a novel and practical sunglasses detection and recovery approach is developed, which yields clear accuracy improvements of 6.09%, 5.86%, and 4.33% with state-of-the-art classifiers, namely Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), and K-Nearest Neighbors (KNN), respectively, on the modified Japanese Female Facial Expression (JAFFE) benchmark database.
Index Terms: Facial expression recognition, sunglasses detection and recovery, one-shot recognition system.
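To make the described pipeline concrete, the sketch below strings together the stages the abstract names: detect sunglasses, recover the occluded eye region, then classify with SVM, LDA, and KNN. It is a minimal illustration under stated assumptions, not the paper's implementation: the detection and recovery steps are naive placeholders, and the data is a synthetic stand-in for JAFFE face crops.

```python
# Minimal sketch of a one-shot FER pipeline with sunglasses handling.
# The detector and recovery steps are naive placeholders, not the
# paper's method; the data is a synthetic stand-in for JAFFE crops.
import numpy as np
from sklearn.svm import SVC
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

def detect_sunglasses(face):
    # Placeholder heuristic: a very dark eye band suggests sunglasses.
    return face[10:30, :].mean() < 0.2

def recover_eye_region(face):
    # Placeholder recovery: interpolate the occluded band from its borders.
    out = face.copy()
    out[10:30, :] = 0.5 * (face[9, :] + face[30, :])
    return out

def preprocess(face):
    if detect_sunglasses(face):
        face = recover_eye_region(face)
    return face.ravel()  # flatten to a raw-pixel feature vector

rng = np.random.default_rng(0)
train_faces = rng.random((70, 64, 64))   # one-shot: one photo per expression per user
train_labels = np.arange(70) % 7         # 7 expression classes (illustrative)
test_faces = rng.random((140, 64, 64))
test_labels = np.arange(140) % 7

X_train = np.stack([preprocess(f) for f in train_faces])
X_test = np.stack([preprocess(f) for f in test_faces])

# The three classifiers evaluated in the paper.
for clf in (SVC(kernel="linear"),
            LinearDiscriminantAnalysis(),
            KNeighborsClassifier(n_neighbors=1)):
    clf.fit(X_train, train_labels)
    print(type(clf).__name__, clf.score(X_test, test_labels))
```

KNN is set to a single neighbor here because only one training photo exists per expression per user in the one-shot setting.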
Nowadays, algorithms with fast convergence, small memory footprints, and low per-iteration complexity are particularly favorable for artificial intelligence applications. In this paper, we propose a doubly stochastic algorithm with a novel accelerating multi-momentum technique to solve large-scale empirical risk minimization problems for learning tasks. While enjoying a provably superior convergence rate, in each iteration the algorithm accesses only a mini-batch of samples and updates only a small block of variable coordinates, which substantially reduces memory references when both massive sample size and ultra-high dimensionality are involved. Specifically, to obtain an ε-accurate solution, our algorithm requires only O(log(1/ε)/√ε) overall computation for the general convex case and O((n + √(nκ)) log(1/ε)) for the strongly convex case. Empirical studies on very large datasets are conducted to demonstrate the efficiency of our method in practice.
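The "doubly stochastic" idea can be sketched in a few lines: each iteration samples both a mini-batch of rows and a block of coordinates, and touches only that block when updating. The code below applies this to ridge-regularized least squares; it omits the paper's multi-momentum acceleration, and all parameter values are illustrative.

```python
# Minimal sketch of a doubly stochastic update for regularized least
# squares: each iteration samples a mini-batch of rows AND a block of
# coordinates. The paper's multi-momentum acceleration is omitted.
import numpy as np

def doubly_stochastic_sgd(A, b, lam=1e-3, batch=64, block=32,
                          lr=0.1, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        rows = rng.choice(n, size=batch, replace=False)  # mini-batch of samples
        cols = rng.choice(d, size=block, replace=False)  # block of coordinates
        # Partial gradient of (1/2n)||Ax - b||^2 + (lam/2)||x||^2,
        # restricted to the sampled coordinate block.
        resid = A[rows] @ x - b[rows]
        g = A[rows][:, cols].T @ resid / batch + lam * x[cols]
        x[cols] -= lr * g                                # update the block only
    return x

# Usage on synthetic data (illustrative):
rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 200))
x_true = rng.standard_normal(200)
b = A @ x_true + 0.01 * rng.standard_normal(1000)
x_hat = doubly_stochastic_sgd(A, b)
print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Updating only a coordinate block per iteration is what keeps the memory footprint small when both n and d are huge; the accelerated rates quoted above require the momentum machinery the sketch leaves out.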
In this paper, we focus on the simulation of active stereovision depth sensors, which are popular in both academia and industry. Inspired by the underlying mechanism of these sensors, we design a fully physics-grounded simulation pipeline that includes material acquisition, ray-tracing-based infrared (IR) image rendering, IR noise simulation, and depth estimation. The pipeline is able to generate depth maps with material-dependent error patterns similar to those of a real depth sensor. We conduct extensive experiments to show that perception algorithms and reinforcement learning policies trained on our simulation platform transfer well to real-world test cases without any fine-tuning. Furthermore, owing to the high degree of realism of the simulation, our depth sensor simulator can serve as a convenient testbed for evaluating algorithm performance in the real world, which largely reduces the human effort in developing robotic algorithms. The entire pipeline has been integrated into the SAPIEN simulator and is open-sourced to benefit the vision and robotics research communities.
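As a rough illustration of the last two pipeline stages, the sketch below adds a simple sensor-noise model to a stereo IR pair and estimates depth with classic block matching. This is a textbook simplification under stated assumptions, not the paper's physics-grounded renderer or matcher; the noise model, camera parameters, and inputs are all illustrative.

```python
# Illustrative sketch of two pipeline stages: IR noise simulation and
# stereo depth estimation via SAD block matching. Rendering and material
# acquisition are replaced by synthetic inputs; parameters are made up.
import numpy as np

def add_ir_noise(img, gain=0.03, speckle=0.05, rng=None):
    # Simple sensor-noise model: multiplicative speckle + additive Gaussian.
    if rng is None:
        rng = np.random.default_rng(0)
    noisy = img * (1 + speckle * rng.standard_normal(img.shape))
    noisy += gain * rng.standard_normal(img.shape)
    return np.clip(noisy, 0.0, 1.0)

def block_match_depth(left, right, max_disp=32, win=5, focal=400.0, baseline=0.05):
    # Classic SAD block matching: disparity d -> depth = focal * baseline / d.
    h, w = left.shape
    depth = np.zeros((h, w))
    pad = win // 2
    for y in range(pad, h - pad):
        for x in range(pad + max_disp, w - pad):
            patch = left[y-pad:y+pad+1, x-pad:x+pad+1]
            costs = [np.abs(patch - right[y-pad:y+pad+1, x-d-pad:x-d+pad+1]).sum()
                     for d in range(1, max_disp)]
            d = 1 + int(np.argmin(costs))
            depth[y, x] = focal * baseline / d
    return depth

# Usage on a synthetic pair with a constant 8-pixel disparity:
rng = np.random.default_rng(1)
left = rng.random((64, 96))
right = np.roll(left, -8, axis=1)
depth = block_match_depth(add_ir_noise(left, rng=np.random.default_rng(2)),
                          add_ir_noise(right, rng=np.random.default_rng(3)))
print("median depth:", np.median(depth[depth > 0]))
```

Because the matcher operates on the noisy IR pair rather than on ground-truth geometry, the recovered depth inherits noise- and texture-dependent errors, which is the effect the full pipeline reproduces with far greater fidelity.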
Generalizable manipulation skills, which can be composed to tackle long-horizon and complex daily chores, are one of the cornerstones of Embodied AI. However, existing benchmarks, mostly composed of a suite of simulatable environments, are insufficient to push cutting-edge research because they lack object-level topological and geometric variations, are not based on fully dynamic simulation, or lack native support for multiple types of manipulation tasks. To this end, we present ManiSkill2, the next generation of the SAPIEN ManiSkill benchmark, to address critical pain points often encountered by researchers when using benchmarks for generalizable manipulation skills. ManiSkill2 includes 20 manipulation task families with 2000+ object models and 4M+ demonstration frames, which cover stationary/mobile-base, single/dual-arm, and rigid/soft-body manipulation tasks with 2D/3D-input data simulated by fully dynamic engines. It defines a unified interface and evaluation protocol to support a wide range of algorithms (e.g., classic sense-plan-act, RL, IL), visual observations (point cloud, RGBD), and controllers (e.g., action type and parameterization). Moreover, it empowers fast visual-input learning algorithms, so that a CNN-based policy can collect samples at about 2000 FPS with 1 GPU and 16 processes on a regular workstation. It implements a render-server infrastructure to share rendering resources across all environments, thereby significantly reducing memory usage. We open-source all code of our benchmark (simulator, environments, and baselines) and host an online challenge open to interdisciplinary researchers.

Figure 1: ManiSkill2 provides a unified, fast, and accessible system that encompasses well-curated manipulation tasks (e.g., stationary/mobile-base, single/dual-arm, rigid/soft-body).

† and * indicate equal contribution (in alphabetical order). See Appendix H for contributions.
Project website: https://maniskill2.github.io/
Codes:
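The unified interface mentioned above follows the familiar Gym pattern; the snippet below sketches typical usage. The environment id and keyword arguments are plausible but assumed here, and the exact reset/step signatures depend on the Gym version in use; consult the project website for the authoritative API.

```python
# Sketch of interacting with a ManiSkill2 task through its unified,
# Gym-style interface. Env id and kwargs are illustrative assumptions.
import gym
import mani_skill2.envs  # registers ManiSkill2 environments with Gym

env = gym.make(
    "PickCube-v0",                    # one of the 20 task families (assumed id)
    obs_mode="rgbd",                  # visual observations: "pointcloud" or "rgbd"
    control_mode="pd_ee_delta_pose",  # controller parameterization (assumed)
)
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # stand-in for a learned policy
    obs, reward, done, info = env.step(action)
env.close()
```

Because every task family exposes the same observation/controller options, the same training loop can be pointed at stationary, mobile-base, or soft-body tasks by changing only the environment id and kwargs.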