Figure 1: (a) Attribute variance heat maps of the 312 attributes in CUB birds [60] and the 102 attributes in SUN scenes [47] (lighter color indicates lower variance, i.e., lower discriminability), and t-SNE [35] visualizations of the test images represented by all attributes (left) and only the high-variance ones (right). Some of the low-variance attributes (the lighter part to the left of the cut-off line) discarded at training are still needed to discriminate unseen test classes. (b) Comparison of images reconstructed by SAE [25] and our proposed SP-AEN method, which is shown to retain sufficient semantics for photo-realistic reconstruction.
Abstract: We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training. SP-AEN tackles the inherent problem of semantic loss in the prevailing family of embedding-based ZSL methods, where some semantics are discarded during training if they are non-discriminative for training classes but could become critical for recognizing test classes. Specifically, SP-AEN prevents semantic loss by introducing an independent visual-to-semantic space embedder, which disentangles the semantic space into two subspaces for the two arguably conflicting objectives: classification and reconstruction. Through adversarial learning over the two subspaces, SP-AEN transfers semantics from the reconstructive subspace to the discriminative one, improving zero-shot recognition of unseen classes. Compared with prior work, SP-AEN not only improves classification but also generates photo-realistic images, demonstrating the effectiveness of semantic preservation.
On four popular benchmarks (CUB, AWA, SUN, and aPY), SP-AEN considerably outperforms other state-of-the-art methods, with absolute gains of 12.2%, 9.3%, 4.0%, and 3.6%, respectively, in terms of the harmonic mean metric [63].
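The harmonic mean metric [63] referenced above balances seen- and unseen-class accuracies in generalized zero-shot evaluation, so a model cannot score well by excelling on only one side. A minimal sketch (function name and example values are illustrative, not from the paper):

```python
def harmonic_mean_accuracy(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean H of seen-class and unseen-class accuracies,
    the standard generalized ZSL metric: H = 2*As*Au / (As + Au)."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Example: 60% seen-class and 40% unseen-class accuracy
h = harmonic_mean_accuracy(0.60, 0.40)  # -> 0.48
```

Because the harmonic mean is dominated by the smaller operand, a model that collapses to near-zero unseen-class accuracy scores near zero overall, which is why this metric is preferred over a plain average for generalized ZSL.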
Today's VQA models still tend to capture superficial linguistic correlations in the training set and fail to generalize to test sets with different QA distributions. To reduce these language biases, recent VQA works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on diagnostic benchmarks for out-of-distribution testing. However, due to their complex model design, these ensemble-based methods are unable to equip themselves with two indispensable characteristics of an ideal VQA model: 1) visual-explainable: the model should rely on the right visual regions when making decisions; 2) question-sensitive: the model should be sensitive to linguistic variations in questions. To this end, we propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy. After training with CSST, VQA models are forced to focus on all critical objects and words, which significantly improves both visual-explainable and question-sensitive abilities. Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST). CSS generates counterfactual samples by carefully masking critical objects in images or words in questions and assigning pseudo ground-truth answers. CST not only trains the VQA models with the complementary samples to predict their respective ground-truth answers, but also urges the VQA models to distinguish the original samples from superficially similar counterfactual ones. To facilitate CST training, we propose two variants of supervised contrastive loss for VQA and design an effective positive and negative sample selection mechanism based on CSS. Extensive experiments have shown the effectiveness of CSST. In particular, building on top of LMH+SAR [1], [2], we achieve record-breaking performance on all out-of-distribution benchmarks (e.g., VQA-CP v2, VQA-CP v1, and GQA-OOD).
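The CSS idea of masking critical words in a question can be illustrated with a toy sketch. The importance scores here are assumed to come from some word-attribution method (e.g., gradient-based scores); the function name, mask token, and example values are all illustrative, not the paper's exact procedure:

```python
def synthesize_counterfactual_question(tokens, importance, k=1, mask_token="[MASK]"):
    """Toy sketch of CSS's question branch: mask the k most
    important (critical) words to create a counterfactual sample.
    A pseudo ground-truth answer would then be assigned to this
    masked question (omitted here)."""
    critical = sorted(range(len(tokens)),
                      key=lambda i: importance[i], reverse=True)[:k]
    return [mask_token if i in critical else t
            for i, t in enumerate(tokens)]

question = ["what", "color", "is", "the", "banana"]
scores = [0.05, 0.70, 0.02, 0.03, 0.20]  # "color" is most critical
cf = synthesize_counterfactual_question(question, scores, k=1)
# -> ['what', '[MASK]', 'is', 'the', 'banana']
```

A question-sensitive model should change its answer distribution between the original and the masked question, which is exactly the behavior CST's contrastive objective rewards.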
Scene graphs, with objects as nodes and visual relationships as edges, describe the whereabouts and interactions of objects in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects. For example, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the confidence of the two objects. However, we argue that the visual context is not properly learned by the prevailing cross-entropy-based supervised learning paradigm, which is insensitive to graph inconsistency: errors at hub and non-hub nodes should not be penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents and directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the predictions of the other agents. Extensive validation on the challenging Visual Genome benchmark shows that CMAT achieves state-of-the-art performance with significant gains under various settings and metrics.
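The counterfactual baseline described above can be sketched in a few lines: hold every other agent's prediction fixed, swap in alternative predictions for one agent, and subtract the average resulting reward from the actual reward. The reward function, labels, and alternatives below are stand-ins for a real graph-level metric (e.g., a recall-based score), not CMAT's exact formulation:

```python
def counterfactual_advantage(reward_fn, predictions, agent_idx, alternatives):
    """Sketch of counterfactual credit assignment: the advantage of
    agent `agent_idx` is the graph-level reward of the actual joint
    prediction minus a baseline that averages the reward over the
    agent's alternative predictions, with all other agents fixed."""
    actual = reward_fn(predictions)
    baseline = 0.0
    for alt in alternatives:
        counterfactual = list(predictions)
        counterfactual[agent_idx] = alt  # only this agent changes
        baseline += reward_fn(counterfactual)
    baseline /= len(alternatives)
    return actual - baseline

# Toy graph-level reward: fraction of predictions matching a target graph.
target = ["person", "bike", "ride"]
reward = lambda preds: sum(a == b for a, b in zip(preds, target)) / len(target)

adv = counterfactual_advantage(reward, ["person", "bike", "ride"],
                               agent_idx=2, alternatives=["on", "near", "ride"])
# actual reward 1.0, baseline (2/3 + 2/3 + 1)/3 = 7/9, so adv = 2/9
```

A positive advantage means the agent's actual prediction contributed more to the graph-level reward than its typical alternatives would have, so only that marginal contribution is reinforced.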
A self-driven closed-loop parallel testing system implements more challenging tests to accelerate evaluation and development of autonomous vehicles.
Background: Surgical navigation systems have become essential tools that enable doctors to perform complex operations accurately and safely. However, the traditional navigation interface was intended only for two-dimensional observation by doctors and does not display the full spatial information of the lesion area. Moreover, the image navigation interface is separated from the operating area, so the doctor must switch the field of vision between the screen and the patient's lesion area. In this paper, augmented reality (AR) technology was applied to spinal surgery to provide more intuitive information to surgeons, and the accuracy of virtual-to-real registration was improved. During the operation, the doctor could observe the AR image and the true shape of the internal spine through the skin. Methods: To improve the accuracy of virtual-to-real registration, a registration technique based on an improved identification method and a robot-assisted method was proposed. The experimental procedure was optimized using the improved identification method, and X-ray images were used to verify the effectiveness of the punctures performed by the robot. Results: The final experimental results show that the average accuracy of virtual-to-real registration based on the general identification method was 9.73 ± 0.46 mm (range 8.90-10.23 mm), while that based on the improved identification method was 3.54 ± 0.13 mm (range 3.36-3.73 mm), an improvement of approximately 65%. The highest accuracy of virtual-to-real registration based on the robot-assisted method was 2.39 mm, an improvement of approximately 28.5% over the improved identification method. Conclusion: The experimental results show that the two optimized methods are highly effective.
The proposed AR navigation system has high accuracy and stability. This system may have value in future spinal surgeries.
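The mean ± SD accuracies reported above are per-landmark registration error statistics. A sketch of how such numbers are typically computed from paired 3D points is shown below; the point coordinates are illustrative, not the study's data:

```python
import math

def registration_error_stats(virtual_pts, real_pts):
    """Mean and sample standard deviation of Euclidean distances
    between corresponding virtual (AR-overlaid) and real landmark
    points -- the usual way 'virtual-to-real registration accuracy'
    is reported as mean ± SD in millimeters."""
    errors = [math.dist(v, r) for v, r in zip(virtual_pts, real_pts)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)
    return mean, math.sqrt(var)

# Illustrative landmark pairs (mm): per-point errors are 3.0 and 4.0 mm.
virtual = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
real = [(3.0, 0.0, 0.0), (10.0, 4.0, 0.0)]
mean_err, sd_err = registration_error_stats(virtual, real)  # mean 3.5 mm
```

The relative improvement between two methods then follows directly, e.g., (9.73 − 3.54) / 9.73 ≈ 64%, consistent with the "approximately 65%" reported in the abstract.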
Cross-cultural researchers have questioned the extent to which European-American management practices can be transported to major markets in Asia, such as the People's Republic of China. Applying employee involvement theory, we examined the relationships between climate for autonomy, work demands climate, employee stress, and organizational productivity in a cross-national study of 51 UK and 104 Chinese manufacturing organizations. We predicted and found that climate for autonomy was positively related to stress in the Chinese context and negatively related to stress in the UK context. The interaction of climate for autonomy and work demands climate was significant: climate for autonomy was positively related to organizational productivity only when work demands climate was low.