Figure 1: (a) Attribute variance heat maps of the 312 attributes in CUB birds [60] and the 102 attributes in SUN scenes [47] (lighter color indicates lower variance, i.e., lower discriminability), and t-SNE [35] visualizations of the test images represented by all attributes (left) and only the high-variance ones (right). Some of the low-variance attributes (the lighter part to the left of the cut-off line) discarded at training are still needed to discriminate unseen test classes. (b) Comparison of images reconstructed by SAE [25] and our proposed SP-AEN method, which is shown to retain sufficient semantics for photo-realistic reconstruction.
Abstract: We propose a novel framework called Semantics-Preserving Adversarial Embedding Network (SP-AEN) for zero-shot visual recognition (ZSL), where test images and their classes are both unseen during training. SP-AEN tackles the inherent problem of semantic loss in the prevailing family of embedding-based ZSL methods, where some semantics are discarded during training if they are non-discriminative for training classes but could become critical for recognizing test classes. Specifically, SP-AEN prevents semantic loss by introducing an independent visual-to-semantic space embedder, which disentangles the semantic space into two subspaces for the two arguably conflicting objectives: classification and reconstruction. Through adversarial learning over the two subspaces, SP-AEN transfers semantics from the reconstructive subspace to the discriminative one, improving zero-shot recognition of unseen classes. Compared with prior work, SP-AEN not only improves classification but also generates photo-realistic images, demonstrating the effectiveness of semantic preservation.
On four popular benchmarks (CUB, AWA, SUN, and aPY), SP-AEN considerably outperforms other state-of-the-art methods, with absolute gains of 12.2%, 9.3%, 4.0%, and 3.6%, respectively, in terms of the harmonic mean metric [63].
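The harmonic mean metric [63] referenced above balances seen- and unseen-class accuracies in generalized zero-shot evaluation, so a model cannot score well by excelling on only one side. A minimal sketch (function name and example values are illustrative, not from the paper):

```python
def harmonic_mean_accuracy(acc_seen: float, acc_unseen: float) -> float:
    """Harmonic mean H of seen-class and unseen-class accuracies,
    the standard generalized ZSL metric: H = 2*As*Au / (As + Au)."""
    if acc_seen + acc_unseen == 0:
        return 0.0
    return 2 * acc_seen * acc_unseen / (acc_seen + acc_unseen)

# Example: 60% seen-class and 40% unseen-class accuracy
h = harmonic_mean_accuracy(0.60, 0.40)  # -> 0.48
```

Because the harmonic mean is dominated by the smaller operand, a model that collapses to near-zero unseen-class accuracy scores near zero overall, which is why this metric is preferred over a plain average for generalized ZSL.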
Today's VQA models still tend to capture superficial linguistic correlations in the training set and fail to generalize to test sets with different QA distributions. To reduce these language biases, recent VQA works introduce an auxiliary question-only model to regularize the training of the targeted VQA model, and achieve dominant performance on diagnostic benchmarks for out-of-distribution testing. However, due to their complex model design, these ensemble-based methods are unable to equip themselves with two indispensable characteristics of an ideal VQA model: 1) visual-explainable: the model should rely on the right visual regions when making decisions; 2) question-sensitive: the model should be sensitive to linguistic variations in questions. To this end, we propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy. After training with CSST, VQA models are forced to focus on all critical objects and words, which significantly improves both visual-explainable and question-sensitive abilities. Specifically, CSST is composed of two parts: Counterfactual Samples Synthesizing (CSS) and Counterfactual Samples Training (CST). CSS generates counterfactual samples by carefully masking critical objects in images or words in questions and assigning pseudo ground-truth answers. CST not only trains the VQA models with the complementary samples to predict their respective ground-truth answers, but also urges the VQA models to distinguish the original samples from superficially similar counterfactual ones. To facilitate CST training, we propose two variants of supervised contrastive loss for VQA and design an effective positive and negative sample selection mechanism based on CSS. Extensive experiments have shown the effectiveness of CSST. In particular, building on top of LMH+SAR [1], [2], we achieve record-breaking performance on all out-of-distribution benchmarks (e.g., VQA-CP v2, VQA-CP v1, and GQA-OOD).
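The CSS idea of masking critical words in a question can be illustrated with a toy sketch. The importance scores here are assumed to come from some word-attribution method (e.g., gradient-based scores); the function name, mask token, and example values are all illustrative, not the paper's exact procedure:

```python
def synthesize_counterfactual_question(tokens, importance, k=1, mask_token="[MASK]"):
    """Toy sketch of CSS's question branch: mask the k most
    important (critical) words to create a counterfactual sample.
    A pseudo ground-truth answer would then be assigned to this
    masked question (omitted here)."""
    critical = sorted(range(len(tokens)),
                      key=lambda i: importance[i], reverse=True)[:k]
    return [mask_token if i in critical else t
            for i, t in enumerate(tokens)]

question = ["what", "color", "is", "the", "banana"]
scores = [0.05, 0.70, 0.02, 0.03, 0.20]  # "color" is most critical
cf = synthesize_counterfactual_question(question, scores, k=1)
# -> ['what', '[MASK]', 'is', 'the', 'banana']
```

A question-sensitive model should change its answer distribution between the original and the masked question, which is exactly the behavior CST's contrastive objective rewards.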
Scene graphs, with objects as nodes and visual relationships as edges, describe the whereabouts and interactions of objects in an image for comprehensive scene understanding. To generate coherent scene graphs, almost all existing methods exploit the fruitful visual context by modeling message passing among objects. For example, "person" on "bike" can help to determine the relationship "ride", which in turn contributes to the confidence of the two objects. However, we argue that the visual context is not properly learned by the prevailing cross-entropy-based supervised learning paradigm, which is insensitive to graph inconsistency: errors at hub and non-hub nodes should not be penalized equally. To this end, we propose a Counterfactual critic Multi-Agent Training (CMAT) approach. CMAT is a multi-agent policy gradient method that frames objects as cooperative agents and directly maximizes a graph-level metric as the reward. In particular, to assign the reward properly to each agent, CMAT uses a counterfactual baseline that disentangles the agent-specific reward by fixing the predictions of the other agents. Extensive validation on the challenging Visual Genome benchmark shows that CMAT achieves state-of-the-art performance with significant gains under various settings and metrics.
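The counterfactual baseline described above can be sketched in a few lines: hold every other agent's prediction fixed, swap in alternative predictions for one agent, and subtract the average resulting reward from the actual reward. The reward function, labels, and alternatives below are stand-ins for a real graph-level metric (e.g., a recall-based score), not CMAT's exact formulation:

```python
def counterfactual_advantage(reward_fn, predictions, agent_idx, alternatives):
    """Sketch of counterfactual credit assignment: the advantage of
    agent `agent_idx` is the graph-level reward of the actual joint
    prediction minus a baseline that averages the reward over the
    agent's alternative predictions, with all other agents fixed."""
    actual = reward_fn(predictions)
    baseline = 0.0
    for alt in alternatives:
        counterfactual = list(predictions)
        counterfactual[agent_idx] = alt  # only this agent changes
        baseline += reward_fn(counterfactual)
    baseline /= len(alternatives)
    return actual - baseline

# Toy graph-level reward: fraction of predictions matching a target graph.
target = ["person", "bike", "ride"]
reward = lambda preds: sum(a == b for a, b in zip(preds, target)) / len(target)

adv = counterfactual_advantage(reward, ["person", "bike", "ride"],
                               agent_idx=2, alternatives=["on", "near", "ride"])
# actual reward 1.0, baseline (2/3 + 2/3 + 1)/3 = 7/9, so adv = 2/9
```

A positive advantage means the agent's actual prediction contributed more to the graph-level reward than its typical alternatives would have, so only that marginal contribution is reinforced.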
A self-driven closed-loop parallel testing system implements more challenging tests to accelerate evaluation and development of autonomous vehicles.
Background: Surgical navigation systems have become essential tools that enable doctors to perform complex operations accurately and safely. However, the traditional navigation interface was intended only for two-dimensional observation by doctors and does not display the full spatial information of the lesion area. Moreover, the image navigation interface is separated from the operating area, so the doctor must switch the field of vision between the screen and the patient's lesion area. In this paper, augmented reality (AR) technology was applied to spinal surgery to provide more intuitive information to surgeons, and the accuracy of virtual-to-real registration was improved. During the operation, the doctor could observe the AR image and the true shape of the internal spine through the skin. Methods: To improve the accuracy of virtual-to-real registration, a registration technique based on an improved identification method and a robot-assisted method was proposed. The experimental procedure was optimized using the improved identification method, and X-ray images were used to verify the effectiveness of the punctures performed by the robot. Results: The final experimental results show that the average accuracy of virtual-to-real registration based on the general identification method was 9.73 ± 0.46 mm (range 8.90-10.23 mm), while that based on the improved identification method was 3.54 ± 0.13 mm (range 3.36-3.73 mm), an improvement of approximately 65%. The highest accuracy of virtual-to-real registration based on the robot-assisted method was 2.39 mm, an improvement of approximately 28.5% over the improved identification method. Conclusion: The experimental results show that the two optimized methods are highly effective.
The proposed AR navigation system has high accuracy and stability. This system may have value in future spinal surgeries.
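The mean ± SD accuracies reported above are per-landmark registration error statistics. A sketch of how such numbers are typically computed from paired 3D points is shown below; the point coordinates are illustrative, not the study's data:

```python
import math

def registration_error_stats(virtual_pts, real_pts):
    """Mean and sample standard deviation of Euclidean distances
    between corresponding virtual (AR-overlaid) and real landmark
    points -- the usual way 'virtual-to-real registration accuracy'
    is reported as mean ± SD in millimeters."""
    errors = [math.dist(v, r) for v, r in zip(virtual_pts, real_pts)]
    mean = sum(errors) / len(errors)
    var = sum((e - mean) ** 2 for e in errors) / (len(errors) - 1)
    return mean, math.sqrt(var)

# Illustrative landmark pairs (mm): per-point errors are 3.0 and 4.0 mm.
virtual = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
real = [(3.0, 0.0, 0.0), (10.0, 4.0, 0.0)]
mean_err, sd_err = registration_error_stats(virtual, real)  # mean 3.5 mm
```

The relative improvement between two methods then follows directly, e.g., (9.73 − 3.54) / 9.73 ≈ 64%, consistent with the "approximately 65%" reported in the abstract.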
Cross-cultural researchers have questioned the extent to which European-American management practices can be transported to major markets in Asia, such as the People's Republic of China. Applying employee involvement theory, we examined the relationships between climate for autonomy, work demands climate, employee stress, and organizational productivity in a cross-national study of 51 UK and 104 Chinese manufacturing organizations. We predicted and found that climate for autonomy was positively related to stress in the Chinese context and negatively related to stress in the UK context. The interaction of climate for autonomy and work demands climate was significant: climate for autonomy was positively related to organizational productivity only when work demands climate was low.