In this paper, we study the problem of learning Graph Convolutional Networks (GCNs) for regression. Current GCN architectures are limited by the small receptive field of their convolution filters and by a transformation matrix that is shared across all nodes. To address these limitations, we propose Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture that operates on regression tasks with graph-structured data. SemGCN learns to capture semantic information such as local and global node relationships, which is not explicitly represented in the graph. These semantic relationships can be learned through end-to-end training from the ground truth without additional supervision or hand-crafted rules. We further investigate applying SemGCN to 3D human pose regression. Our formulation is intuitive and sufficient since both 2D and 3D human poses can be represented as a structured graph encoding the relationships between joints in the skeleton of a human body. We carry out comprehensive studies to validate our method. The results show that SemGCN outperforms the state of the art while using 90% fewer parameters. The code can be found at https://github.com/garyzhao/SemGCN.
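To make the core idea concrete, here is a minimal sketch, in our own notation, of one semantic graph convolution step: the layer learns per-edge weights on top of the fixed skeleton adjacency, so neighbor contributions are no longer uniform. Note that the mask M is passed in as a plain array here, whereas in SemGCN it would be learned end-to-end together with W.

```python
import numpy as np

def semantic_graph_conv(X, A, M, W):
    """One semantic graph convolution step (illustrative sketch).

    X: (N, F_in) node features; A: (N, N) binary adjacency with self-loops;
    M: (N, N) learnable per-edge weights; W: (F_in, F_out) shared transform.
    Softmax-normalizing M over each node's neighbors lets the layer learn
    how much each neighbor contributes, instead of averaging uniformly.
    """
    scores = np.where(A > 0, M, -np.inf)           # keep only real edges
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return np.maximum(weights @ X @ W, 0.0)        # aggregate + ReLU
```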
Understanding the collective dynamics of crowd movements during stressful emergency situations is central to reducing the risk of deadly crowd disasters. Yet, their systematic experimental study remains a challenging open problem due to ethical and methodological constraints. In this paper, we demonstrate the viability of shared three-dimensional virtual environments as an experimental platform for conducting crowd experiments with real people. In particular, we show that crowds of real human subjects moving and interacting in an immersive three-dimensional virtual environment exhibit the typical patterns observed in real-life crowded situations. These include the manifestation of social conventions and the emergence of self-organized patterns during egress scenarios. High-stress evacuation experiments conducted in this virtual environment reveal movements characterized by mass herding and dangerous overcrowding as they occur in crowd disasters. We describe the behavioural mechanisms at play under such extreme conditions and identify critical zones where overcrowding may occur. Furthermore, we show that herding spontaneously emerges from a density effect, without the need to assume an increased individual tendency to imitate peers. Our experiments reveal the promise of immersive virtual environments as an ethical, cost-efficient, yet accurate platform for exploring crowd behaviour in high-risk situations with real human subjects.
There has been a recent paradigm shift in the computer animation industry with an increasing use of prerecorded motion for animating virtual characters. A fundamental requirement for using motion capture data is an efficient method for indexing and retrieving motions. In this paper, we propose a flexible, efficient method for searching arbitrarily complex motions in large motion databases. Motions are encoded using keys which represent a wide array of structural, geometric, and dynamic features of human motion. Keys provide a representative search space for indexing motions, and users can specify sequences of key values as well as multiple combinations of key sequences to search for complex motions. We use a trie-based data structure to provide an efficient mapping from key sequences to motions. The search times (even on a single CPU) are very fast, opening the possibility of using large motion data sets in real-time applications.
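As an illustration of the trie-based mapping from key sequences to motions, here is a minimal sketch; the key names and motion IDs below are hypothetical, and the paper's actual vocabulary of structural, geometric, and dynamic keys is far richer.

```python
class MotionTrie:
    """Illustrative trie mapping key sequences to motion clip IDs."""

    def __init__(self):
        self.children = {}   # key value -> child MotionTrie
        self.motions = set()  # clips whose key sequence ends at this node

    def insert(self, key_seq, motion_id):
        node = self
        for key in key_seq:
            node = node.children.setdefault(key, MotionTrie())
        node.motions.add(motion_id)

    def search(self, key_seq):
        node = self
        for key in key_seq:
            if key not in node.children:
                return set()
            node = node.children[key]
        return node.motions

# Hypothetical usage: index a clip by a sequence of geometric key values.
trie = MotionTrie()
trie.insert(("left_foot_front", "right_foot_front"), "walk_clip_01")
print(trie.search(("left_foot_front", "right_foot_front")))  # {'walk_clip_01'}
```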
We consider the problem of image-to-video translation, where an input image is translated into an output video containing motions of a single object. Recent methods for such problems typically train transformation networks to generate future frames conditioned on the structure sequence. Parallel work has shown that short high-quality motions can be generated by spatiotemporal generative networks that leverage temporal knowledge from the training data. We combine the benefits of both approaches and propose a two-stage generation framework where videos are generated from structures and then refined by temporal signals. To model motions more efficiently, we train networks to learn residual motion between the current and future frames, which avoids learning motion-irrelevant details. We conduct extensive experiments on two image-to-video translation tasks: facial expression retargeting and human pose forecasting. Superior results over the state-of-the-art methods on both tasks demonstrate the effectiveness of our approach.
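The two-stage pipeline can be sketched as follows; stage1_residual and stage2_refine are hypothetical stand-ins for the trained networks, and the residual formulation reflects our reading of the motion-modeling step rather than the paper's exact architecture.

```python
import numpy as np

def generate_video(image, structure_seq, stage1_residual, stage2_refine):
    """Two-stage generation sketch.

    Stage 1 moves the input image toward each structure (e.g. a pose or
    landmark map) by predicting only a residual motion, so static,
    motion-irrelevant appearance never has to be re-synthesized.
    Stage 2 refines the coarse clip using temporal signals.
    Assumes pixel values normalized to [0, 1].
    """
    coarse = [np.clip(image + stage1_residual(image, s), 0.0, 1.0)
              for s in structure_seq]
    return stage2_refine(coarse)  # temporal refinement over the whole clip
```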
In this paper we propose a general framework for local path planning and steering that can be easily extended to perform high-level behaviors. Our framework is based on the concept of affordances: the possible ways an agent can interact with its environment. Each agent perceives the environment through a set of vector and scalar fields that are represented in the agent's local space. This egocentric property allows us to efficiently compute a local space-time plan. We then use these perception fields to compute a fitness measure for every possible action, known as an affordance field. The action that has the optimal value in the affordance field is the agent's steering decision. Using our framework, we demonstrate autonomous virtual pedestrians that perform steering and path planning in unknown environments, along with the emergence of high-level responses to never-before-seen situations.
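A minimal sketch of the steering decision as described: score each candidate action against the agent's egocentric perception fields and take the optimum. The fitness function here is a placeholder for the paper's combined fitness measure over the vector and scalar fields.

```python
import numpy as np

def steering_decision(candidate_actions, perception_fields, fitness):
    """Affordance-field sketch: evaluate every candidate action against the
    agent's egocentric perception fields; the best-scoring action is the
    agent's steering decision."""
    scores = [fitness(action, perception_fields) for action in candidate_actions]
    return candidate_actions[int(np.argmax(scores))]
```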
Steering is a challenging task, required by nearly all agents in virtual worlds. There is a large and growing number of approaches for steering, and it is becoming increasingly important to ask a fundamental question: how can we objectively compare steering algorithms? To our knowledge, there is no standard way of evaluating or comparing the quality of steering solutions. This paper presents SteerBench: a benchmark framework for objectively evaluating steering behaviors for virtual agents. We propose a diverse set of test cases, metrics of evaluation, and a scoring method that can be used to compare different steering algorithms. Our framework can be easily customized by a user to evaluate specific behaviors and new test cases. We demonstrate our benchmark process on two example steering algorithms, showing the insight gained from our metrics. We hope that this framework can grow into a standard for steering evaluation.
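As a toy sketch of how a benchmark might fold several per-test-case metrics into one comparable score, consider the following; the metric names and weights are invented for illustration and do not reproduce SteerBench's actual metrics or scoring method.

```python
def benchmark_score(metrics, weights):
    """Illustrative scoring sketch: combine per-test-case metrics into a
    single number so that different steering algorithms can be compared.
    Here each metric is a penalty, so lower scores are better."""
    return sum(weights[name] * value for name, value in metrics.items())

# Hypothetical test-case result for one algorithm.
score = benchmark_score(
    {"collisions": 2, "time_to_goal": 14.8, "effort": 31.5},
    {"collisions": 50.0, "time_to_goal": 1.0, "effort": 0.5},
)
```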
The majority of current systems for end-to-end dialog generation focus on response quality without an explicit control over the affective content of the responses. In this paper, we present an affect-driven dialog system, which generates emotional responses in a controlled manner using a continuous representation of emotions. The system achieves this by modeling emotions at the word and sequence level using: (1) a vector representation of the desired emotion, (2) an affect regularizer, which penalizes neutral words, and (3) an affect sampling method, which forces the neural network to generate diverse words that are emotionally relevant. During inference, we use a reranking procedure that aims to extract the most emotionally relevant responses using a human-in-the-loop optimization process. We study the performance of our system in terms of both quantitative (BLEU score and response diversity) and qualitative (emotional appropriateness) measures.
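The affect regularizer (component 2) can be sketched as an extra loss term; token_probs, affect_strength, and lam are our placeholder names, and the exact formulation in the paper may differ.

```python
import numpy as np

def affect_regularized_loss(ce_loss, token_probs, affect_strength, lam=0.1):
    """Sketch of an affect regularizer: penalize probability mass placed on
    emotionally neutral words.

    token_probs: model's next-word distribution over the vocabulary;
    affect_strength: per-word emotion intensity in [0, 1];
    lam: regularization weight (introduced here for illustration).
    """
    neutral_mass = np.sum(token_probs * (1.0 - affect_strength))
    return ce_loss + lam * neutral_mass
```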
Figure 1: Snapshots of a bottleneck and a doorway scenario, showing progress from left to right. The hybrid approach efficiently handles large crowds and challenging space-time scenarios.

Next-generation steering algorithms will need to support thousands of believable individual agents, capable of steering in very challenging situations with low-latency reactions. In this paper we propose a steering framework that offers three key contributions: (a) it integrates several models of steering into a single steering decision, (b) it employs a novel space-time planning approach to allow agents to steer during complex local interactions, and (c) it varies the frequency of update of each component (phase) of the framework to drastically improve performance. We demonstrate the versatility and robustness of our framework using a large number of test cases. We also show that the frequency of updates for each phase of the framework can be "decimated" by a surprisingly large amount before the resulting steering behaviors degrade. This technique achieves more than a 5× performance improvement, allowing the use of better, more costly algorithms for robust steering, while supporting thousands of agents with low-latency reactions in real time.
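The update-frequency "decimation" can be sketched as a per-phase scheduling loop; the phase names and periods below are hypothetical, not the paper's actual configuration.

```python
def simulation_step(agents, frame, phases):
    """Sketch of per-phase update decimation: each framework phase runs only
    every `period` frames rather than every frame, trading slight staleness
    for a large performance win (the source of the reported >5x speedup)."""
    for update_fn, period in phases:
        if frame % period == 0:
            for agent in agents:
                update_fn(agent)

# Hypothetical schedule: cheap reactions every frame, costly planning rarely.
# phases = [(react, 1), (predict, 3), (plan_space_time, 10)]
```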