This article presents REVAMP²T, Real-time Edge Video Analytics for Multi-camera Privacy-aware Pedestrian Tracking, an integrated end-to-end IoT system for decentralized situational awareness with privacy built in. REVAMP²T introduces novel algorithmic and system constructs to push deep learning and video analytics next to IoT devices (i.e., video cameras). On the algorithm side, REVAMP²T proposes a unified computer vision pipeline for detection, re-identification, and tracking across multiple cameras without storing the streaming data. At the same time, it avoids facial recognition, and instead tracks and re-identifies pedestrians based on their key features at runtime. On the IoT system side, REVAMP²T provides infrastructure to maximize hardware utilization on the edge, orchestrate global communications, and provide system-wide re-identification, without the use of personally identifiable information, across a distributed IoT network. For the results and evaluation, this article also proposes a new metric, Accuracy•Efficiency (AE), for holistic evaluation of IoT systems for real-time video analytics based on accuracy, performance, and power efficiency. REVAMP²T outperforms the current state of the art by as much as a thirteen-fold AE improvement.
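To make the privacy-aware re-identification idea concrete, the sketch below illustrates the general technique of matching pedestrians by appearance-feature embeddings rather than faces or raw frames: only compact feature vectors are kept, and a query embedding is matched against known identities by cosine similarity. The function name `reid_match`, the threshold value, and the toy embeddings are illustrative assumptions, not REVAMP²T's actual pipeline.

```python
import numpy as np

def reid_match(query, gallery, threshold=0.7):
    """Match a query appearance embedding against a gallery of known
    identity embeddings by cosine similarity. Returns the gallery index
    of the best match, or -1 if no identity is similar enough (new person).
    No raw frames or facial data are involved, only feature vectors."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    sims = g @ q                      # cosine similarity to each identity
    best = int(np.argmax(sims))
    return best if sims[best] >= threshold else -1

# Toy gallery of two previously seen pedestrians.
gallery = np.array([[1.0, 0.0, 0.0],   # identity 0
                    [0.0, 1.0, 0.0]])  # identity 1
print(reid_match(np.array([0.1, 0.9, 0.0]), gallery))  # 1  (re-identified)
print(reid_match(np.array([0.6, 0.6, 0.6]), gallery))  # -1 (unseen person)
```

A real system would extract these embeddings with a learned network at runtime and discard the frames, consistent with the no-storage design described above.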
Mez is a novel publish-subscribe messaging system for latency-sensitive multi-camera machine vision applications at the IoT Edge. Unlicensed wireless communication in IoT Edge systems is characterized by large latency variations due to intermittent channel interference. To achieve user-specified latency in the presence of wireless channel interference, Mez takes advantage of the ability of machine vision applications to temporarily tolerate lower-quality video frames, provided overall application accuracy is not too adversely affected. Control knobs involving lossy image transformation techniques that modify the frame size, and thereby the video frame transfer latency, are identified. Mez implements a network latency feedback controller that adapts to channel conditions by dynamically adjusting video frame quality using these image transformation control knobs, so as to simultaneously satisfy latency and application accuracy requirements. Additionally, Mez uses an application-domain-specific design of the storage layer to provide low-latency operations. Experimental evaluation on an IoT Edge testbed with a pedestrian detection machine vision application indicates that Mez can tolerate latency variations of up to 10x with a worst-case reduction of 4.2% in the application accuracy F1 score. The performance of Mez is also experimentally evaluated against the state-of-the-art low-latency NATS messaging system.
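The feedback-control idea can be sketched as a single control step: when measured frame-transfer latency exceeds the target, turn a lossy-compression knob down (smaller frames, lower latency); when there is headroom, turn it back up to recover accuracy. This is a minimal illustrative sketch of such a controller, not Mez's actual control law; the function name, step size, and quality bounds are assumptions.

```python
def adjust_quality(quality, measured_latency_ms, target_latency_ms,
                   step=10, q_min=20, q_max=95):
    """One step of a hypothetical latency feedback controller.
    `quality` is a lossy-compression knob (e.g. a JPEG quality factor):
    lowering it shrinks frames and thus transfer latency, at some cost
    in downstream application accuracy."""
    if measured_latency_ms > target_latency_ms:
        # Over budget: degrade frame quality to cut frame size.
        quality = max(q_min, quality - step)
    elif measured_latency_ms < 0.8 * target_latency_ms:
        # Comfortable headroom: restore quality to protect accuracy.
        quality = min(q_max, quality + step)
    return quality

print(adjust_quality(80, measured_latency_ms=160, target_latency_ms=100))  # 70
print(adjust_quality(80, measured_latency_ms=60, target_latency_ms=100))   # 90
```

The dead band between 80% and 100% of the target keeps the knob from oscillating when latency hovers near the budget, which is one plausible way to trade a bounded accuracy loss (e.g. the 4.2% F1 reduction reported above) for latency stability.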
Transformer architectures have become the model of choice in natural language processing and are now being introduced into computer vision tasks such as image classification, object detection, and semantic segmentation. However, in the field of human pose estimation, convolutional architectures still remain dominant. In this work, we present PoseFormer, a purely transformer-based approach for 3D human pose estimation in videos, with no convolutional architectures involved. Inspired by recent developments in vision transformers, we design a spatial-temporal transformer structure to comprehensively model the human joint relations within each frame as well as the temporal correlations across frames, then output an accurate 3D human pose for the center frame. We quantitatively and qualitatively evaluate our method on two popular and standard benchmark datasets: Human3.6M and MPI-INF-3DHP. Extensive experiments show that PoseFormer achieves state-of-the-art performance on both datasets. Code is available at https://github.com/zczcwh/PoseFormer
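The spatial-temporal structure described above can be sketched with plain NumPy: single-head self-attention is applied first across the joints of each frame (spatial), then across the sequence of per-frame tokens (temporal), and the center-frame token is regressed to a 3D pose. This is a shape-level illustration with random matrices standing in for learned weights; it is not PoseFormer's actual architecture (which uses multi-head attention, positional embeddings, and MLP blocks), and all names here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over the rows of x."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return scores @ v

frames, joints, dim = 9, 17, 8          # 9-frame window, 17-joint skeleton
tok_dim = joints * dim

# Random matrices stand in for learned projection weights.
w_spatial = [rng.normal(size=(dim, dim)) for _ in range(3)]
w_temporal = [rng.normal(size=(tok_dim, tok_dim)) for _ in range(3)]
w_out = rng.normal(size=(tok_dim, joints * 3))

x = rng.normal(size=(frames, joints, dim))                 # per-frame joint embeddings
spatial = np.stack([self_attention(f, *w_spatial) for f in x])  # joint relations per frame
tokens = spatial.reshape(frames, tok_dim)                  # one token per frame
temporal = self_attention(tokens, *w_temporal)             # correlations across frames
pose3d = (temporal[frames // 2] @ w_out).reshape(joints, 3)    # 3D pose of center frame
print(pose3d.shape)  # (17, 3)
```

The key design point the sketch preserves is the factorization: attention over joints models intra-frame structure, while attention over frame tokens models motion across time, and only the center frame's pose is predicted from the temporally aggregated representation.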