Purpose: The paper proposes a privacy-preserving, artificial-intelligence-enabled video surveillance technology to monitor social distancing in public spaces.

Design/methodology/approach: The paper proposes a new Responsible Artificial Intelligence Implementation Framework to guide the proposed solution's design and development. The framework defines responsible artificial intelligence criteria that the solution must meet and provides checklists to enforce those criteria throughout the process. To preserve data privacy, the proposed system incorporates a federated learning approach: computation is performed on edge devices, which limits the movement of sensitive and identifiable data and removes the dependency on cloud computing at a central server.

Findings: The proposed system is evaluated through a case study of monitoring social distancing at an airport. The results discuss how the system can fully address the case study's requirements in terms of its reliability, its usefulness when deployed on the airport's cameras, and its compliance with responsible artificial intelligence.

Originality/value: The paper makes three contributions. First, it proposes a real-time, on-edge social distancing breach detection system that combines cutting-edge people detection and tracking algorithms to achieve robust performance. Second, it proposes a design approach for developing responsible artificial intelligence in video surveillance contexts. Third, it presents results and discussion from a comprehensive evaluation in an airport case study, demonstrating the proposed system's robust performance and practical usefulness.
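The federated learning idea in this abstract — edge devices compute locally and only model parameters, never raw video, reach a central aggregator — can be illustrated with a minimal federated-averaging sketch. This is not the paper's implementation; all names, shapes, and the three-camera setup are hypothetical:

```python
import numpy as np

def local_update(weights, grad, lr=0.1):
    # One simulated training step on an edge device: the device
    # refines its copy of the model using only its local data.
    return weights - lr * grad

def federated_average(device_weights):
    # Central aggregation step: only weight vectors leave the
    # devices, never the sensitive, identifiable video frames.
    return np.mean(device_weights, axis=0)

# Three hypothetical edge cameras, each with its own local gradient.
global_w = np.zeros(4)
grads = [np.array([1.0, 0.0, 0.0, 0.0]),
         np.array([0.0, 1.0, 0.0, 0.0]),
         np.array([0.0, 0.0, 1.0, 0.0])]
local = [local_update(global_w, g) for g in grads]
new_global = federated_average(local)
```

In a real deployment this exchange would repeat over many rounds, with the averaged model pushed back to the cameras; the privacy benefit is that the aggregator only ever sees parameter updates.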
Deep learning-based person re-identification faces a scalability challenge when the target domain requires continuous learning. Service environments, such as airports, need to recognize new visitors and add new cameras over time, so training the model once is not enough to make it robust to new tasks and domain variations. A well-known approach is fine-tuning, but it suffers from the forgetting problem on old tasks when learning new ones. Joint training can alleviate the problem but requires the old datasets, which are unobtainable in some cases. Recently, Learning without Forgetting (LwF) has shown its ability to mitigate the problem without old datasets. This paper extends the benefit of LwF from image classification to person re-identification, which poses further challenges. Comprehensive experiments on Market1501 and DukeMTMC4ReID evaluate and benchmark LwF against other approaches. The results confirm that LwF outperforms fine-tuning in preserving old knowledge and joint training in training speed.
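The LwF mechanism referenced here combines the usual loss on the new task with a knowledge-distillation term that keeps the network's outputs for the old task close to what the frozen old model produced. A minimal NumPy sketch of that combined loss follows; the logits, temperature, and weighting are illustrative, not the paper's settings:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; T > 1 softens the distribution,
    # as used in knowledge distillation.
    z = z / T
    e = np.exp(z - z.max())
    return e / e.sum()

def lwf_loss(old_head_logits, old_soft_targets,
             new_head_logits, new_label, T=2.0, lam=1.0):
    # Standard cross-entropy on the new task.
    ce = -np.log(softmax(new_head_logits)[new_label])
    # Distillation term: penalize drift of the old-task head away
    # from the frozen old model's softened predictions.
    distill = -np.sum(old_soft_targets * np.log(softmax(old_head_logits, T)))
    return ce + lam * distill

# Soft targets recorded from the old model on a new-task sample.
old_targets = softmax(np.array([2.0, 0.5, 0.1]), T=2.0)
# Old head still matches the recorded targets -> minimal distillation cost.
loss_match = lwf_loss(np.array([2.0, 0.5, 0.1]), old_targets,
                      np.array([1.0, 3.0]), 1)
# Old head has drifted -> the distillation term grows.
loss_drift = lwf_loss(np.array([-2.0, 0.5, 3.0]), old_targets,
                      np.array([1.0, 3.0]), 1)
```

Because the distillation term needs only the old model's recorded responses on new data, no old dataset has to be retained, which is the property the abstract highlights.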
For robotics and AI applications, automatic facial expression recognition can be used to measure users' satisfaction with products and services provided through human-computer interaction. Large-scale datasets are essential for constructing a robust deep learning model, which increases training computation cost and duration. This requirement is particularly problematic when training must be performed on an ongoing basis on devices with limited computation capacity, such as humanoid robots. Knowledge transfer has become a commonly used technique to adapt existing models and speed up the training process by supporting refinement of existing parameters and weights for the target task. However, most state-of-the-art facial expression recognition models are still based on single-stage training (train at once), which is not enough to achieve satisfactory performance in real-world scenarios. This paper proposes a knowledge transfer method that supports learning from cross-domain datasets, moving from a generic to a specific domain. The experimental results demonstrate that shorter, incremental training on cross-domain datasets with smaller domain gaps can achieve performance comparable to training on a single large dataset from the target domain.
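The generic-to-specific transfer idea above can be sketched with a deliberately tiny toy model: train once on a plentiful "generic" task, then use those weights to initialize short training on a scarce "specific" task. The linear model, the synthetic data, and the 0.1 domain gap are all invented for illustration and bear no relation to the paper's networks or datasets:

```python
import numpy as np

def train(X, y, w_init, lr=0.1, steps=100):
    # Plain gradient descent on squared error; w_init lets training
    # start from transferred ("pre-trained") weights rather than zero.
    w = w_init.copy()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def mse(X, y, w):
    return float(np.mean((X @ w - y) ** 2))

# Toy stand-ins for the two domains: the target task's true
# parameters differ from the generic task's by a small gap of 0.1.
w_true = np.array([1.0, -2.0, 0.5])
X_src = np.tile(np.eye(3), (50, 1))   # plentiful generic-domain data
y_src = X_src @ w_true
X_tgt = np.eye(3)                     # scarce target-domain data
y_tgt = X_tgt @ (w_true + 0.1)

w_scratch = train(X_tgt, y_tgt, np.zeros(3))            # train at once on target
w_source = train(X_src, y_src, np.zeros(3), steps=200)  # stage 1: generic domain
w_transfer = train(X_tgt, y_tgt, w_source)              # stage 2: refine on target
```

With the same small budget of target-domain steps, the transferred initialization starts much closer to the target solution than the from-scratch one, which mirrors the abstract's claim that staged cross-domain training shortens training while remaining competitive.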
Measuring customer satisfaction from facial expressions in video surveillance can potentially support real-time analysis. We propose the use of the deep residual network (ResNet), which has been widely used for many image recognition tasks but not in the context of recognizing facial expressions in video surveillance. A key challenge in collecting video surveillance data in an airport context is achieving a balanced distribution of all emotions, as most passengers' faces are either neutral or happy. No existing work has established the feasibility of using datasets from different domains to train the model to address this issue. This paper is the first to investigate the benefits of a residual training approach and to adopt a network pre-trained on similar tasks to reduce training time. Based on comprehensive experiments comparing domain-specific, cross-domain, and mixed-domain training and testing approaches, we confirm the value of augmenting the surveillance domain with datasets from other domains (CK+, JAFFE, AffectNet).
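As background for the residual approach the abstract relies on, the defining ResNet computation is y = relu(x + F(x)): each block learns a residual F(x) added to an identity shortcut. A minimal NumPy sketch, with purely illustrative shapes and weights:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, W1, W2):
    # Core ResNet idea: the block learns a residual F(x) and adds it
    # to the identity shortcut, so the output is relu(x + F(x)).
    fx = relu(x @ W1) @ W2
    return relu(x + fx)

# With zero-initialized weights the residual F(x) vanishes and the
# block reduces to (clipped) identity, which is part of why very
# deep residual stacks remain easy to optimize.
x = np.array([0.5, 1.0, -0.2])  # the negative entry is clipped by the final relu
W = np.zeros((3, 3))
out = residual_block(x, W, W)
```

In practice such blocks are stacked dozens of layers deep and initialized from weights pre-trained on a related task, which is the training-time saving the abstract refers to.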