This study proposes a three-dimensional convolutional neural network (3D CNN)-based one-stage model for real-time action detection in video of construction equipment (ADVICE). The 3D CNN-based single-stream feature extraction network and the detection network incorporate a 3D attention module and a feature pyramid network, both developed in this study, to improve performance. For model evaluation, 130 videos of four types of construction equipment at various construction sites were collected from YouTube. Trained on 520 clips and tested on 260 clips, ADVICE achieved a precision of 82.1% and a recall of 83.1%, with an inference speed of 36.6 frames per second. These results indicate that a 3D CNN-based one-stage model can detect the actions of construction equipment in real time in videos of diverse, variable, and complex construction sites. The proposed method paves the way for improving the safety, productivity, and environmental management of construction projects.
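The reported metrics can be illustrated with a minimal sketch of how precision, recall, and inference speed are conventionally computed from detection counts and timing. The abstract does not state the matching criterion used to decide whether a detection is correct, so the class `DetectionCounts`, the helper names, and the example counts below are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class DetectionCounts:
    """Hypothetical tally of detection outcomes on a test set."""
    true_positives: int   # detections matched to a ground-truth action
    false_positives: int  # detections with no matching ground truth
    false_negatives: int  # ground-truth actions that were missed


def precision(counts: DetectionCounts) -> float:
    """Fraction of detections that are correct: TP / (TP + FP)."""
    detected = counts.true_positives + counts.false_positives
    return counts.true_positives / detected if detected else 0.0


def recall(counts: DetectionCounts) -> float:
    """Fraction of ground-truth actions that are found: TP / (TP + FN)."""
    actual = counts.true_positives + counts.false_negatives
    return counts.true_positives / actual if actual else 0.0


def frames_per_second(num_frames: int, elapsed_seconds: float) -> float:
    """Average inference speed over a timed run."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return num_frames / elapsed_seconds


if __name__ == "__main__":
    # Illustrative counts only (assumption, not results from the paper).
    counts = DetectionCounts(true_positives=8, false_positives=2, false_negatives=2)
    print(f"precision = {precision(counts):.3f}")
    print(f"recall    = {recall(counts):.3f}")
    print(f"speed     = {frames_per_second(300, 10.0):.1f} fps")
```

A speed above the frame rate of the source video (commonly 24 to 30 frames per second) is what is usually meant by real-time processing in this setting.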