Asset monitoring on construction sites is an intricate, labor-intensive task that can benefit greatly from automated solutions built on deep neural networks. We use the Single-Shot MultiBox Detector (SSD), chosen for its balance between speed and accuracy, to leverage the images and videos that surveillance cameras on construction sites already capture and to automate monitoring tasks, enabling project managers to better track performance and optimize the utilization of each resource. We propose to improve the performance of SSD by clustering the predicted boxes instead of pruning them with a greedy approach such as non-maximum suppression. To do so, we use Affinity Propagation Clustering (APC) to cluster the predicted boxes based on a similarity index computed from both the spatial features and the locations of the predicted boxes. With this approach, we improved the mean average precision of SSD by 3.77% on a custom dataset of images from construction sites and by 1.67% on the PASCAL VOC Challenge.
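The idea of replacing non-maximum suppression with clustering can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation, and several details are assumptions: the similarity between two boxes is taken as a hypothetical weighted sum `alpha * IoU + (1 - alpha) * cosine(features)`, the preference (diagonal of the similarity matrix) defaults to the median off-diagonal similarity, boxes are assumed to belong to a single class, and each cluster is represented by its highest-scoring member. Affinity propagation itself follows the standard responsibility/availability message passing of Frey and Dueck; scikit-learn's `sklearn.cluster.AffinityPropagation(affinity="precomputed")` could be used instead.

```python
import numpy as np


def iou_matrix(boxes):
    """Pairwise IoU for an (n, 4) array of [x1, y1, x2, y2] boxes."""
    x1 = np.maximum(boxes[:, None, 0], boxes[None, :, 0])
    y1 = np.maximum(boxes[:, None, 1], boxes[None, :, 1])
    x2 = np.minimum(boxes[:, None, 2], boxes[None, :, 2])
    y2 = np.minimum(boxes[:, None, 3], boxes[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area[:, None] + area[None, :] - inter
    return inter / np.maximum(union, 1e-9)


def cosine_matrix(feats):
    """Pairwise cosine similarity between per-box feature vectors."""
    unit = feats / np.maximum(np.linalg.norm(feats, axis=1, keepdims=True), 1e-9)
    return unit @ unit.T


def affinity_propagation(S, damping=0.7, iters=200):
    """Affinity propagation on similarity matrix S (diagonal = preferences).

    Returns, for every point, the index of its exemplar.
    """
    n = S.shape[0]
    R = np.zeros((n, n))  # responsibilities r(i, k)
    A = np.zeros((n, n))  # availabilities a(i, k)
    idx = np.arange(n)
    for _ in range(iters):
        # r(i,k) = s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        first = np.argmax(AS, axis=1)
        max1 = AS[idx, first]
        AS[idx, first] = -np.inf
        max2 = AS.max(axis=1)
        R_new = S - max1[:, None]
        R_new[idx, first] = S[idx, first] - max2
        R = damping * R + (1 - damping) * R_new
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        # a(k,k) = sum_{i' != k} max(0, r(i',k))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]
        A_new = Rp.sum(axis=0)[None, :] - Rp
        diag = A_new[idx, idx].copy()
        A_new = np.minimum(A_new, 0)
        A_new[idx, idx] = diag
        A = damping * A + (1 - damping) * A_new
    evidence = np.diag(A + R)
    exemplars = np.flatnonzero(evidence > 0)
    if exemplars.size == 0:
        exemplars = np.array([np.argmax(evidence)])
    labels = exemplars[np.argmax(S[:, exemplars], axis=1)]
    labels[exemplars] = exemplars  # every exemplar labels itself
    return labels


def cluster_boxes(boxes, scores, feats, alpha=0.5, preference=None):
    """Cluster predicted boxes with AP; keep the best-scoring box per cluster.

    `alpha` and the median-based preference are hypothetical choices.
    """
    n = len(boxes)
    if n <= 1:
        return np.arange(n), np.arange(n)
    S = alpha * iou_matrix(boxes) + (1 - alpha) * cosine_matrix(feats)
    if preference is None:
        preference = np.median(S[~np.eye(n, dtype=bool)])
    np.fill_diagonal(S, preference)
    labels = affinity_propagation(S)
    keep = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        keep.append(members[np.argmax(scores[members])])
    return np.array(sorted(keep)), labels
```

Unlike greedy NMS, which processes boxes in score order and discards every box that overlaps a kept box beyond a threshold, this groups all boxes jointly, so two boxes with moderate overlap but dissimilar features can remain separate detections. Keeping the highest-scoring member rather than the exemplar is one design choice; averaging the member coordinates is another.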
Fashion landmarks are functional key points on apparel that enable a more discriminative visual analysis of apparel images. Detecting them can facilitate apparel alignment when displaying apparel images on websites, or help build a system that enforces a dress code in a particular environment. However, challenges such as background clutter, human pose, scale, and apparel variation make the task difficult. We present a conceptually simple, flexible, and general framework for apparel landmark detection that can be used simultaneously for apparel classification and localization. In addition to locating the landmarks on the apparel, we classify each landmark as visible or occluded within the same framework. Because fashion landmark detection is similar to joint localization and detection problems such as human pose estimation, our approach extends the stacked hourglass architecture, originally proposed for human pose estimation. We perform all these tasks in parallel using multi-task learning. Our proposed convolutional neural network is end-to-end differentiable and simple to train, since all tasks are performed on the same architecture without any additional parameters to learn. Over the past few years, many modifications have been proposed to improve this architecture, and we also compare the performance of several of these stacked hourglass variants. These architectures leverage both global and local features captured by deep convolutional neural networks to better localize the apparel in the image as well as the landmarks on the apparel. We test and analyze our results on the DeepFashion dataset. We also weigh the trade-offs of detecting the landmarks in a category-aware environment, i.e., with pre-classified apparel, versus a category-agnostic environment.
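The multi-task setup described above, in which one stacked hourglass network predicts landmark heatmaps, landmark visibility, and the apparel category, can be illustrated with a combined training loss. This is a minimal NumPy sketch, not the authors' exact formulation; it assumes Gaussian target heatmaps, mean-squared error for heatmaps, binary cross-entropy for visibility, softmax cross-entropy for the category, intermediate supervision averaged over all stacks, and hypothetical task weights `w_heat`, `w_vis`, and `w_cls`.

```python
import numpy as np


def gaussian_heatmap(h, w, cx, cy, sigma=1.0):
    """Target heatmap of shape (h, w) with a Gaussian peak at column cx, row cy."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))


def decode_landmarks(heatmaps):
    """Return the (x, y) peak location of each of K heatmaps shaped (K, H, W)."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)
    return np.stack([flat % W, flat // W], axis=1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_softmax(z):
    z = z - z.max()  # shift for numerical stability
    return z - np.log(np.exp(z).sum())


def multitask_loss(stack_outputs, target_heatmaps, visibility, category,
                   w_heat=1.0, w_vis=1.0, w_cls=1.0):
    """Average the weighted three-task loss over every hourglass stack.

    stack_outputs: list of (heatmaps (K,H,W), visibility logits (K,),
    category logits (C,)) tuples, one per stack (intermediate supervision).
    visibility: (K,) array with 1 = visible, 0 = occluded.
    """
    eps = 1e-12
    total = 0.0
    for heat, vis_logits, cls_logits in stack_outputs:
        l_heat = np.mean((heat - target_heatmaps) ** 2)
        p = sigmoid(vis_logits)
        l_vis = -np.mean(visibility * np.log(p + eps)
                         + (1 - visibility) * np.log(1 - p + eps))
        l_cls = -log_softmax(cls_logits)[category]
        total += w_heat * l_heat + w_vis * l_vis + w_cls * l_cls
    return total / len(stack_outputs)
```

Because all three heads are trained through one summed loss, gradients from classification and visibility flow into the same shared hourglass features used for localization, which is the point of the multi-task design.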