This paper investigates how to fuse grayscale and thermal video data for detecting foreground objects under challenging scenarios. To this end, we propose an intuitive yet effective method, called WEighted Low-rank Decomposition (WELD), which adaptively pursues a cross-modality low-rank representation. Specifically, we form two data matrices by accumulating sequential frames from the grayscale and thermal videos, respectively. Within these two observation matrices, WELD detects moving foreground pixels as sparse outliers against the low-rank background structure and incorporates weight variables to make the models of the two modalities complementary to each other. Smoothness constraints on object motion are also introduced in WELD to further improve robustness to noise. For optimization, we propose an iterative algorithm that efficiently solves the low-rank models through three sub-problems. Moreover, we employ an edge-preserving-filtering-based method to substantially speed up WELD while preserving its accuracy. To provide a comprehensive evaluation benchmark for grayscale-thermal foreground detection, we create a new dataset of 25 aligned grayscale-thermal video pairs with high diversity. Our extensive experiments on both the newly created dataset and the public OSU3 dataset suggest that WELD achieves superior performance and comparable efficiency against other state-of-the-art approaches.
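To make the decomposition idea concrete, below is a minimal sketch of a cross-modality low-rank plus sparse decomposition in the spirit of the abstract. It is not the authors' WELD optimization: the alternating singular-value/soft-thresholding scheme, the re-weighting rule, and all parameter names (`lam`, `tau`, `n_iters`) are illustrative assumptions.

```python
# Minimal sketch: each modality's frame matrix is split into a low-rank
# background and a sparse foreground, with weights coupling the modalities.
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink singular values by tau (low-rank update)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def soft(M, lam):
    """Element-wise soft thresholding (sparse foreground update)."""
    return np.sign(M) * np.maximum(np.abs(M) - lam, 0.0)

def weld_sketch(D_gray, D_thermal, lam=0.05, tau=1.0, n_iters=50):
    """D_gray, D_thermal: (pixels x frames) matrices of stacked video frames.
    Returns sparse foreground estimates for both modalities."""
    S_g = np.zeros_like(D_gray)
    S_t = np.zeros_like(D_thermal)
    W_g = np.ones_like(D_gray)   # per-pixel weights (assumed form)
    W_t = np.ones_like(D_thermal)
    for _ in range(n_iters):
        # Low-rank background of each modality given the current foreground.
        L_g = svt(D_gray - S_g, tau)
        L_t = svt(D_thermal - S_t, tau)
        # Sparse foreground with a weighted threshold.
        S_g = soft(D_gray - L_g, lam * W_g)
        S_t = soft(D_thermal - L_t, lam * W_t)
        # Re-weight: pixels that look like foreground in one modality get a
        # smaller threshold in the other, so the two models complement each other.
        W_g = 1.0 / (1.0 + np.abs(S_t))
        W_t = 1.0 / (1.0 + np.abs(S_g))
    return S_g, S_t
```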
Many research activities on Wireless Sensor Networks (WSNs) need detailed performance statistics about protocols, systems, and applications; however, current simulation tools and testbeds lack mechanisms to report these statistics realistically and conveniently. To address this need, we have developed a WSN emulator, VMNet. VMNet emulates networked sensor nodes at the level of CPU clock cycles and executes the binary code of real applications directly. It emulates the radio channel with loss and noise, as well as the peripherals, in sufficient detail. Moreover, VMNet takes parameter values from the real world and logs detailed runtime information of the emulated nodes. Consequently, application performance, both in response time and in power consumption, is reported realistically in VMNet, as demonstrated by our comparison studies with real sensor networks.
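As a toy illustration of the kind of per-node energy accounting an emulator such as VMNet can report, consider the sketch below. The component names, current draws, and the simple state-duration bookkeeping are assumptions for illustration only, not VMNet's actual power model.

```python
# Toy sketch of per-node energy accounting: energy = current * voltage * time,
# accumulated per (component, state) interval.  All numbers are assumed.
class EnergyMeter:
    # Assumed current draw per component state, in milliamps.
    CURRENT_MA = {
        ("cpu", "active"): 8.0,
        ("cpu", "sleep"): 0.01,
        ("radio", "tx"): 17.4,
        ("radio", "rx"): 19.7,
        ("radio", "off"): 0.0,
    }

    def __init__(self, voltage: float = 3.0):
        self.voltage = voltage
        self.energy_mj = 0.0          # accumulated energy in millijoules

    def account(self, component: str, state: str, duration_s: float) -> None:
        """Charge the node for duration_s seconds spent in (component, state)."""
        self.energy_mj += self.CURRENT_MA[(component, state)] * self.voltage * duration_s

meter = EnergyMeter()
meter.account("cpu", "active", 0.002)   # 2 ms of computation
meter.account("radio", "tx", 0.001)     # 1 ms packet transmission
print(f"energy so far: {meter.energy_mj:.4f} mJ")
```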
Recognizing objects from simultaneously sensed photometric (RGB) and depth channels is a fundamental yet practical problem in many machine vision applications, such as robot grasping and autonomous driving. In this paper, we address this problem by developing a Cross-Modal Attentional Context (CMAC) learning framework that enables full exploitation of the context information in both RGB and depth data. Compared to existing RGB-D object detection frameworks, our approach has several appealing properties. First, it consists of an attention-based global context model that exploits adaptive contextual information and incorporates it into a region-based CNN framework (e.g., Fast R-CNN) to achieve improved object detection performance. Second, our CMAC framework further contains a fine-grained object part attention module that harnesses multiple discriminative object parts inside each candidate object region for superior local feature representation. Besides greatly improving the accuracy of RGB-D object detection, the effective cross-modal information fusion and attentional context modeling in the proposed model also provide an interpretable visualization scheme. Experimental results demonstrate that the proposed method significantly improves upon the state of the art on all public benchmarks.
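The sketch below illustrates one simple way to fuse RGB and depth region features with learned attention weights, in the spirit of the cross-modal fusion described above. The layer sizes, the gating formulation, and the module name are illustrative assumptions; CMAC's actual global context and part attention modules are more elaborate.

```python
# Minimal sketch of attention-gated fusion of RGB and depth region features.
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    def __init__(self, feat_dim: int = 2048):
        super().__init__()
        # Two scalar gates deciding how much to trust each modality per region.
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 2),
            nn.Softmax(dim=-1),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (num_regions, feat_dim) pooled region features.
        weights = self.gate(torch.cat([rgb_feat, depth_feat], dim=-1))  # (N, 2)
        fused = weights[:, :1] * rgb_feat + weights[:, 1:] * depth_feat
        return fused  # would feed a detection head (e.g., a Fast R-CNN classifier)

# Usage with random tensors standing in for CNN region features.
fusion = CrossModalAttentionFusion(feat_dim=256)
fused = fusion(torch.randn(4, 256), torch.randn(4, 256))
print(fused.shape)  # torch.Size([4, 256])
```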