The performance of deep neural networks is strongly influenced by the quantity and quality of annotated data. Most of the large activity recognition datasets consist of data sourced from the web, which does not reflect challenges that exist in activities of daily living. In this paper, we introduce a large real-world video dataset for activities of daily living: Toyota Smarthome. The dataset consists of 16K RGB+D clips of 31 activity classes, performed by seniors in a smarthome. Unlike previous datasets, videos were fully unscripted. As a result, the dataset poses several challenges: high intra-class variation, high class imbalance, simple and composite activities, and activities with similar motion and variable duration. Activities were annotated with both coarse and fine-grained labels. These characteristics differentiate Toyota Smarthome from other datasets for activity recognition. As recent activity recognition approaches fail to address the challenges posed by Toyota Smarthome, we present a novel activity recognition method with an attention mechanism. We propose a pose-driven spatiotemporal attention mechanism through 3D ConvNets. We show that our novel method outperforms state-of-the-art methods on benchmark datasets, as well as on the Toyota Smarthome dataset. We release the dataset for research use.
Wireless multimedia sensor networks (WMSNs) are interconnected devices that allow retrieving video and audio streams, still images, and scalar data from the environment. In a densely deployed WMSN, there exists correlation among the visual information observed by cameras with overlapping fields of view. This paper proposes a novel spatial correlation model for visual information in WMSNs. By studying the sensing model and deployments of cameras, a spatial correlation function is derived to describe the correlation characteristics of visual information observed by cameras with overlapping fields of view. The joint effect of multiple correlated cameras is also studied. An entropy-based analytical framework is developed to measure the amount of visual information provided by multiple cameras in the network. Furthermore, according to the proposed correlation function and entropy-based framework, a correlation-based camera selection algorithm is designed. Experimental results show that the proposed spatial correlation function can model the correlation characteristics of visual information in WMSNs with low computation and communication costs. Further simulations show that, given a distortion bound at the sink, the correlation-based camera selection algorithm requires fewer cameras to report to the sink than the random selection algorithm.
Microneedle arrays show many advantages in drug delivery applications due to their convenience and reduced risk of infection. Compared to other microscale manufacturing methods, 3D printing easily overcomes challenges in the fabrication of microneedles with complex geometric shapes and multifunctional performance. However, due to material characteristics and limitations on printing capability, there are still bottlenecks to overcome for 3D printed microneedles to achieve the mechanical performance needed for various clinical applications. The hierarchical structures in limpet teeth, which are extraordinarily strong, result from aligned fibers of mineralized tissue and protein‐based polymer reinforced frameworks. These structures provide design inspiration for mechanically reinforced biomedical microneedles. Here, a bioinspired microneedle array is fabricated using magnetic field‐assisted 3D printing (MF‐3DP). Micro‐bundles of aligned iron oxide nanoparticles (aIOs) are encapsulated by a polymer matrix during the printing process. A bioinspired 3D‐printed painless microneedle array is fabricated, and the suitability of this microneedle patch for drug delivery during long‐term wear is demonstrated. The results reported here provide insights into how the geometrical morphology of microneedles can be optimized for painless drug delivery in clinical trials.
The aim is to develop a rapid and direct method for measuring the bulk viscosity of a liquid as a function of temperature. Brillouin scattering of a laser beam in fresh water and salt water at different temperatures has been studied. The results show that there exists a close temperature-dependent relationship among the Brillouin frequency shift, the Brillouin linewidth, and the bulk viscosity of water. Thus the bulk viscosity of water can be determined directly from Brillouin-scattering measurements. The method has a high signal-to-noise ratio and high accuracy.
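As a hedged illustration of why such a measurement can yield the bulk viscosity, the textbook hydrodynamic description of Brillouin scattering links the frequency shift and the linewidth to the sound speed and the viscosities. The relations below are the standard form, not necessarily the calibration used in the paper. All symbols are assumptions introduced here: $n$ is the refractive index, $\lambda$ the vacuum wavelength of the laser, $\theta$ the scattering angle, $v_s$ the speed of sound, $\rho$ the density, $\eta_s$ the shear viscosity, $\eta_b$ the bulk viscosity, $\omega_B$ the Brillouin shift, and $\Gamma_B$ the full width at half maximum of the Brillouin line, both in angular frequency. The thermal-conduction contribution to the linewidth is neglected, which is a common approximation for water.

```latex
% Standard hydrodynamic relations (assumed form; thermal conduction term neglected)
q = \frac{4\pi n}{\lambda}\,\sin\frac{\theta}{2},
\qquad
\omega_B = v_s\, q,
\qquad
\Gamma_B \approx \frac{q^{2}}{\rho}\left(\frac{4}{3}\,\eta_s + \eta_b\right)
\quad\Longrightarrow\quad
\eta_b \approx \frac{\rho\,\Gamma_B}{q^{2}} - \frac{4}{3}\,\eta_s
```

Under these assumptions, a measured linewidth at a known temperature, together with tabulated values of $\rho$ and $\eta_s$ at that temperature, gives an estimate of $\eta_b$.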
Many attempts have been made towards combining RGB and 3D poses for the recognition of Activities of Daily Living (ADL). ADL may look very similar, and distinguishing them often requires modeling fine-grained details. Because recent 3D ConvNets are too rigid to capture the subtle visual patterns across an action, this research direction is dominated by methods combining RGB and 3D poses. But the cost of computing 3D poses from an RGB stream is high in the absence of appropriate sensors. This limits the usage of the aforementioned approaches in real-world applications requiring low latency. How, then, can we best take advantage of 3D poses for recognizing ADL? To this end, we propose an extension of a pose-driven attention mechanism, Video-Pose Network (VPN), exploring two distinct directions. One is to transfer the pose knowledge into RGB through a feature-level distillation, and the other is to mimic pose-driven attention through an attention-level distillation. Finally, these two approaches are integrated into a single model, which we call VPN++. We show that VPN++ is not only effective but also provides a high speed-up and high resilience to noisy poses. VPN++, with or without 3D poses, outperforms the representative baselines on 4 public datasets. Code is available at https://github.com/srijandas07/vpnplusplus.
Handling long and complex temporal information is an important challenge for action detection tasks. This challenge is further aggravated by densely distributed actions in untrimmed videos. Previous action detection methods fail to select the key temporal information in long videos. To this end, we introduce the Dilated Attention Layer (DAL). Compared to previous temporal convolution layers, DAL allocates attentional weights to local frames in the kernel, which enables it to learn better local representations across time. Furthermore, we introduce the Pyramid Dilated Attention Network (PDAN), which is built upon DAL. With the help of multiple DALs with different dilation rates, PDAN can model short-term and long-term temporal relations simultaneously by focusing on local segments at the level of low and high temporal receptive fields. This property enables PDAN to handle complex temporal relations between different action instances in long untrimmed videos. To corroborate the effectiveness and robustness of our method, we evaluate it on three densely annotated, multi-label datasets: MultiTHUMOS, Charades, and Toyota Smarthome Untrimmed (TSU). PDAN outperforms previous state-of-the-art methods on all these datasets. "Time abides long enough for those who make use of it." Leonardo da Vinci
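The idea of attending over the frames inside a dilated temporal kernel can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' DAL: the function names `softmax` and `dilated_attention` are hypothetical, the learned query/key/value projections of a real layer are replaced by the raw features, and clamping indices at the sequence borders is an assumed padding choice.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dilated_attention(frames, dilation, kernel_size=3):
    """Attend over the frames covered by a dilated temporal kernel.

    frames: list of T feature vectors (lists of floats of equal length).
    Returns (outputs, weights): for each time step, the attention-weighted
    sum of the kernel frames and the weights that produced it.
    Assumption: no learned projections; the centre frame is the query and
    the kernel frames serve as both keys and values.
    """
    num_frames = len(frames)
    dim = len(frames[0])
    half = kernel_size // 2
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    for t in range(num_frames):
        # Kernel positions t - d, t, t + d (for kernel_size=3), clamped
        # to the valid range at the borders (assumed padding scheme).
        idx = [min(max(t + k * dilation, 0), num_frames - 1)
               for k in range(-half, half + 1)]
        query = frames[t]
        scores = [sum(q * x for q, x in zip(query, frames[i])) / scale
                  for i in idx]
        weights = softmax(scores)
        out = [sum(w * frames[i][c] for w, i in zip(weights, idx))
               for c in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy input: 8 frames with 2-dimensional features.
frames = [[float(t), 1.0] for t in range(8)]
# Calling the layer with several dilation rates mimics, very loosely,
# the short-term and long-term receptive fields described above.
pyramid = {d: dilated_attention(frames, d) for d in (1, 2, 4)}
```

Each output frame is a convex combination of the frames under the kernel, so increasing the dilation rate widens the temporal span a frame can draw on without increasing the number of frames it attends to.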
Data redundancy caused by correlation has motivated the application of collaborative multimedia in-network processing for data filtering and compression in wireless multimedia sensor networks (WMSNs). This paper proposes an information-theoretic data compression framework with the objective of maximizing the overall compression of the visual information gathered in a WMSN. To achieve this, an entropy-based divergence measure (EDM) scheme is proposed to predict the compression efficiency of performing joint coding on the images collected by spatially correlated cameras. The novelty of EDM lies in its independence from specific image types and coding algorithms, thereby providing a generic mechanism for prior evaluation of compression under different coding solutions. Utilizing the predicted results from EDM, a distributed multi-cluster coding protocol (DMCP) is proposed to construct a compression-oriented coding hierarchy. The DMCP aims to partition the entire network into a set of coding clusters such that the global coding gain is maximized. Moreover, in order to enhance decoding reliability at the data sink, the DMCP also guarantees that each sensor camera is covered by at least two different coding clusters. Experiments on H.264 standards show that the proposed EDM can effectively predict the joint coding efficiency from multiple sources. Further simulations demonstrate that the proposed compression framework can reduce the total coding rate by 10%-23% compared with the individual coding scheme, i.e., each camera sensor compresses its own image independently.