Transitional actions form a class that lies between conventional actions and serves as the basis for short-term action prediction (see Figure 1). Early action recognition produces an action prediction within the initial frames of an objective action; predicting as early as possible is desirable, but how early a prediction can be made depends on the action itself. Within the setting of short-term action prediction, anticipating an impending change in a human action is more natural if we have a firm grasp of transitional actions. In a traffic scene, short-term action predictions are particularly crucial for avoiding accidents between humans and vehicles. Figure 1 shows a sequence of actions comprising walk straight, walk straight-cross, and cross. While walk straight and cross are conventional action definitions, our proposal inserts a transitional action between them (here, walk straight-cross) to provide a better basis for prediction. Our proposed short-term prediction achieves earlier prediction than so-called early action recognition, since it can recognize a dangerous cross action while it is still transitional. Intuitively, the recognition difficulty is that actions and transitional actions tend to partially overlap each other. We believe that a subtle motion descriptor (SMD) allows us to identify fine-grained differences between actions and transitional actions.

In this paper, we address the recognition of transitional actions for short-term action prediction. We also propose a discriminative temporal convolutional neural network (CNN) feature for recognizing transitional actions, in order to overcome the difficulty of classifying their nearly indistinguishable features. To accomplish this, we employ an SMD that captures subtle differences between consecutive frames. Our paper makes two main contributions: (i) a definition of transitional actions for short-term action prediction that achieves earlier prediction than early action recognition, and (ii) a CNN-based SMD that creates a clear distinction between actions and transitional actions. The feature is a simple update of the spatio-temporal CNN feature Pooled Time Series (PoT) proposed in [1].

Our CNN-based SMD achieved the best performance on three different datasets. Even with the shortest (3-frame) feature accumulation, we confirmed outstanding results: 85.78% (NTSEL), 69.77% (UTKinect), and 49.93% (Watch-n-Patch).

[1] M. S. Ryoo, B. Rothrock, and L. Matthies. Pooled motion features for first-person videos. In CVPR, 2015.
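The abstract does not give implementation details, but the core idea of a difference-based temporal descriptor can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `frame_features` stands in for any per-frame CNN activation, and the pooling choices (separately accumulating positive and negative temporal differences, loosely in the spirit of PoT-style pooling [1]) are assumptions.

```python
import numpy as np

def subtle_motion_descriptor(frame_features: np.ndarray) -> np.ndarray:
    """Sketch of an SMD-like feature over a short window of frames.

    frame_features: (T, D) array of per-frame CNN features
    (e.g., T = 3 frames, D-dimensional activations from any backbone).
    Returns a fixed-length descriptor built from consecutive-frame
    differences; the exact pooling scheme here is an assumption.
    """
    diffs = np.diff(frame_features, axis=0)       # (T-1, D) frame-to-frame changes
    pos = np.maximum(diffs, 0.0).sum(axis=0)      # accumulated increases per dimension
    neg = np.maximum(-diffs, 0.0).sum(axis=0)     # accumulated decreases per dimension
    mx = frame_features.max(axis=0)               # max pooling over the window
    return np.concatenate([pos, neg, mx])         # descriptor fed to a classifier

# Example with a 3-frame window of 4096-d features (random placeholder values).
feats = np.random.rand(3, 4096).astype(np.float32)
desc = subtle_motion_descriptor(feats)            # shape: (3 * 4096,)
```

Emphasizing the signed temporal differences, rather than the raw activations, is what makes such a descriptor sensitive to the subtle motion cues that separate an action from its transitional phase.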
Synopsis

A V-shaped segregation, found in a sulfur print of a longitudinal strand section, has been investigated macroscopically. A simple model, using polyethylene particles of 2 mm diameter, was prepared to simulate the formation of the V-shaped segregation. The results obtained are summarized as follows:
(1) The V-shaped segregation is observed only in the equiaxed crystal zone, and it appears very periodically along the casting direction.
(2) The frequency of the V-shaped segregation becomes smaller, while its density becomes larger, with increasing thickness of the equiaxed zone.
(3) The formation mechanism of the V-shaped segregation may be explained as follows: the enriched liquid between the equiaxed crystals is drawn in, flows downwards, and accumulates along planes formed by the forcible movement, toward the strand center, of the equiaxed crystals piled up at the end of the solidification region.
(4) The periodicity of the V-shaped segregation is explained quantitatively by a rheological approach that treats the equiaxed crystals piled up at the end of the solidification region as cohesive particles.
Change captioning tasks aim to detect changes in image pairs observed before and after a scene change and to generate a natural language description of those changes. Existing change captioning studies have mainly focused on scenes with a single change. However, detecting and describing multiple changed parts in image pairs is essential for enhancing adaptability to complex scenarios. We address this issue from three aspects: (i) we propose a CG-based multi-change captioning dataset; (ii) we benchmark existing state-of-the-art single-change captioning methods on multi-change captioning; and (iii) we further propose Multi-Change Captioning transformers (MCCFormers), which identify change regions by densely correlating different regions in image pairs and dynamically associate the related change regions with words in the generated sentences. The proposed method obtained the highest scores on four conventional change captioning evaluation metrics for multi-change captioning. In addition, existing methods generate a single attention map for multiple changes and lack the ability to distinguish change regions; in contrast, our proposed method can separate attention maps for each change and performs well with respect to change localization. Moreover, the proposed framework outperformed the previous state-of-the-art methods on an existing change captioning benchmark, CLEVR-Change, by a large margin (+6.1 on BLEU-4 and +9.7 on CIDEr scores), indicating its general ability in change captioning tasks. Our code and dataset will be publicly available through the project page.

[Figure: a before/after image pair with example change captions. Caption 1: The large gray rubber sphere has disappeared. (delete) Caption 2: There is no longer a large cyan metal cube. (delete) Caption 3: The large brown metal sphere was moved from its original location. (move) Caption 4: The small yellow rubber cylinder was replaced by a small red rubber sphere. (replace)]
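The phrase "densely correlating different regions in image pairs" suggests cross-attention between patch features of the two images. The following is an illustrative reading of the abstract under that assumption, not the MCCFormers code; the patch count, feature dimension, and scaled dot-product formulation are all placeholders.

```python
import numpy as np

def dense_cross_correlation(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """Sketch of densely correlating regions across an image pair.

    feat_a, feat_b: (N, D) arrays of patch features from the "before" and
    "after" images (N patches, D channels); any backbone could produce them.
    Each "before" patch attends over all "after" patches via scaled
    dot-product attention, so mismatched (changed) regions stand out.
    """
    d = feat_a.shape[1]
    scores = feat_a @ feat_b.T / np.sqrt(d)             # (N, N) pairwise similarities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)             # softmax over "after" patches
    matched = attn @ feat_b                             # best-matching "after" content
    return np.concatenate([feat_a, matched - feat_a], axis=1)  # feature + change cue

# Example: 14x14 patch grids with 256-d features (random placeholder values).
fa, fb = np.random.rand(196, 256), np.random.rand(196, 256)
change_features = dense_cross_correlation(fa, fb)       # (196, 512)
```

Because every "before" patch is compared against every "after" patch, each change produces its own localized mismatch signal, which is consistent with the paper's claim of separate attention maps per change.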
In this study, a spatial-dependent background model is used to detect objects under severe imaging conditions. The model is robust to sudden illumination fluctuations and bursts of background motion. More importantly, it remains sensitive under underexposure, low illumination, and narrow dynamic range, all of which are very common with surveillance cameras. The background model maintains statistical models in the form of multiple pixel pairs with few parameters. Experiments on several challenging datasets (Heavy Fog, PETS-2001, AIST-INDOOR, and a real surveillance application) confirm robust performance under various imaging conditions.
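One plausible way a pixel-pair background model gains illumination robustness is by testing the intensity relation between paired pixels rather than absolute values: a global illumination change shifts both pixels of a pair by a similar amount, so their difference stays stable for background while a foreground object breaks the learned relation. The sketch below illustrates that idea only; the pair selection, learned statistics, and threshold are hypothetical, not the paper's exact model.

```python
import numpy as np

def pixel_pair_foreground(frame, pairs, mean_diff, tol=3.0):
    """Sketch of a pixel-pair background test.

    frame:     (H, W) grayscale image
    pairs:     (K, 4) array of pixel-pair coordinates (y1, x1, y2, x2)
    mean_diff: (K,) learned background intensity difference per pair
    tol:       allowed deviation in intensity units (a model parameter)
    """
    y1, x1, y2, x2 = pairs.T
    diff = frame[y1, x1].astype(np.float32) - frame[y2, x2].astype(np.float32)
    return np.abs(diff - mean_diff) > tol   # True where the pair votes "foreground"

# Example: 5 pairs on a synthetic 100x100 frame with placeholder statistics.
frame = np.random.randint(0, 256, (100, 100), dtype=np.uint8)
pairs = np.array([[10, 10, 12, 12], [20, 30, 22, 32], [50, 50, 52, 52],
                  [70, 20, 72, 22], [90, 90, 92, 92]])
mean_diff = np.zeros(len(pairs), dtype=np.float32)
votes = pixel_pair_foreground(frame, pairs, mean_diff)
```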