We propose a novel end-to-end trainable, deep, encoder-decoder architecture for single-pass semantic segmentation. Our approach is based on a cascaded architecture with feature-level long-range skip connections. The encoder incorporates the structure of ResNeXt's residual building blocks and adopts the strategy of repeating a building block that aggregates a set of transformations with the same topology. The decoder features a novel architecture consisting of blocks that (i) capture context information, (ii) generate semantic features, and (iii) enable fusion between different output resolutions. Crucially, we introduce dense decoder shortcut connections that allow decoder blocks to use semantic feature maps from all previous decoder levels, i.e., from all higher-level feature maps. These dense decoder connections enable effective information propagation from one decoder block to another, as well as multi-level feature fusion that significantly improves accuracy. Importantly, these connections allow our method to achieve state-of-the-art performance on several challenging datasets, without the time-consuming multi-scale averaging required by previous works.
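The dense decoder shortcut idea can be illustrated with a toy sketch. This is not the authors' implementation: the real decoder blocks are convolutional and operate on spatial feature maps at several resolutions, whereas here a single random linear projection with a ReLU (the `decoder_block` helper, an assumption for illustration) stands in for each block. The key structural point is preserved: every block consumes the concatenation of *all* earlier decoder outputs, not just its immediate predecessor.

```python
import numpy as np

def decoder_block(features, out_dim, rng):
    # Hypothetical stand-in for a decoder block: a random linear
    # projection plus ReLU over the channel-wise concatenation of
    # all incoming feature maps.
    x = np.concatenate(features, axis=-1)
    w = rng.standard_normal((x.shape[-1], out_dim)) * 0.01
    return np.maximum(x @ w, 0.0)

def dense_decoder(encoder_feat, num_blocks=3, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    outputs = [encoder_feat]
    for _ in range(num_blocks):
        # Dense shortcut: each block sees the feature maps of ALL
        # previous decoder levels, not only the last one.
        outputs.append(decoder_block(outputs, dim, rng))
    return outputs[-1]

feat = np.ones((4, 4, 16))   # toy "encoder" feature map
out = dense_decoder(feat)
print(out.shape)             # (4, 4, 8)
```

Because every block's input width grows with the number of earlier outputs, information from high-level features reaches every later block directly, which is the property the abstract credits for the accuracy gain.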
In this paper, we focus on the important topic of violence recognition and detection in surveillance videos. Our goal is to determine whether violence occurs in a video (recognition) and when it happens (detection). First, we propose an extension of Improved Fisher Vectors (IFV) for videos, which allows a video to be represented using both local features and their spatio-temporal positions. Then, we study the popular sliding-window approach for violence detection, re-formulating the Improved Fisher Vectors and using the summed area table data structure to speed up the approach. We present an extensive evaluation, comparison, and analysis of the proposed improvements on four state-of-the-art datasets. We show that the proposed improvements make violence recognition more accurate (compared to the standard IFV, IFV with a spatio-temporal grid, and other state-of-the-art methods) and make violence detection significantly faster.
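The summed-area-table speed-up can be sketched in one dimension (time). The sketch below is an assumption-laden toy, not the paper's code: random vectors stand in for the per-frame IFV sufficient statistics. A single prefix-sum pass lets any temporal window's aggregated statistics be read off with one subtraction, so evaluating many overlapping sliding windows no longer costs time proportional to the window length.

```python
import numpy as np

# Toy per-frame statistics; in the paper these would be the per-frame
# IFV sufficient statistics accumulated from local descriptors.
T, D = 100, 16
frame_stats = np.random.default_rng(0).random((T, D))

# 1-D summed area table: prefix sums over time, with a leading zero
# row so every window sum becomes a single subtraction.
sat = np.concatenate([np.zeros((1, D)), np.cumsum(frame_stats, axis=0)])

def window_sum(start, end):
    # Aggregated statistics of frames [start, end) in O(D) time,
    # independent of the window length.
    return sat[end] - sat[start]

naive = frame_stats[10:40].sum(axis=0)   # O(window length)
fast = window_sum(10, 40)                # O(1) lookups
print(np.allclose(naive, fast))          # True
```

In the 2-D image setting the same trick uses four table lookups per rectangle; along a single time axis it reduces to the two-term subtraction above.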
We propose a novel approach for rapid segmentation of flooded buildings by fusing multiresolution, multisensor, and multitemporal satellite imagery in a convolutional neural network. Our model significantly expedites the generation of satellite imagery-based flood maps, crucial for first responders and local authorities in the early stages of flood events. By incorporating multitemporal satellite imagery, our model allows for rapid and accurate post-disaster damage assessment and can be used by governments to better coordinate medium- and long-term financial assistance programs for affected areas. The network consists of multiple streams of encoder-decoder architectures that extract spatiotemporal information from medium-resolution images and spatial information from high-resolution images before fusing the resulting representations into a single medium-resolution segmentation map of flooded buildings. We compare our model to state-of-the-art methods for building footprint segmentation as well as to alternative fusion approaches for the segmentation of flooded buildings and find that our model performs best on both tasks. We also demonstrate that our model produces highly accurate segmentation maps of flooded buildings using only publicly available medium-resolution data instead of significantly more detailed but sparsely available very high-resolution data. We release the first open-source dataset of fully preprocessed and labeled multiresolution, multispectral, and multitemporal satellite images of disaster sites, along with our source code.
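The multi-stream fusion design can be sketched as follows. This is a minimal toy, not the paper's network: each `stream_encode` (a hypothetical helper) replaces a full encoder-decoder stream with average pooling to a common medium resolution plus a random channel projection, and the final classifier is a single linear layer with a sigmoid. The structural idea shown is the one the abstract describes: separate streams per input (two medium-resolution dates, one high-resolution image), concatenated at medium resolution and fused into one segmentation map.

```python
import numpy as np

def stream_encode(img, dim, rng):
    # Hypothetical stand-in for one encoder-decoder stream: pool the
    # input to a common 32x32 medium resolution, then project channels.
    h, w, c = img.shape
    f = h // 32
    pooled = img.reshape(32, f, 32, f, c).mean(axis=(1, 3))
    proj = rng.standard_normal((c, dim)) * 0.1
    return pooled @ proj

rng = np.random.default_rng(0)
medium_t0 = rng.random((32, 32, 4))    # medium-res, pre-event
medium_t1 = rng.random((32, 32, 4))    # medium-res, post-event
high_res  = rng.random((128, 128, 3))  # high-res, single date

# One stream per input; fuse by concatenating per-stream features and
# applying a per-pixel classifier (a toy linear layer here).
feats = [stream_encode(x, 8, rng) for x in (medium_t0, medium_t1, high_res)]
fused = np.concatenate(feats, axis=-1)        # (32, 32, 24)
w_cls = rng.standard_normal((24, 1)) * 0.1
seg = 1 / (1 + np.exp(-(fused @ w_cls)))      # per-pixel flood probability
print(seg.shape)                              # (32, 32, 1)
```

Note how the high-resolution stream is downsampled into the shared medium-resolution grid before fusion, matching the abstract's single medium-resolution output map.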
Body height, weight, as well as the associated composite body mass index (BMI), are human attributes of interest owing to their use in a number of applications, including surveillance, re-identification, image retrieval systems, and healthcare. Previous work on automated estimation of height, weight, and BMI has predominantly focused on 2D and 3D full-body images and videos; little attention has been given to the use of the face for estimating such traits. Motivated by the above, we explore the possibility of estimating height, weight, and BMI from single-shot facial images by proposing a regression method based on the 50-layer ResNet architecture. In addition, we present a novel dataset consisting of 1026 subjects and show results suggesting that facial images contain discriminatory information pertaining to height, weight, and BMI, comparable to that of body images and videos. Finally, we perform a gender-based analysis of the prediction of height, weight, and BMI.
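The regression setup can be sketched in miniature. This is not the paper's model: a random projection of small toy images (the `backbone_features` helper, an assumption) stands in for the ResNet-50 backbone, whose global-average-pooled output would in practice be a 2048-dimensional feature vector. The point of the sketch is the head: the usual classification layer is replaced by a linear layer with three regression outputs, one each for height, weight, and BMI.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the ResNet-50 backbone: a fixed random projection of
# small images to a feature vector (128-d here; 2048-d in ResNet-50).
FEAT_DIM = 128
w_feat = rng.standard_normal((32 * 32 * 3, FEAT_DIM)) * 1e-2

def backbone_features(face_batch):
    return face_batch.reshape(face_batch.shape[0], -1) @ w_feat

# Regression head: three outputs per face -- height, weight, and BMI --
# in place of a softmax classification layer.
w_head = rng.standard_normal((FEAT_DIM, 3)) * 1e-2

faces = rng.random((2, 32, 32, 3))    # two toy "facial images"
preds = backbone_features(faces) @ w_head
print(preds.shape)                    # (2, 3): one triple per face
```

Training such a head would typically minimize a mean-squared-error loss over the three targets, with the backbone fine-tuned end to end.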