John Collomosse scite author profile

A performance evaluation of gradient field HOG descriptor for sketch based image retrieval

We present an image retrieval system for the interactive search of photo collections using free-hand sketches depicting shape. We describe Gradient Field HOG (GF-HOG), an adapted form of the HOG descriptor suitable for sketch based image retrieval (SBIR). We incorporate GF-HOG into a Bag of Visual Words (BoVW) retrieval framework, and demonstrate how this combination may be harnessed both for robust SBIR, and for localizing sketched objects within an image. We evaluate over a large Flickr sourced dataset comprising 33 shape categories, using queries from 10 non-expert sketchers. We compare GF-HOG against state-of-the-art descriptors with common distance measures and language models for image retrieval, and explore how affine deformation of the sketch impacts search performance. GF-HOG is shown to consistently outperform SIFT, multi-resolution HOG, Self Similarity, Shape Context and Structure Tensor in retrieval. Further, we incorporate semantic keywords into our GF-HOG system to enable the use of annotated sketches for image search. A novel graph-based measure of semantic similarity is proposed and two applications explored: semantic sketch based image retrieval and a semantic photo montage.


Total Capture: 3D Human Pose Estimation Fusing Video and Inertial Sensors

Trumble, Gilbert, Malleson et al. 2017



We present an algorithm for fusing multi-viewpoint video (MVV) with inertial measurement unit (IMU) sensor data to accurately estimate 3D human pose. A 3-D convolutional neural network is used to learn a pose embedding from volumetric probabilistic visual hull data (PVH) derived from the MVV frames. We incorporate this model within a dual stream network integrating pose embeddings derived from MVV and a forward kinematic solve of the IMU data. A temporal model (LSTM) is incorporated within both streams prior to their fusion. Hybrid pose inference using these two complementary data sources is shown to resolve ambiguities within each sensor modality, yielding improved accuracy over prior methods. A further contribution of this work is a new hybrid MVV dataset (TotalCapture) comprising video, IMU and a skeletal joint ground truth derived from a commercial motion capture system.


State of the "Art": A Taxonomy of Artistic Stylization Techniques for Images and Video

Kyprianidis, Collomosse, Wang et al. 2013

IEEE Trans. Visual. Comput. Graphics



Gradient field descriptor for sketch based retrieval and localization

2010


We present an image retrieval system driven by free-hand sketched queries depicting shape. We introduce Gradient Field HoG (GF-HOG) as a depiction invariant image descriptor, encapsulating local spatial structure in the sketch and facilitating efficient codebook based retrieval. We show improved retrieval accuracy over 3 leading descriptors (Self Similarity, SIFT, HoG) across two datasets (Flickr160, ETHZ extended objects), and explain how GF-HOG can be combined with RANSAC to localize sketched objects within relevant images. We also demonstrate a prototype sketch driven photo montage application based on our system.


Stroke Surfaces: Temporally Coherent Artistic Animations from Video

Collomosse, Rowntree, Hall 2005

IEEE Trans. Visual. Comput. Graphics


The contribution of this paper is a novel framework for synthesizing nonphotorealistic animations from real video sequences. We demonstrate that, through automated mid-level analysis of the video sequence as a spatiotemporal volume (a block of frames with time as the third dimension), we are able to generate animations in a wide variety of artistic styles, exhibiting a uniquely high degree of temporal coherence. In addition to rotoscoping, matting, and novel temporal effects unique to our method, we demonstrate the extension of static nonphotorealistic rendering (NPR) styles to video, including painterly, sketchy, and cartoon shading. We demonstrate how this novel coherent shading framework may be combined with our earlier motion emphasis work to produce a comprehensive "Video Paintbox" capable of rendering complete cartoon-styled animations from video clips.


Everything You Wanted to Know about Deep Learning for Computer Vision but Were Afraid to Ask

Ponti, Ribeiro, Nazaré et al. 2017



BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography

et al. 2017


Computer vision systems are designed to work well within the context of everyday photography. However, artists often render the world around them in ways that do not resemble photographs. Artwork produced by people is not constrained to mimic the physical world, making it more challenging for machines to recognize. This work is a step toward teaching machines how to categorize images in ways that are valuable to humans. First, we collect a large-scale dataset of contemporary artwork from Behance, a website containing millions of portfolios from professional and commercial artists. We annotate Behance imagery with rich attribute labels for content, emotions, and artistic media. Furthermore, we carry out baseline experiments to show the value of this dataset for artistic style prediction, for improving the generality of existing object classifiers, and for the study of visual domain adaptation. We believe our Behance Artistic Media dataset will be a good starting point for researchers wishing to study artistic imagery and relevant problems. This dataset can be found at https://bam-dataset.org/ (arXiv:1704.08614v2 [cs.CV]).


Deep Image Comparator: Learning to Visualize Editorial Change

Black, Bui, Jin et al. 2021


We present a novel architecture for comparing a pair of images to identify image regions that have been subjected to editorial manipulation. We first describe a robust near-duplicate search, for matching a potentially manipulated image circulating online to an image within a trusted database of originals. We then describe a novel architecture for comparing that image pair, to localize regions that have been manipulated to differ from the retrieved original. The localization ignores discrepancies due to benign image transformations that commonly occur during online redistribution. These include artifacts due to noise and recompression degradation, as well as out-of-place transformations due to image padding, warping, and changes in size and shape. Robustness towards out-of-place transformations is achieved via the end-to-end training of a differentiable warping module within the comparator architecture. We demonstrate effective retrieval and comparison of benign transformed and manipulated images, over a dataset of millions of photographs.



