Chris Buehler scite author profile

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis on which to consider attention. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applied to image captioning, our approach establishes a new state of the art on the MSCOCO test server, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5, and 36.9, respectively. Demonstrating the broad applicability of the method, we apply the same approach to VQA and obtain first place in the 2017 VQA Challenge.
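The top-down half of this two-stage idea can be illustrated with a minimal sketch. Given a set of region feature vectors (standing in for the Faster R-CNN proposals) and a query vector (for example a caption decoder state or a question encoding), the sketch computes one score per region, normalizes the scores with a softmax, and returns the weighted average of the region features. The additive tanh scoring function and the names `top_down_attention`, `w_v`, `w_q`, and `w_a` are assumptions chosen for illustration, not necessarily the paper's exact formulation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]  # row-major: list of rows


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attention(
    regions: List[Vector],  # one feature vector per bottom-up region (assumed precomputed)
    query: Vector,          # top-down context, e.g. a decoder hidden state (assumption)
    w_v: Matrix,            # hypothetical projection for region features
    w_q: Matrix,            # hypothetical projection for the query
    w_a: Vector,            # hypothetical scoring vector
) -> Tuple[Vector, Vector]:
    """Return (attention weights, attended feature) using additive (tanh) scoring."""
    projected_query = matvec(w_q, query)
    scores = []
    for v in regions:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(w_v, v), projected_query)]
        scores.append(sum(w * h for w, h in zip(w_a, hidden)))
    weights = softmax(scores)
    # Weighted sum of the region features, dimension by dimension.
    dim = len(regions[0])
    attended = [sum(wt * r[d] for wt, r in zip(weights, regions)) for d in range(dim)]
    return weights, attended
```

Because the weights come from a softmax, the attended vector is always a convex combination of whole-region features, which is how attention ends up operating at the level of objects and salient regions rather than individual pixels.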

Unstructured lumigraph rendering

Buehler, Bosse, McMillan, et al. 2001

We describe an image-based rendering approach that generalizes many image-based rendering algorithms currently in use, including light field rendering and view-dependent texture mapping. In particular, it allows for lumigraph-style rendering from a set of input cameras that are not restricted to a plane or to any specific manifold. In the case of regular and planar input camera positions, our algorithm reduces to a typical lumigraph approach. In the case of fewer cameras and good approximate geometry, our algorithm behaves like view-dependent texture mapping. Our algorithm achieves this flexibility because it is designed to meet a set of desirable goals that we describe. We demonstrate this flexibility with a variety of examples.
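Moving smoothly between lumigraph-like and view-dependent-texture-like behavior comes down to how the input cameras are blended for each output ray. The sketch below shows one such scheme under stated assumptions: for a point on the approximate geometry, each input camera is penalized by the angle between its ray to that point and the desired ray, only the `k` cameras with the smallest penalties contribute, and each weight falls linearly to zero at the penalty of the next-nearest camera before normalization. The penalty, the threshold choice, and all function names are illustrative assumptions, not the paper's stated weighting.

```python
import math
from typing import List, Sequence

Point = Sequence[float]  # 3D point or direction (x, y, z)


def _sub(a: Point, b: Point) -> List[float]:
    return [x - y for x, y in zip(a, b)]


def _angle(a: Point, b: Point) -> float:
    """Angle in radians between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp for rounding error


def blending_weights(surface_point: Point, desired_center: Point,
                     camera_centers: List[Point], k: int) -> List[float]:
    """Hypothetical angle-penalty blending weights for one desired ray."""
    if not 0 < k < len(camera_centers):
        raise ValueError("need 0 < k < number of cameras")
    desired_ray = _sub(surface_point, desired_center)
    penalties = [_angle(desired_ray, _sub(surface_point, c)) for c in camera_centers]
    order = sorted(range(len(penalties)), key=lambda i: penalties[i])
    # Threshold: penalty of the (k+1)-th best camera; guard against zero.
    threshold = max(penalties[order[k]], 1e-12)
    raw = [0.0] * len(penalties)
    for i in order[:k]:
        raw[i] = 1.0 - penalties[i] / threshold
    total = sum(raw)
    if total == 0.0:
        # Degenerate tie with the threshold: fall back to uniform weights.
        for i in order[:k]:
            raw[i] = 1.0
        total = float(k)
    return [w / total for w in raw]
```

Because a camera's weight reaches zero exactly when it would drop out of the nearest-`k` set, the weights in this sketch change continuously as the desired viewpoint moves.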

Image-based visual hulls

et al. 2000

In this paper, we describe an efficient image-based approach to computing and shading visual hulls from silhouette image data. Our algorithm takes advantage of epipolar geometry and incremental computation to achieve a constant rendering cost per rendered pixel. It does not suffer from the computational complexity, limited resolution, or quantization artifacts of previous volumetric approaches. We demonstrate the use of this algorithm in a real-time virtualized reality application running on a small number of video streams.
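The per-pixel work can be sketched in one dimension. In an image-based visual hull renderer, each desired ray projects to an epipolar line in every reference image, and the silhouette crosses that line in a set of intervals; lifted back onto the ray, these become intervals of a ray parameter `t`. The sketch below assumes those per-view intervals are already available (the projection and silhouette scan are omitted) and shows only the remaining step: intersecting them across views to get the part of the ray inside the visual hull, whose smallest `t` is the visible surface point. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (t_enter, t_exit) along the desired ray


def intersect_intervals(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted lists of disjoint intervals with a linear merge."""
    result: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            result.append((lo, hi))
        # Drop whichever interval ends first; it cannot overlap anything later.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return result


def ray_hull_intervals(per_view: List[List[Interval]]) -> List[Interval]:
    """Intersect the silhouette intervals contributed by every reference view."""
    if not per_view:
        return []
    hull = sorted(per_view[0])
    for intervals in per_view[1:]:
        hull = intersect_intervals(hull, sorted(intervals))
        if not hull:  # ray misses the hull; no need to test further views
            break
    return hull


def first_hit(per_view: List[List[Interval]]) -> Optional[float]:
    """Nearest ray parameter on the visual hull, or None if the ray misses."""
    hull = ray_hull_intervals(per_view)
    return hull[0][0] if hull else None
```

Because the hull is represented as exact intervals along each ray rather than as voxels, no volumetric grid resolution enters this step.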

Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Anderson, He, Buehler, et al. 2017

Preprint

Polyhedral Visual Hulls for Real-Time Rendering

Matusik, Buehler, McMillan 2001

Non-metric image-based rendering for video stabilization

Buehler, Bosse, McMillan

Minimal Surfaces for Stereo

Buehler, Gortler, Cohen, et al. 2002

Determining shape from stereo has often been posed as a global minimization problem. Once formulated, these minimization problems are solved with a variety of algorithmic approaches, including dynamic programming, min-cut, and alpha-expansion. In this paper, we show how an algorithmic technique that constructs a discrete spatial minimal-cost surface can be brought to bear on stereo global minimization problems. This problem can then be reduced to a single min-cut problem. We use this approach to solve a new global minimization problem that naturally arises when solving for three-camera (trinocular) stereo. Our formulation treats the three cameras symmetrically while imposing a natural occlusion cost and a uniqueness constraint.
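The key reduction here is to a single min-cut. The paper's graph construction for trinocular stereo is not reproduced; as a minimal sketch, assuming a graph whose edge capacities already encode the energy terms, the following computes a minimum s-t cut with the Edmonds-Karp max-flow algorithm (breadth-first augmenting paths). By the max-flow/min-cut theorem, the final flow value equals the cut cost, and the source side of the cut is the set of nodes still reachable from the source in the residual graph. The function name and the example graphs are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set, Tuple


def min_cut(capacity: Dict[Tuple[str, str], float],
            source: str, sink: str) -> Tuple[float, Set[str]]:
    """Return (cut cost, source-side node set) via Edmonds-Karp max flow."""
    residual: Dict[Tuple[str, str], float] = defaultdict(float)
    neighbors: Dict[str, Set[str]] = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        neighbors[u].add(v)
        neighbors[v].add(u)  # reverse edge, used when flow is pushed back
    flow = 0.0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            # No augmenting path: the nodes reached form the source side of a min cut.
            return flow, set(parent)
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[(parent[v], v)])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
            v = u
        flow += bottleneck
```

Only the cut itself is generic in this sketch; turning matching and occlusion costs into capacities so that the cheapest cut is the desired surface is the problem-specific part the paper addresses.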

Rich Image Captioning in the Wild

Tran, He, Zhang, et al. 2016

Preprint
