Fig. 1. Structured Urban Reconstruction. Given street-level imagery, GIS footprints, and a coarse 3D mesh (left), we formulate a global optimization that automatically fuses these noisy, incomplete, and conflicting data sources to create building footprints (middle: colored horizontal polygons) with profiles (vertical ribbons, shown for several footprints) and attached building façades (vertical rectangles). The output encodes a structured urban model (right) including the walls, roof, and associated building elements (e.g., windows, balconies, roof, wall color). Inset below: a reference aerial image.

The creation of high-quality, semantically parsed 3D models for dense metropolitan areas is a fundamental urban modeling problem. Although recent advances in acquisition techniques and processing algorithms have produced large-scale imagery and 3D polygonal reconstructions, such data sources are typically noisy and incomplete, with no semantic structure. In this paper, we present an automatic data fusion technique that produces high-quality structured models of city blocks. From coarse polygonal meshes, street-level imagery, and GIS footprints, we formulate a binary integer program that globally balances sources of error to produce semantically parsed mass models with associated façade elements. We demonstrate our system on four city regions of varying complexity; our examples typically contain densely built urban blocks spanning hundreds of buildings. In our largest example, we produce a structured model of 37 city blocks spanning a total of 1,011 buildings at a scale and quality previously impossible to achieve automatically.