Figure 1: Our WikiScenes dataset combines 3D reconstructions, images, and language descriptions for dozens of landmarks, like the Barcelona and Reims Cathedrals pictured above. WikiScenes enables new tasks that combine different modalities, such as associating semantic concepts like "portal", "facade", and "tower" (colored in pink, blue, and brown, respectively) with 3D structure across all cathedrals.