Modern airborne LIDAR instruments can accurately measure fine detail, making them ideal for producing 3D models. A closely related problem is how best to map passive aerial imagery onto the LIDAR-derived models. Typical solutions first construct a surface representation from the LIDAR and then project imagery onto that surface. Unfortunately, a surface model can introduce errors because it poorly represents the underlying scene geometry in areas containing overlapping or complex surfaces. A voxel-based 3D model of the LIDAR geometry is one alternative to the surface representation, and we show that it achieves more accurate results in complex areas than existing approaches. Additional information derived during voxel-model construction can also drive quality metrics that assist in fusing the aerial imagery, and we demonstrate the resulting improvements. Multiple images covering the same area are required to capture details occluded in any single aerial photograph. We show how this occlusion affects fusion with the 3D model, and how the redundant color information can be further filtered to produce better products. Results are presented from our voxel-based fusion technique using LIDAR and coincident visible aerial imagery collected in the summer of 2011 over downtown Rochester, NY.
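The core idea of occlusion-aware fusion can be illustrated with a minimal sketch: a pixel's color is assigned only to the first occupied voxel its camera ray intersects, so surfaces hidden from that view receive nothing. This is not the paper's implementation; the grid, function names, and fixed-step ray sampling are illustrative assumptions (a real system would use an exact voxel traversal and calibrated camera geometry).

```python
import numpy as np

def first_hit_voxel(grid, origin, direction, step=0.25, max_dist=50.0):
    """Walk a ray through a boolean occupancy grid and return the index of
    the first occupied voxel hit, or None. Fixed-step sampling is used here
    for simplicity; an exact DDA traversal would avoid skipping thin voxels."""
    direction = direction / np.linalg.norm(direction)
    t = 0.0
    while t < max_dist:
        p = origin + t * direction
        idx = tuple(np.floor(p).astype(int))
        if all(0 <= idx[i] < grid.shape[i] for i in range(3)) and grid[idx]:
            return idx
        t += step
    return None

def fuse_color(grid, colors, counts, cam_origin, ray_dir, pixel_rgb):
    """Accumulate a pixel's color into the first visible (unoccluded) voxel
    along its ray; occluded voxels behind it must be colored by other views."""
    hit = first_hit_voxel(grid, cam_origin, ray_dir)
    if hit is not None:
        colors[hit] += pixel_rgb
        counts[hit] += 1
    return hit

# Toy scene: a "roof" voxel directly above a "ground" voxel.
grid = np.zeros((10, 10, 10), dtype=bool)
grid[5, 5, 8] = True   # roof
grid[5, 5, 2] = True   # ground, occluded from straight above
colors = np.zeros((10, 10, 10, 3))
counts = np.zeros((10, 10, 10), dtype=int)

# A nadir-looking ray hits the roof; the ground voxel gets no color from it.
hit = fuse_color(grid, colors, counts,
                 cam_origin=np.array([5.5, 5.5, 9.5]),
                 ray_dir=np.array([0.0, 0.0, -1.0]),
                 pixel_rgb=np.array([255.0, 0.0, 0.0]))
```

Averaging `colors[idx] / counts[idx]` over many views then yields a fused color per voxel, and per-voxel counts expose exactly the redundancy that quality metrics can filter.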