The larger visual identity of a city is often a blend of smaller, distinct visual character zones. Despite the recent popularity of street-view imagery for visual analytics, its role in uncovering such urban visual clusters has been fairly limited. Taking Mumbai as a demonstrative case, we present what is arguably the first city-wide visual cluster analysis of an Indian metropolis. We use a Dense Prediction Transformer (DPT) for semantic segmentation of over 28,000 Google Street View (GSV) images collected from over 7,000 locations across the city. Unsupervised k-means clustering is then carried out on the extracted semantic features (such as greenery, sky view, and built density) to identify distinct urban visual typologies. Through iterative analysis, seven key visual clusters are identified, and Principal Component Analysis (PCA) is used to visualize the variance across them. The feature distributions of each cluster are analysed both qualitatively and quantitatively to examine their unique visual configurations. The spatial distributions of the clusters are also visualized, mapping out the different 'faces' of the city. It is hoped that the methodology outlined in this work will serve as a base for similar cluster-based inquiries into the visual dimension of other cities across the globe.
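As a rough illustration of the feature-extraction and clustering steps described above, the following minimal sketch converts a segmentation label mask into per-class pixel fractions and groups the resulting feature vectors with a plain k-means loop. It is not the authors' pipeline: the class names (`sky`, `tree`, `road`, `building`), the helper names, and the pure-Python k-means are all assumptions made for illustration, and the DPT segmentation and PCA steps are omitted.

```python
import random
from collections import Counter


def class_fractions(label_mask, classes):
    """Return the fraction of pixels belonging to each class in one mask.

    label_mask is a 2D list of class labels (hypothetical stand-in for the
    output of a semantic segmentation model).
    """
    counts = Counter(label for row in label_mask for label in row)
    total = sum(counts.values())
    return [counts.get(c, 0) / total for c in classes]


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100, seed=0):
    """Basic Lloyd's k-means; returns (assignment, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    assignment = None
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members (skip empty clusters).
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = mean_vector(members)
    return assignment, centroids


# Hypothetical class list and a toy 2x2 mask.
CLASSES = ["sky", "tree", "road", "building"]
toy_mask = [["sky", "sky"], ["tree", "road"]]
toy_features = class_fractions(toy_mask, CLASSES)
```

In a setting like the one described, each location's feature vector might be the average of the fractions from its several street-view images, and `k` would be chosen iteratively (seven in this study) rather than fixed in advance.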