Estimating per-capita income and household income at a fine-grained geographical scale is critical but challenging, even in developed economies. In this paper, we develop a novel Siamese-like Convolutional Neural Network, integrated with Ridge Regression and Gaussian Process Regression, for fine-grained estimation of income across different parts of New York City. Our model (the GP-Mixed-Siamese-like-Double-Ridge model) uses the pairwise comparison of location-based house price information, daytime satellite images, street views, and spatial location information as inputs. Taking the per-capita income and the median household income in New York City as the ground truths, our model outperforms other state-of-the-art income estimation models (R² = 0.72-0.86 in five-fold validation) and achieves good performance in cross-district and cross-scale validation. We also find that models that partially share our architecture, including the Spatial-Information-GP and the Mixed-Siamese-like models, perform well at certain spatial granularities and levels of data availability. Because such models rely on fewer input data types and simpler architectures, they can save resources on data collection and model training. Hence, adopting our model for fine-grained income estimation does not preclude using these architecturally similar models. Our model allows per-capita and household income data generated at fine-grained resolution to be coupled with other types of data at the same scale, such as air pollution or epidemic data, so that location-specific socio-economic studies and evidence-based decision-making can be conducted at fine-grained resolution.
Future research will focus on extending our model for fine-grained income estimation in developing metropolises, and for developing other socio-economic indicators.

INDEX TERMS Daytime satellite image, developed metropolis, fine-grained resolution, GP-Mixed-Siamese-like-Double-Ridge model, house price, household income, per-capita income, Siamese-like Convolutional Neural Network, street view.
I. INTRODUCTION

Measuring income¹ distribution at a high spatial resolution is critical but challenging, even for developed economies [1-3]. Accurate income data are mainly obtained from field surveys, which can be highly capital-intensive [2]. Over the past few decades, attempts have been made to overcome data scarcity and to estimate fine-grained income distribution across developing or non-urban areas [4-7]. Few studies have attempted to make good use of proxy data and deep learning

¹ According to the definition of the American Community Survey, "Total income" refers to the sum of incomes reported separately for wage or salary income; net self-employment income; interest, dividends, or net rental or royalty income, or income from estates and trusts; Social Security or Railroad Retirement Income; Supplemental Security Income (SSI); public assistance or welfare payments; retireme...