The quality of geocoding has received substantial attention in recent years. A synthesis of published studies shows that the positional errors of street geocoding differ in important ways from those of other types of spatial data: 1) the magnitude of error varies strongly along urban-rural gradients; 2) the direction of error is not uniform but is strongly associated with the properties of local street segments; 3) the errors do not follow a normal distribution but are highly skewed, with a substantial number of very large values; and 4) the magnitude of error is spatially autocorrelated and is related to properties of the reference data. These properties make it difficult to apply analytical approaches or Monte Carlo simulation to error propagation modeling, because both rely on generalized statistical characteristics of the error. The current paper describes an alternative, empirical approach to error propagation modeling for geocoded data and illustrates its implementation using three case studies of geocoded individual-level datasets. The first case study determines the land cover categories associated with geocoded addresses using a point-in-raster overlay. The second case study characterizes local hotspots using kernel density analysis of geocoded addresses. The third case study aggregates the data to enumeration areas of varying spatial resolution. In each case study, a high-quality reference scenario based on address points forms the basis of the analysis and is compared with the results obtained from various street geocoding techniques. Results show that the distinctive nature of the positional error of street geocoding introduces substantial noise into the results of spatial analysis, as well as considerable bias in some analysis scenarios. This confirms findings from earlier studies and extends them to a wider range of analytical techniques.
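The comparison behind the first case study can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a north-up land cover grid stored as nested lists with a hypothetical upper-left origin and square cells, and it measures propagated error simply as the share of addresses whose overlay category changes between the address-point reference location and the street-geocoded location. The function names and toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in projected map units, e.g. metres


def cell_value(raster: Sequence[Sequence[int]], origin: Point,
               cell_size: float, p: Point) -> Optional[int]:
    """Point-in-raster lookup: category of the cell containing p, or None
    if p lies outside the grid. Assumes a north-up grid whose upper-left
    corner is `origin`, with row 0 at the top."""
    col = int((p[0] - origin[0]) // cell_size)
    row = int((origin[1] - p[1]) // cell_size)
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None


def overlay_disagreement(raster: Sequence[Sequence[int]], origin: Point,
                         cell_size: float, reference: List[Point],
                         geocoded: List[Point]) -> float:
    """Share of addresses whose category differs between the reference
    location and the geocoded location of the same address."""
    if len(reference) != len(geocoded) or not reference:
        raise ValueError("need equal-length, non-empty paired point lists")
    mismatches = sum(
        cell_value(raster, origin, cell_size, r)
        != cell_value(raster, origin, cell_size, g)
        for r, g in zip(reference, geocoded)
    )
    return mismatches / len(reference)


# Hypothetical toy data: a 3 x 3 land cover grid of 10 m cells.
RASTER = [[1, 1, 2],
          [1, 3, 2],
          [3, 3, 2]]
ORIGIN = (0.0, 30.0)
CELL = 10.0
reference = [(5, 25), (15, 15), (25, 5)]   # address-point locations
geocoded = [(8, 22), (21, 14), (24, 9)]    # street-geocoded locations

# One of the three addresses falls in a different land cover category.
print(overlay_disagreement(RASTER, ORIGIN, CELL, reference, geocoded))
```

Because each geocoded record is compared directly with its own reference location, this style of comparison needs no distributional assumption about the positional error, which is consistent with the motivation given for an empirical rather than analytical or Monte Carlo approach.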
The Topologically Integrated Geographic Encoding and Referencing (TIGER) data are an essential part of the US Census and a critical element of the nation's spatial data infrastructure. The TIGER data for the year 2000, however, are of limited positional accuracy and were deemed of insufficient quality to support the 2010 Census. In response, the US Census Bureau embarked on the MAF/TIGER Accuracy Improvement Project (MTAIP) to improve the positional accuracy of the database, modernize the data processing environment, and improve cooperation with partner agencies. Improved TIGER data were released for the entire US just before the 2010 Census. The current study characterizes the positional accuracy of the TIGER 2009 data relative to the TIGER 2000 data, based on selected road intersections. Three US counties were selected as study areas, and in each county 100 urban and 100 rural sample locations were chosen. Features in the TIGER 2000 and 2009 data were compared with reference locations derived from high-resolution natural color orthoimagery. Results indicate that the positional accuracy of TIGER 2009 is much improved over that of TIGER 2000: by at least one order of magnitude in both urban and rural areas of all three counties for most accuracy metrics. TIGER 2009 is consistently more accurate in urban areas than in rural areas, by a factor of at least two for most accuracy metrics. Despite this substantial improvement, positional errors greater than 10 m are relatively common in the TIGER 2009 data; in most cases they represent remnant segments of minor roads from older versions of TIGER. As a result, under the US Census Bureau's suggested accuracy metric, the TIGER 2009 data meet the accuracy expectation of 7.6 m in two of the three urban areas but in none of the three rural areas. The suggested metric is based on the National Standard for Spatial Data Accuracy (NSSDA) protocol and was found to be highly sensitive to the presence of a small number of very large errors. This complicates efforts to characterize the accuracy of TIGER data, or of other spatial data, using this protocol.
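The sensitivity of an NSSDA-based metric to a few large errors can be illustrated with a short Python sketch. It assumes the FGDC NSSDA horizontal statistic in its simplest form, Accuracy_r = 1.7308 × RMSE_r, which applies when RMSE_x and RMSE_y are approximately equal; the text does not say exactly how the Census Bureau computed its metric beyond referring to the NSSDA protocol. The offsets below are invented for illustration and are not taken from the study, and the function names are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Offset = Tuple[float, float]  # (dx, dy) in metres: test minus reference


def nssda_horizontal(errors: Sequence[Offset]) -> float:
    """NSSDA horizontal accuracy at the 95% confidence level, using the
    1.7308 * RMSE_r form that assumes RMSE_x and RMSE_y are roughly equal."""
    n = len(errors)
    rmse_x_sq = sum(dx * dx for dx, _ in errors) / n
    rmse_y_sq = sum(dy * dy for _, dy in errors) / n
    rmse_r = math.sqrt(rmse_x_sq + rmse_y_sq)
    return 1.7308 * rmse_r


def median_radial_error(errors: Sequence[Offset]) -> float:
    """Median Euclidean offset, a robust summary for comparison."""
    return statistics.median([math.hypot(dx, dy) for dx, dy in errors])


# Hypothetical sample of 100 intersections, every one off by 3 m.
clean: List[Offset] = [(3.0, 0.0)] * 50 + [(0.0, 3.0)] * 50
# Same sample with one intersection replaced by a ~59 m outlier
# (hypothetically, a remnant road segment that was never realigned).
with_outlier: List[Offset] = [(3.0, 0.0)] * 49 + [(0.0, 3.0)] * 50 + [(42.0, 42.0)]

THRESHOLD_M = 7.6  # accuracy expectation quoted in the text
for name, errs in (("clean", clean), ("one outlier", with_outlier)):
    acc = nssda_horizontal(errs)
    print(f"{name}: NSSDA {acc:.1f} m, median {median_radial_error(errs):.1f} m, "
          f"meets {THRESHOLD_M} m: {acc <= THRESHOLD_M}")
```

In this hypothetical sample, a single outlier among 100 points raises the NSSDA value from about 5.2 m to about 11.5 m, pushing it past the 7.6 m expectation, while the median radial error stays at 3 m. Because RMSE squares each offset before averaging, one very large error can dominate the statistic.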