Building energy data has been used for decades to understand energy flows in buildings and plan for future energy demand. Recent market, technology and policy drivers have resulted in widespread data collection by stakeholders across the buildings industry. Consolidation of independently collected and maintained datasets presents a cost-effective opportunity to build a database of unprecedented size. Applications of the data include peer group analysis to evaluate building performance, and data-driven algorithms that use empirical data to estimate energy savings associated with building retrofits. This paper discusses technical considerations in compiling such a database using the DOE Buildings Performance Database (BPD) as a case study. We gathered data on over 700,000 residential and commercial buildings. We describe the process and challenges of mapping and cleansing data from disparate sources. We analyze the distributions of buildings in the BPD relative to the Commercial Building Energy Consumption Survey (CBECS) and Residential Energy Consumption Survey (RECS), evaluating peer groups of buildings that are well or poorly represented, and discussing how differences in the distributions of the three datasets impact use-cases of the data. Finally, we discuss the usefulness and limitations of the current dataset and the outlook for increasing its size and applications.