Long-term individual household forecasting is useful in various applications, e.g., to determine customers' advance payments. However, the literature on this type of forecasting is limited; existing methods either focus on short-term predictions for individual households, or long-term prediction at an aggregated level (e.g. neighborhood). To fill this gap, we present a method that predicts the monthly consumption of individual households over the next year, given only a few months of consumption data during the current year. Utility companies can exploit this method to predict the consumption of any customer for the next year even with incomplete data. The method consists of three steps: clustering the data using k-means, prediction using an ensemble of forecasts based on the historical median distribution among similar households, and smoothing the predictions to remove weather-dependent patterns. The method is highly accurate as it finished third in the IEEE-CIS competition (and ranks first when leveraging insights from another team), focused on forecasting long-term household consumption with incomplete data. It is also very scalable thanks to its low computational complexity and weak data requirements: the method only requires a few months of historical data and no household-specific or weather information.
Constraint-based clustering leverages user-provided constraints to produce a clustering that matches the user's expectation. In active constraint-based clustering, the algorithm selects the most informative constraints to query in order to produce good clusterings with as few constraints as possible. A major challenge in constraint-based clustering is handling noise: the majority of existing approaches assume that the provided constraints are correct, while that might not be the case. In this paper, we propose a method to identify and correct noisy constraints in active constraint-based clustering. Our approach reasons probabilistically about the correctness of the user's answers and asks additional constraints to corroborate or correct the suspicious answers. We demonstrate the method's effectiveness by incorporating it into COBRAS, a state-of-the-art method for active constraint-based clustering. Compared to COBRAS and other active-constraint-based clustering algorithms, the resulting system produces better clusterings in the presence of noise.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.