With the further liberalization of the electricity market of China, customers' requirements, characteristics, and distribution, as well as the quality, security, and reliability of power supplies without interruption, have received considerable attention from power companies, policymakers, and researchers. How to deeply explore the distribution characteristics of electricity customers and analyze their sensitivities to electricity blackouts has become an especially important problem. This paper takes over 0.1 billion data, collected by various smart devices of the Internet of Things in the power system of China, such as smart meters, intelligent power consumption interactive terminals, data concentrators, and other cross-platform data, for example, 95 598 telephone records, complaint information, user bills, user information, and maintenance records, as study objects, to analyze the consumption characteristics of power users. It has been found that there is a wide range of power users who pay different electricity bills; a long-tail distribution following a power law lies in the number of users versus their paid electricity bills. Meanwhile, there are two Pareto effects (2-8 rule): the number of residents and non-residents versus their electricity bills, and the number of large industrial users and general industry (business users) versus in their electricity consumption and bills. Then, a decision tree algorithm is proposed to capture the characteristics of electricity consumers and to recognize the crowd who is power blackout sensitive. The evaluation indexes and parameters of the decision tree are discussed in detail, and a comparison with other intelligent algorithms shows that the decision tree has a good recognition performance over that of others, and the characteristics used to identify the blackoutsensitive crowd is various. All the results state that except for economic factors, positive social effects should also be considered. Various marketing strategies to satisfy different requirements of power users should be provided to promote long-term relationships between the power companies and power customers. INDEX TERMS Blackout sensitivity, big data, decision tree, electricity market, Internet of Things, longtailed, Pareto effect. NOMENCLATURE Notations Description Y The dependent variable, or target variable. It can be ordinal categorical, nominal categorical or continuous. If Y is categorical with J classes, its class takes values in C = {1,. .. , }.
Regardless of the type of data, traditional Bloom filters treat each element of a set as a string, and by iterating every character of the string, they discretize all data randomly and uniformly. However, with the data size and dimension increases, these variants are inefficient. To better discretize vectors with high numerical dimensions, this paper improves the string hashes to integer hashes. Based on the integer hashes and a counter array, we propose a new variant-high-dimensional Bloom filter (HDBF)-to extend the Bloom filter into high-dimensional spaces, which can represent and query numerical vectors of a big set with a low false positive probability. This paper theoretically analyzes the feasibility of the integer hashes on discretizing data and discusses the relationship of parameters of the HDBF. The experiments illustrate that, in high-dimensional numerical spaces, the HDBF shows better randomness on distribution and entropy than that of the counting Bloom filter. Compared with the parallel Bloom filters, for a fixed false positive probability, the HDBF displays time-space overheads, and is more suitable to deal with the numerical vectors with high dimensions.
In high-dimensional spaces, accuracy and similarity search by low computing and storage costs are always difficult research topics, and there is a balance between efficiency and accuracy. In this paper, we propose a new structure Similar-PBF-PHT to represent items of a set with high dimensions and retrieve accurate and similar items. The Similar-PBF-PHT contains three parts: parallel bloom filters (PBFs), parallel hash tables (PHTs), and a bitmatrix. Experiments show that the Similar-PBF-PHT is effective in membership query and K-nearest neighbors (K-NN) search. With accurate querying, the Similar-PBF-PHT owns low hit false positive probability (FPP) and acceptable memory costs. With K-NN querying, the average overall ratio and rank-i ratio of the Hamming distance are accurate and ratios of the Euclidean distance are acceptable. It takes CPU time not I/O times to retrieve accurate and similar items and can deal with different data formats not only numerical values.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.