The sharing economy has upset the markets for housing and transportation services. Homeowners can rent out their property when they are away on vacation, and car owners can offer ridesharing services. These sharing economy business models are based on monetizing under-utilized infrastructure. They are enabled by peer-to-peer platforms that match eager sellers with willing buyers. Are there compelling sharing economy opportunities in the electricity sector? What products or services can be shared in tomorrow's Smart Grid? We begin by exploring sharing economy opportunities in the electricity sector, and discuss the regulatory and technical obstacles to these opportunities. We then study the specific problem of a collection of firms sharing their electricity storage. We characterize equilibrium prices for shared storage in a spot market. We formulate the storage investment decisions of the firms as a non-convex non-cooperative game. We show that, under a mild alignment condition, a Nash equilibrium exists, is unique, and supports the social welfare. We discuss the technology platforms necessary for the physical exchange of power, and the market platforms necessary to trade electricity storage. We close with synthetic examples to illustrate our ideas.
We propose empirical dynamic programming algorithms for Markov decision processes. In these algorithms, the exact expectation in the Bellman operator in classical value iteration is replaced by an empirical estimate to get "empirical value iteration" (EVI). Policy evaluation and policy improvement in classical policy iteration are also replaced by simulation to get "empirical policy iteration" (EPI). Thus, these empirical dynamic programming algorithms involve iteration of a random operator, the empirical Bellman operator. We introduce notions of probabilistic fixed points for such random monotone operators. We develop a stochastic dominance framework for convergence analysis of such operators. We then use this framework to give sample complexity bounds for both EVI and EPI. We also provide variations and extensions, including asynchronous empirical dynamic programming and the minimax empirical dynamic program, and show how the approach can be used to solve the dynamic newsvendor problem. Preliminary experimental results suggest a faster rate of convergence than stochastic approximation algorithms.
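The core idea of EVI described above, replacing the exact expectation in the Bellman operator with a sample average, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: a discounted cost-minimization formulation, the function and parameter names (empirical_value_iteration, sample_next, n_samples), and the two-state toy problem.

```python
import random


def empirical_value_iteration(states, actions, cost, sample_next,
                              gamma=0.9, n_samples=200, n_iters=100, seed=0):
    """Value iteration with the expectation replaced by a sample average.

    cost(s, a) is the one-step cost; sample_next(s, a, rng) draws one
    simulated next state. Both are hypothetical interfaces for this sketch.
    """
    rng = random.Random(seed)
    v = {s: 0.0 for s in states}
    for _ in range(n_iters):
        new_v = {}
        for s in states:
            best = float("inf")
            for a in actions:
                # Empirical estimate of E[v(next state)] from fresh samples.
                total = sum(v[sample_next(s, a, rng)] for _ in range(n_samples))
                q = cost(s, a) + gamma * total / n_samples
                best = min(best, q)
            new_v[s] = best
        # One application of the (random) empirical Bellman operator.
        v = new_v
    return v


# Toy problem (assumed): state 0 costs 1 per step, state 1 costs 0.
def toy_cost(s, a):
    return 1.0 if s == 0 else 0.0


def toy_sample_next(s, a, rng):
    # The chosen action has its intended effect with probability 0.9.
    intended = s if a == "stay" else 1 - s
    return intended if rng.random() < 0.9 else 1 - intended


values = empirical_value_iteration([0, 1], ["stay", "move"],
                                   toy_cost, toy_sample_next)
print(values)
```

For this toy problem the exact discounted values are 1.9 for state 0 and 0.9 for state 1, so the printed estimates should land near those numbers without matching them exactly. Because each iteration draws fresh samples, the iterates fluctuate around the fixed point rather than converging to it, which is why a probabilistic notion of fixed point is needed.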
We consider the problem of distributed online learning with multiple players in multi-armed bandit (MAB) models. Each player can pick among multiple arms. When a player picks an arm, it gets a reward. Any other communication between the users is costly and will add to the regret. We propose an online index-based distributed learning policy called the dUCB4 algorithm that trades off exploration versus exploitation in the right way, and achieves expected regret that grows at most as near-O(log^2 T). The motivation comes from opportunistic spectrum access by multiple secondary users in cognitive radio networks, wherein they must pick among various wireless channels that look different to different users. To the best of our knowledge, this is the first distributed learning algorithm for multi-player MABs.
Index Terms: Distributed adaptive control, multi-armed bandit, online learning, multi-agent systems.
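To make the notion of an index-based policy concrete, the sketch below implements the standard single-player UCB1 rule. It is not the dUCB4 algorithm from the abstract: it has one player, no coordination among users, and no communication cost. The Bernoulli arm means, the horizon, and the function name ucb1 are assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Single-player UCB1 on Bernoulli arms; returns pull counts per arm."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of times each arm was pulled
    sums = [0.0] * k    # total reward collected from each arm
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so that each index is well defined.
            arm = t - 1
        else:
            # Index = empirical mean (exploitation) + confidence bonus
            # (exploration); the bonus shrinks as an arm is pulled more.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


# Assumed toy instance: three arms with success probabilities 0.2, 0.5, 0.8.
pull_counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(pull_counts)
```

In the single-player setting the number of pulls of each suboptimal arm grows only logarithmically with the horizon, so most pulls go to the best arm. The multi-player problem in the abstract is harder because the players must also settle on distinct arms, and any communication used to do so counts against the regret.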