There are numerous real-world problems where a user must make decisions under uncertainty. For the problem of influence maximization on a social network, for example, the user must select a set of K influencers who will jointly have a large influence on many users. With the lack of prior knowledge about the diffusion process or even topological information, this problem becomes quite challenging. This problem can be cast as a combinatorial bandit problem, where the user can repeatedly choose a candidate set of K out of N arms at each time, with an aim to achieve an efficient trade-off between exploration and exploitation. In this work, we present the first combinatorial bandit algorithm for which the only feedback is a non-linear reward of the selected K arms. No other feedback is needed. In the context of influence maximization, this means no feedback in the form of which nodes or edges were activated needs to be available, just the amount of influence. The novel algorithm we propose, CMAB-SM, is based on a divide-and-conquer strategy. It is computationally and storage efficient. Over a time horizon T , the proposed algorithm achieves a regret bound of Õ( K 1/2 N 1/3 T 2/3 ). This bound is sub-linear in all of the parameters: T , N , and K . We empirically demonstrate our algorithm’s performance using the applications of influence maximization and product cross-selling. For influence maximization, we provide experiments on real-world social networks, showing that the proposed CMAB algorithm outperforms bandit-specific and social-influence-domain-specific algorithms in terms of empirical run-time and expected influence. For product cross-selling, we also demonstrate that the proposed CMAB algorithm outperforms considered baselines on synthetic data.
We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of Õ(K√KNT) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √log 2NT. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.
The growth in online goods delivery is causing a dramatic surge in urban vehicle traffic from last-mile deliveries. On the other hand, ride-sharing has been on the rise with the success of ride-sharing platforms and increased research on using autonomous vehicle technologies for routing and matching. The future of urban mobility for passengers and goods relies on leveraging new methods that minimize operational costs and environmental footprints of transportation systems.This paper considers combining passenger transportation with goods delivery to improve vehicle-based transportation. Even though the problem has been studied with a defined dynamics model of the transportation system environment, this paper considers a model-free approach that has been demonstrated to be adaptable to new or erratic environment dynamics. We propose FlexPool, a distributed model-free deep reinforcement learning algorithm that jointly serves passengers & goods workloads by learning optimal dispatch policies from its interaction with the environment. The proposed algorithm pools passengers for a ride-sharing service and delivers goods using a multi-hop transit method. These flexibilities decrease the fleet's operational cost and environmental footprint while maintaining service levels for passengers and goods. Through simulations on a realistic multiagent urban mobility platform, we demonstrate that FlexPool outperforms other model-free settings in serving the demands from passengers & goods. FlexPool achieves 30% higher fleet utilization and 35% higher fuel efficiency in comparison to (i) model-free approaches where vehicles transport a combination of passengers & goods without the use of multi-hop transit, and (ii) model-free approaches where vehicles exclusively transport either passengers or goods.
With the freight delivery demands and shipping costs increasing rapidly, intelligent control of fleets to enable efficient and cost-conscious solutions becomes an important problem. In this paper, we propose DeepFreight, a model-free deep-reinforcement-learning-based algorithm for multi-transfer freight delivery, which includes two closely-collaborative components: truck-dispatch and package-matching. Specifically, a deep multi-agent reinforcement learning framework called QMIX is leveraged to learn a dispatch policy, with which we can obtain the multi-step joint vehicle dispatch decisions for the fleet with respect to the delivery requests. Then an efficient multi-transfer matching algorithm is executed to assign the delivery requests to the trucks. Also, DeepFreight is integrated with a Mixed-Integer Linear Programming optimizer for further optimization. The evaluation results shows that the proposed system is highly scalable and ensures a 100% delivery success while maintaining low delivery-time and fuel consumption.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.