Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining 2020
DOI: 10.1145/3394486.3403351
A Sleeping, Recovering Bandit Algorithm for Optimizing Recurring Notifications

Abstract: Many online and mobile applications rely on daily emails and push notifications to increase and maintain user engagement. The multi-armed bandit approach provides a useful framework for optimizing the content of these notifications, but a number of complications (such as novelty effects and conditional eligibility) make conventional bandit algorithms unsuitable in practice. In this paper, we introduce the Recovering Difference Softmax Algorithm to address the particular challenges of this problem domain, and us…
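
The abstract stops short of the algorithm's details, but the ingredients it names (softmax action selection, recovery from novelty effects, and conditional eligibility) can be combined into a rough sketch. The recovery curve, eligibility masking, and temperature below are assumptions for illustration, not the paper's actual Recovering Difference Softmax Algorithm:

```python
# Illustrative sketch only: the abstract names the Recovering Difference
# Softmax Algorithm but does not spell out its update rules, so the recovery
# curve, eligibility mask, and temperature below are all assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_arms = 5
mean_reward = np.zeros(n_arms)        # running estimate of each notification's payoff
pull_count = np.zeros(n_arms)
last_used = np.full(n_arms, -np.inf)  # time step at which each arm was last shown

def recovery_factor(elapsed, tau=3.0):
    """Hypothetical recovery curve: an arm's effectiveness drops right after
    it is shown (novelty wears off) and recovers exponentially over time."""
    return 1.0 - np.exp(-elapsed / tau)

def choose_arm(t, eligible, temperature=0.1):
    """Softmax over recovery-adjusted reward estimates, restricted to
    currently eligible arms (ineligible, "sleeping" arms are masked out)."""
    adjusted = mean_reward * recovery_factor(t - last_used)
    logits = adjusted / temperature
    logits[~eligible] = -np.inf        # conditional eligibility
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(n_arms, p=probs)

def update(arm, reward, t):
    """Incremental mean update after observing the reward for the sent arm."""
    pull_count[arm] += 1
    mean_reward[arm] += (reward - mean_reward[arm]) / pull_count[arm]
    last_used[arm] = t
```

Here `eligible` would be a boolean mask over arms (e.g. `np.array([True, True, False, True, True])`), capturing that some notifications are only conditionally eligible to send on a given day.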

Cited by 4 publications (2 citation statements; citing publications from 2021 and 2023) · References 8 publications
“…Zhao et al [2018] propose a machine learning approach to decide notification volume for each user. Yancey and Settles [2020] propose a multi-armed bandit approach for notification optimization. Yue et al [2022] propose a ranking solution to decide which notification to send to users.…”
Section: Related Work
confidence: 99%
“…The authors compare their algorithm to a Bayesian d-step lookahead benchmark, which is the greedy algorithm optimizing the next d pulls given the decision maker's current situation. In comparison, our benchmark is concerned with the total reward over the whole time horizon T rather than a pre-fixed d. Other related work includes Mintz et al (2020) and Yancey & Settles (2020). In Mintz et al (2020), the recovery function is characterized via a parametric form, and the authors obtain a worst-case regret of O(T^{2/3}).…”
Section: Related Literature
confidence: 99%
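
The d-step lookahead benchmark quoted above admits a compact sketch: score every length-d pull sequence under a known reward model and play the first arm of the best one. The exponential recovery model and its parameters below are illustrative stand-ins, not the specification used in the cited papers:

```python
# Hedged sketch of a Bayesian d-step lookahead benchmark: exhaustively score
# every length-d pull sequence under an assumed reward model, then play the
# first arm of the best sequence. The reward model is an illustrative stand-in.
import math
from itertools import product

def expected_reward(arm, elapsed, base=(1.0, 0.8, 0.6), tau=3.0):
    """Assumed model: an arm's base value scaled by a recovery curve that
    grows with the time elapsed since the arm was last pulled."""
    return base[arm] * (1.0 - math.exp(-elapsed / tau))

def d_step_lookahead(last_used, t, n_arms=3, d=2):
    """Exhaustive greedy lookahead: O(n_arms ** d) sequences, fine for small d."""
    best_seq, best_val = None, float("-inf")
    for seq in product(range(n_arms), repeat=d):
        used = dict(last_used)   # copy so simulated pulls don't mutate state
        total = 0.0
        for step, arm in enumerate(seq):
            now = t + step
            total += expected_reward(arm, now - used.get(arm, float("-inf")))
            used[arm] = now
        if total > best_val:
            best_seq, best_val = seq, total
    return best_seq[0]

# Example: arm 1 was pulled recently (at t=9), so a 2-step plan starting at
# t=10 tends to prefer a fully recovered arm first.
print(d_step_lookahead({1: 9}, t=10))
```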