“…In this work, two reward functions are examined. The first, $r(u, \max f)$, depends on the amount of allocated utilities $u$ and the peak-rate demands $\max f \in \mathbb{R}^n$ of the connections, leading to the optimization objective of [5]. The second, $r(u, f, \theta)$, depends on the amount of allocated utilities $u$, the fluctuations $f = [f_1, \ldots, f_n]$ drawn from the traffic demand distributions $F = [F_1(\cdot), \ldots, F_n(\cdot)]$ as $f \sim F$, and the parameter $\theta$, which defines the tolerated amount of unserved traffic for all connections.…”
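
The excerpt names only the arguments of the two reward functions, not their closed forms. The sketch below is therefore purely illustrative: it assumes that $u$ is a per-connection allocation vector, that a feasible allocation is rewarded with the negative of the total allocation, that $\theta$ is a fraction of total demand, and that an infeasible allocation receives a fixed penalty. The names `peak_rate_reward`, `tolerant_reward`, `unserved_fraction`, `sample_demands`, and `PENALTY` are hypothetical and do not come from the source.

```python
import random

# Assumed stand-in value for an allocation that violates the constraint.
PENALTY = -1e9


def peak_rate_reward(u, f_max):
    """Illustrative r(u, max f): feasible only if u covers every peak demand."""
    if all(ui >= fi for ui, fi in zip(u, f_max)):
        return -sum(u)  # assumed objective: allocate as little as possible
    return PENALTY


def unserved_fraction(u, f):
    """Share of the sampled demand f that the allocation u leaves unserved."""
    total = sum(f)
    if total == 0:
        return 0.0
    unserved = sum(max(fi - ui, 0.0) for ui, fi in zip(u, f))
    return unserved / total


def tolerant_reward(u, f, theta):
    """Illustrative r(u, f, theta): feasible if unserved traffic stays within theta."""
    if unserved_fraction(u, f) <= theta:
        return -sum(u)
    return PENALTY


def sample_demands(samplers, rng):
    """Draw f ~ F, with one sampler per connection standing in for F_i."""
    return [draw(rng) for draw in samplers]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical demand distributions F_1, F_2 (uniform, chosen for illustration).
    samplers = [lambda r: r.uniform(0.0, 10.0), lambda r: r.uniform(0.0, 4.0)]
    f = sample_demands(samplers, rng)
    print(peak_rate_reward([10.0, 4.0], [10.0, 4.0]))
    print(tolerant_reward([8.0, 3.0], f, theta=0.1))
```

Under these assumptions, the peak-rate reward forces the allocation up to the worst-case demand, whereas the tolerant reward accepts a smaller allocation whenever the sampled demand leaves at most a fraction $\theta$ of traffic unserved.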