Resource Management with Deep Reinforcement Learning

Mao, Hongzi; Alizadeh, Mohammad; Menache, Ishai; Kandula, Srikanth

doi:10.1145/3005745.3005750

Cited by 821 publications

(473 citation statements)

References 13 publications

Supporting

Mentioning

472

Contrasting


“…In this thesis, we propose Pensieve, a system that learns ABR algorithms automatically, without using any pre-programmed models or explicit assumptions about the operating environment. Pensieve uses modern Reinforcement Learning (RL) techniques [20,21,22] to learn a control policy for bitrate adaptation purely through experience. During training, Pensieve starts knowing nothing about the task at hand.…”

Section: Introduction (mentioning)

confidence: 99%

Neural Adaptive Video Streaming with Pensieve

Mao, Netravali, Alizadeh. 2017

Proceedings of the Conference of the ACM Special Interest Group on Data Communication

Self Cite


Client-side video players employ bitrate adaptation algorithms to cater to the ever-growing QoE requirements of users. These ABR algorithms must balance multiple QoE factors, such as maximizing video bitrate and minimizing rebuffering times. Despite the abundance of recently proposed ABR algorithms, state-of-the-art schemes suffer from two practical challenges: (1) throughput prediction is difficult, and inaccurate predictions can lead to degraded performance; (2) existing algorithms use fixed heuristics which have been fine-tuned according to strict assumptions about deployment environments; such tuning precludes generalization across network conditions and QoE objectives. To overcome these challenges, we develop Pensieve, a system that generates ABR algorithms entirely using Reinforcement Learning (RL). Pensieve uses RL to train a neural network model that selects bitrates for future video chunks based on observations collected by client video players. Unlike existing approaches, Pensieve does not rely upon pre-programmed models or assumptions about the environment. Instead, it learns to make ABR decisions solely through observations of the resulting performance of past decisions. As a result, Pensieve can automatically learn ABR algorithms that adapt to a wide range of environmental conditions and QoE metrics. We compare Pensieve to state-of-the-art ABR algorithms using trace-driven and real-world experiments spanning a wide variety of network conditions, QoE metrics, and video properties. In all considered scenarios, Pensieve outperforms the best state-of-the-art scheme, with improvements in average QoE of 13.1%-25.0%. Pensieve's policies generalize well, outperforming existing schemes even on networks on which it was not trained.
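The abstract above describes QoE as a balance between higher video bitrate and less rebuffering. A minimal sketch of that trade-off follows, assuming a simple linear form; the function name `linear_qoe`, the parameter `rebuffer_penalty`, and its default value are hypothetical illustrations and are not taken from the abstract.

```python
def linear_qoe(bitrates_mbps, rebuffer_secs, rebuffer_penalty=1.0):
    """Score a playback session: reward bitrate, penalize rebuffering.

    bitrates_mbps: bitrate chosen for each video chunk (Mbps).
    rebuffer_secs: stall time incurred while fetching each chunk (seconds).
    rebuffer_penalty: hypothetical weight trading stalls against bitrate.
    """
    if len(bitrates_mbps) != len(rebuffer_secs):
        raise ValueError("need one rebuffering time per chunk")
    bitrate_utility = sum(bitrates_mbps)
    rebuffer_cost = rebuffer_penalty * sum(rebuffer_secs)
    return bitrate_utility - rebuffer_cost


# Three chunks: total bitrate 5.0 Mbps, 0.5 s of stalls weighted by 4.0.
score = linear_qoe([1.0, 2.0, 2.0], [0.0, 0.5, 0.0], rebuffer_penalty=4.0)
```

Under this assumed form, an ABR policy that always picks the top bitrate can still score poorly if those choices cause stalls, which is the tension the abstract points to.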


“…As a policy gradient method of DRL, deep deterministic policy gradient (DDPG) can address a large action space. Thus, DDPG for building DRL‐Flow was employed; DDPG's detailed algorithm is presented in . Because the policy network is represented as a CNN that accepts as input a collection of states s and outputs a probability distribution over all possible actions, this paper used realistic DCN mix‐flow traffic (in our case, the web search workload dataset) to train the CNN through a variant of the REINFORCE algorithm as described in .…”

Section: Results (mentioning)

confidence: 99%

Mix‐flow scheduling using deep reinforcement learning for software‐defined data‐center networks

Liu, Cai, Wang, et al. 2019

Internet Technology Letters


For a mix‐flow scenario in software‐defined data‐center networks, simultaneously meeting the different performance requirements of the different types of flows is a considerable challenge. This paper proposes a mix‐flow scheduling scheme based on deep reinforcement learning (DRL). It establishes three private link sets for three types of flows. DRL is then employed to adaptively and intelligently allocate bandwidth for each type of flow according to the traffic variations across time and space. A novel metric is designed as a function of DRL's reward to guide the process of simultaneously maximizing the deadline meet rate for mice flows (MF) and minimizing the flow completion time for elephant flows. Within these three private link sets, three flow‐scheduling strategies (i.e., priority‐based allocation for MF, stable matching‐based allocation for elephant flows with unknown sizes, and proportion‐based allocation for elephant flows with known sizes) are employed. A simulation demonstrates the effectiveness of the proposed scheme compared with previous methods (Fincher and pFabric). DRL‐Flow's overhead is also minimal, so it scales well and is deployable in a large‐scale network.
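The citation statement for this entry mentions a policy network that outputs a probability distribution over actions and is trained with a variant of REINFORCE. A minimal sketch of one REINFORCE update follows, assuming a single state and a plain list of action logits in place of the CNN; the names `softmax`, `reinforce_step`, `baseline`, and `lr` are hypothetical and not drawn from either paper.

```python
import math


def softmax(logits):
    """Turn action logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, action, ret, baseline=0.0, lr=0.1):
    """One REINFORCE update for a softmax policy over discrete actions.

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i,
    so each logit moves along that gradient, scaled by (return - baseline).
    """
    probs = softmax(logits)
    advantage = ret - baseline
    return [
        x + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]


# Action 0 was sampled and earned return 1.0, so its probability should rise.
new_logits = reinforce_step([0.0, 0.0], action=0, ret=1.0)
```

A positive advantage raises the probability of the sampled action and a negative one lowers it; the baseline only reduces variance and does not change the expected gradient.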


“…However, such studies are not targeted towards cloud users and do not explore resource usage data as a means to detect breaches. Even though machine learning and statistical techniques have been applied before in the context of cloud computing to detect performance anomalies [16], [17], [18], optimize resource allocation [19], and reduce energy usage [20], these approaches have not, to the best of our knowledge, been implemented to defend against resource compromises in public cloud.…”

Section: A Background and Related Work (mentioning)

confidence: 99%

User-profile-based analytics for detecting cloud security breaches

Tiwari, Türk, Oprea, et al. 2017

2017 IEEE International Conference on Big Data (Big Data)


While the growth of cloud-based technologies has benefited society tremendously, it has also increased the surface area for cyber attacks. Given that cloud services are prevalent today, it is critical to devise systems that detect intrusions. One form of security breach in the cloud is when cybercriminals compromise Virtual Machines (VMs) of unwitting users and then use those users' resources to run time-consuming, malicious, or illegal applications for their own benefit. This work proposes a method to detect unusual resource usage trends and alert the user and the administrator in real time. We experiment with three categories of methods: simple statistical techniques, unsupervised classification, and regression. So far, our approach successfully detects anomalous resource usage when experimenting with typical trends synthesized from published real-world web server logs and cluster traces. We observe the best results with unsupervised classification, which gives an average F1-score of 0.83 for web server logs and 0.95 for the cluster traces.
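The abstract above reports results as F1-scores. For reference, a minimal sketch of how an F1-score is computed from binary anomaly labels follows; the function name `f1_score` and the example labels are illustrative assumptions, not data from the paper.

```python
def f1_score(true_labels, predicted_labels):
    """Harmonic mean of precision and recall for binary labels (1 = anomaly)."""
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t and p)
    false_pos = sum(1 for t, p in pairs if not t and p)
    false_neg = sum(1 for t, p in pairs if t and not p)
    if true_pos == 0:
        return 0.0  # no anomaly was correctly flagged
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
example_f1 = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F1 ignores true negatives, it stays informative when anomalies are rare, which is typical of resource-usage traces.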
