Author (English): Dai-Long Lee
Title (English): A Multi-Agent Reinforcement Learning Framework for Datacenter Traffic Optimization
Advisor (English): Min-Te Sun
Keywords (English): Multi-Agent Reinforcement Learning; Datacenter; Traffic Control
Datacenter traffic optimization has been a popular research topic for years. Traditional approaches to this problem are mainly based on rules crafted from datacenter operators' experience and knowledge of the network environment. However, traffic in a modern datacenter tends to be more complex and dynamic, which may cause traditional methods to fail. With the rapid development of deep reinforcement learning, a number of studies have demonstrated the feasibility of applying deep reinforcement learning to traffic control. In this research, we propose a multi-agent reinforcement learning framework for the problem of datacenter traffic control. The simulation environment is carefully designed around popular datacenter topologies. With a reward function based on the utility functions commonly used in traffic optimization, our agents learn an optimal traffic control policy by maximizing the reward with deep neural networks. Additionally, to improve the agents' exploration efficiency in the environment, noise is introduced to perturb the parameters of each agent's policy network. Our experimental results show two points: 1) the performance of our framework does not degrade when agents are implemented with a simple network architecture, and 2) the proposed framework performs nearly as well as popular traffic control schemes, without the assumptions those schemes require.
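The two mechanisms named in the abstract — a utility-based reward and parameter-space noise for exploration — can be sketched as follows. This is an illustrative Python sketch only, not the thesis implementation: the alpha-fair utility form, the delay-penalty weight `beta`, and the noise scale `sigma` are assumptions for demonstration.

```python
import numpy as np

def alpha_fair_utility(throughputs, alpha=1.0):
    """Alpha-fair utility over per-flow throughputs.
    alpha=1 gives proportional fairness (sum of log throughputs)."""
    x = np.asarray(throughputs, dtype=float)
    if alpha == 1.0:
        return float(np.sum(np.log(x)))
    return float(np.sum(x ** (1.0 - alpha) / (1.0 - alpha)))

def reward(throughputs, delays, beta=0.1):
    """Reward = throughput utility minus a weighted delay penalty
    (beta is an assumed trade-off weight, not from the thesis)."""
    return alpha_fair_utility(throughputs) - beta * float(np.sum(delays))

def perturb_parameters(params, sigma=0.05, rng=None):
    """Parameter-space noise: add i.i.d. Gaussian noise directly to
    each policy-network weight tensor, so the perturbed policy stays
    consistent within an episode (unlike per-step action noise)."""
    rng = rng or np.random.default_rng()
    return [w + rng.normal(0.0, sigma, size=w.shape) for w in params]
```

For example, two flows at 10 Mbps each with per-flow delays of 1 ms and 2 ms yield a reward of 2·ln(10) − 0.3 under these assumed settings; the perturbed weight list keeps the shape of each original tensor.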
1 Introduction P.1
2 Related Work P.4
2.1 Traditional Traffic Optimization Schemes P.4
2.2 Machine Learning for Traffic Optimization P.5
2.3 Reinforcement Learning for Traffic Optimization P.5
3 Preliminary P.7
3.1 Reinforcement Learning P.7
3.1.1 Deterministic Policy Gradient P.12
3.1.2 Multi-Agent Deterministic Policy Gradient P.13
4 Design P.16
4.1 Environment Configuration P.16
4.1.1 Network Topology P.16
4.1.2 Traffic Pattern P.21
4.2 Proposed Multi-Agent Reinforcement Learning Framework P.23
4.2.1 Definitions of State Space, Action Space, and Reward Function P.25
4.2.2 Actor Network and Critic Network Model P.28
4.2.3 Exploration with Parameter Noise P.31
4.2.4 Framework Algorithm P.33
4.2.5 Feature Scaling P.37
4.2.6 Hybrid Traffic Control Scheme P.38
5 Performance P.39
5.1 Experimental Settings P.39
5.2 Evaluation Metrics P.40
5.3 Experiment Results of Dumbbell Topology P.41
5.4 Experiment Results of JellyFish Topology P.47
6 Conclusion P.54
References P.55