Author: 陳彥辰 (Yen-Chen Chen)
Title: Scalable Radio Resource Management using DDPG Meta Reinforcement Learning
Advisor: 黃志煒
Degree: Master's
Institution: National Central University
Department: Department of Communication Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Year of publication: 2020
Graduation academic year: 108 (2019–2020)
Language: English
Pages: 40
Keywords (from Chinese): resource management, virtual reality
Usage statistics:
  • Cited by: 0
  • Views: 29
  • Downloads: 0
The development of the fifth-generation (5G) communication system enhances network capability and flexibility, allowing more demanding applications, such as massively multiplayer online virtual reality (VR) games, to run over 5G. The mobile edge cloud architecture is expected to be an effective way to support VR applications. In multi-user VR environments, however, a user's behavior is influenced by other users and by objects in the virtual environment, which makes resource management more complex and difficult than before. In this work, we adopt the Deep Deterministic Policy Gradient (DDPG) reinforcement learning algorithm for resource management. We integrate a 3D resource management structure, propose componentized actions for the learning agent, and group users according to their interaction states.
Because existing exploration strategies for machine learning are unsuited to long-horizon resource management, we propose an exploration strategy built on a meta-learning framework to strengthen the DDPG algorithm. Another challenge is that changing the dimension of the input data renders an already-trained model unusable. We therefore propose an "environment-information-to-input" translator that encodes environment information into inputs of fixed dimension before they are fed to the learning algorithm, so that the trained model can still be applied.
Experimental results show that the proposed meta DDPG algorithm achieves the highest satisfaction ratio. Although the proposed encoding structure slightly degrades performance, the trained model can be applied directly to new environments without retraining, which is a more efficient way to learn.
The development of the fifth-generation (5G) system on capability and flexibility enables emerging applications with stringent requirements. Mobile edge cloud (MEC) is expected to be an effective solution to serve virtual reality (VR) applications over wireless networks. In multi-user VR environments, highly dynamic interaction between users increases the difficulty and complexity of radio resource management (RRM). Furthermore, a trained management model is often obsolete when particular key environment parameters are changed. In this thesis, a scalable deep reinforcement learning-based approach is proposed specifically for resource scheduling in the edge network. We integrate a 3D radio resource structure with componentized Markov decision process (MDP) actions to work on user interactivity-based groups. A translator-inspired "information-to-state" encoder is applied to generate a scalable RRM model, which can be reused for environments with various numbers of base stations. Also, a meta-learning-based exploration strategy is introduced to improve the exploration in the deep deterministic policy gradient (DDPG) training process. The results show that the modified meta exploration strategy improves DDPG significantly. The scalable learning structure with complete model reuse provides comparable performance to individually trained models.
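To make the reuse idea concrete, here is a minimal sketch (not the thesis implementation) of how an "information-to-state" encoder can map per-base-station features of any count to a fixed-dimension state via attention-style softmax pooling, so one trained model accepts environments with different numbers of base stations. All names and dimensions below are hypothetical.

```python
import numpy as np

def encode_state(env_info, d_state=8, seed=0):
    """Pool a variable number of per-base-station feature vectors
    (env_info has shape (n_bs, d_feat)) into one fixed-dimension state.

    A shared projection maps each base station's features to d_state
    dimensions; softmax attention weights over base stations then pool
    the projections, so the output shape never depends on n_bs.
    """
    rng = np.random.default_rng(seed)          # fixed seed: same projection every call
    n_bs, d_feat = env_info.shape
    W = rng.standard_normal((d_feat, d_state)) / np.sqrt(d_feat)
    proj = env_info @ W                        # (n_bs, d_state)
    scores = proj.mean(axis=1)                 # one attention score per base station
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over base stations
    return weights @ proj                      # fixed (d_state,) state vector
```

Whether the environment reports 3 or 30 base stations, `encode_state` returns a vector of the same length, which is the property that lets a trained policy network be reused without retraining.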
1 Introduction
1.1 Virtual Reality .................................1
1.2 Motivation ......................................1
1.3 Contribution ....................................2
1.4 Framework .......................................3

2 Background and Related Works
2.1 Reinforcement Learning ..........................4
2.2 Meta Learning ...................................5
2.3 Resource Management .............................5

3 User Interactive Radio Resource Management ........7
3.1 System Model ....................................7
3.2 Problem Formulation .............................8
3.3 User Interaction and Grouping Mechanism .........8

4 Scalable RRM via Meta Reinforcement Learning ......11
4.1 MDP Model .......................................11
4.2 Componentized RRM with DDPG .....................12
4.3 Scalability Extension through Translation Model .14
4.4 Exploration via Meta-Learning ...................16

5 Performance Evaluation ...........................19
5.1 Simulation Setting ..............................19
5.2 Performance .....................................21

6 Conclusion and Future Work ........................26
6.1 Conclusion ......................................26
6.2 Future Work .....................................26

Bibliography ........................................27
[1] J. Park, P. Popovski, and O. Simeone, “Minimizing latency to support VR social interactions over wireless cellular systems via bandwidth allocation,” IEEE Wireless Communications Letters, vol. 7, no. 5, pp. 776–779, Oct. 2018.
[2] L. Wang, L. Jiao, T. He, J. Li, and M. Mühlhäuser, “Service entity placement for social virtual reality applications in edge computing,” in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications, 2018, pp. 468–476.
[3] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, “Continuous control with deep reinforcement learning,” 2015.
[4] S.-C. Tseng, Z.-W. Liu, Y.-C. Chou, and C.-W. Huang, “Radio Resource Scheduling for 5G NR via Deep Deterministic Policy Gradient,” in 2019 IEEE International Conference on Communications Workshops (ICC Workshops), Shanghai, China, May 2019, pp. 1–6.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[6] C. Finn, P. Abbeel, and S. Levine, “Model-agnostic meta-learning for fast adaptation of deep networks,” arXiv preprint arXiv:1703.03400, 2017.
[7] J. Snell, K. Swersky, and R. Zemel, “Prototypical networks for few-shot learning,” in Advances in Neural Information Processing Systems, 2017, pp. 4077–4087.
[8] G. Koch, R. Zemel, and R. Salakhutdinov, “Siamese neural networks for one-shot image recognition,” in ICML Deep Learning Workshop, vol. 2, Lille, 2015.
[9] J. X. Wang, Z. Kurth-Nelson, D. Tirumala, H. Soyer, J. Z. Leibo, R. Munos, C. Blundell, D. Kumaran, and M. Botvinick, “Learning to reinforcement learn,” arXiv preprint arXiv:1611.05763, 2016.
[10] Y. Duan, J. Schulman, X. Chen, P. L. Bartlett, I. Sutskever, and P. Abbeel, “RL²: Fast reinforcement learning via slow reinforcement learning,” arXiv preprint arXiv:1611.02779, 2016.
[11] A. Gupta, R. Mendonca, Y. Liu, P. Abbeel, and S. Levine, “Meta-reinforcement learning of structured exploration strategies,” in Advances in Neural Information Processing Systems, 2018, pp. 5302–5311.
[12] T. Xu, Q. Liu, L. Zhao, and J. Peng, “Learning to explore via meta-policy gradient,” in International Conference on Machine Learning, 2018, pp. 5463–5472.
[13] M. Chen, W. Saad, C. Yin, and M. Debbah, “Data correlation-aware resource management in wireless virtual reality (VR): An echo state transfer learning approach,” IEEE Transactions on Communications, vol. 67, no. 6, pp. 4267–4280, 2019.
[14] Y. Zhang, L. Jiao, J. Yan, and X. Lin, “Dynamic service placement for virtual reality group gaming on mobile edge cloudlets,” IEEE Journal on Selected Areas in Communications, vol. 37, no. 8, pp. 1881–1897, 2019.
[15] F. Guo, L. Ma, H. Zhang, H. Ji, and X. Li, “Joint load management and resource allocation in the energy harvesting powered small cell networks with mobile edge computing,” in IEEE INFOCOM 2018 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), 2018, pp. 299–304.
[16] H. Ahmadi, O. Eltobgy, and M. Hefeeda, “Adaptive multicast streaming of virtual reality content to mobile users,” in Proceedings of the on Thematic Workshops of ACM Multimedia 2017, ser. Thematic Workshops ’17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 170–178. [Online]. Available: https://doi.org/10.1145/3126686.3126743
[17] J. Yang, J. Luo, D. Meng, and J. Hwang, “QoE-driven resource allocation optimized for uplink delivery of delay-sensitive VR video over cellular network,” IEEE Access, vol. 7, pp. 60672–60683, 2019.
[18] X. Yang, Z. Chen, K. Li, Y. Sun, N. Liu, W. Xie, and Y. Zhao, “Communication-constrained mobile edge computing systems for wireless virtual reality: Scheduling and tradeoff,” IEEE Access, vol. 6, pp. 16665–16677, 2018.
[19] Y. Mori, N. Fukushima, T. Fujii, and M. Tanimoto, “View generation with 3D warping using depth information for FTV,” in 2008 3DTV Conference: The True Vision - Capture, Transmission and Display of 3D Video, May 2008, pp. 229–232.
[20] H. Mao, M. Alizadeh, I. Menache, and S. Kandula, “Resource Management with Deep Reinforcement Learning,” in Proceedings of the 15th ACM Workshop on Hot Topics in Networks - HotNets ’16. New York, New York, USA: ACM Press, 2016, pp. 50–56.
[21] G. Dulac-Arnold, R. Evans, H. van Hasselt, P. Sunehag, T. Lillicrap, J. Hunt, T. Mann, T. Weber, T. Degris, and B. Coppin, “Deep reinforcement learning in large discrete action spaces,” arXiv preprint arXiv:1512.07679, 2015.
[22] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015.
[23] Y.-C. Chen, Y.-T. Lin, and C.-W. Huang, “A hybrid scenario generator and its application on network simulations,” in 2020 IEEE International Conference on Consumer Electronics - Taiwan (ICCE-Taiwan), Taoyuan, Taiwan, Sep. 2020.
[24] Mojang, “Minecraft,” https://www.minecraft.net, 2009.
[25] “CraftBukkit,” https://getbukkit.org, 2009.
[26] Xikage, “MythicMobs,” https://www.spigotmc.org/resources/mythicmobs-free-version-the-1-custom-mob-creator.5702, 2015.
[27] 3GPP TS 23.501, “System Architecture for the 5G System.”
Electronic full text (internet release date: 2022-08-20)