National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 鄭文皓
Author (English): Cheng, Wen-Hao
Thesis title: 基於強化學習以戰術草圖生成籃球對戰模擬
Thesis title (English): Simulating Basketball Plays from Tactic Sketches by using Reinforcement Learning
Advisor: 王昱舜
Advisor (English): Wang, Yu-Shuen
Committee members: 朱宏國、胡敏君、林奕成
Committee members (English): Chu, Hung-Kuo; Hu, Min-Chun; Lin, I-Chen
Oral defense date: 2019-10-30
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 108
Language: Chinese
Number of pages: 32
Keywords (Chinese): 籃球戰術、強化學習
Keywords (English): Basketball Strategies; Reinforcement Learning
Abstract (Chinese): This thesis presents a reinforcement-learning-based method that generates a full-possession basketball play simulation from a tactic sketch, simulating both the execution of the offensive tactic and the defenders' likely responses. Once visualized, the generated results make it easier for coaches to explain and analyze tactics, and they can also be compared after a game with how the tactic was actually executed, or used to review the defensive performance on court. We train two models, an offense model and a defense model, and let them interact in the same environment. Each model takes the past few seconds of states provided by the environment as its observation and chooses an action; the offense model is additionally conditioned on the input tactic sketch and executes its instructions. We train the continuous-action-space models with Proximal Policy Optimization (PPO), a model-free algorithm built on the actor-critic framework. We use a Voronoi diagram to compute the court area each player occupies and sum these occupancies to design a dense reward function, which alleviates the training difficulty caused by sparse rewards. We further add a moving-distance penalty to the reward to improve the quality of the generated simulations. Finally, we evaluate the models by quantitatively comparing the simulated results with real data.
Abstract (English): This thesis presents a method for simulating basketball plays from tactic sketches. It simulates how the offensive tactic will be executed as well as how the defenders will react. With the simulated plays, coaches can more easily illustrate and evaluate a newly developed tactic, and players can review their performance after a game by comparing it with the simulated plays. To achieve this, we model the problem with reinforcement learning. We use two agents to represent the offense and defense teams and let them interact in the same environment. The offense agent additionally receives, as a condition, the tactic instructions extracted from the tactic sketch. Both agents choose actions according to the environment state, and the environment applies those actions to advance the simulated play. We train the continuous-action-space models with the model-free Proximal Policy Optimization (PPO) algorithm. To address the sparse-reward problem, we design a Voronoi reward that accounts for each player's ownership of court space. To improve the quality of the simulation, we add a moving-distance penalty. Finally, we evaluate the system with a quantitative analysis that objectively compares real and simulated plays.
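To make the two-agent setup described in the abstracts concrete, the following Python sketch shows one simulated possession: both policies observe the last few seconds of court states, the offense policy is additionally conditioned on the tactic instruction decoded from the sketch, and the environment advances the play with their joint actions. All class, function, and parameter names here are hypothetical illustrations and are not taken from the thesis.

import numpy as np

# Hypothetical sketch of the two-agent play simulation (names are illustrative only).

class PlayEnvironment:
    """Tracks player and ball positions and advances them with both teams' actions."""

    def __init__(self, history_len=5):
        self.history_len = history_len                                         # "past few seconds" of states
        self.state_history = [np.zeros((11, 2)) for _ in range(history_len)]   # 5 offense + 5 defense + ball

    def observe(self):
        # Both agents observe the stacked recent frames.
        return np.stack(self.state_history, axis=0)                            # shape: (history_len, 11, 2)

    def step(self, offense_action, defense_action):
        # Apply continuous per-player movements (e.g., velocities) to the latest frame.
        delta = np.vstack([offense_action, defense_action, np.zeros((1, 2))])
        next_frame = self.state_history[-1] + 0.1 * delta
        self.state_history = self.state_history[1:] + [next_frame]
        done = False                                                           # e.g., end of possession or shot taken
        return self.observe(), done


def simulate_possession(env, offense_policy, defense_policy, tactic_instruction, max_steps=200):
    """Roll out one play: the offense follows the sketched tactic, the defense reacts."""
    obs = env.observe()
    frames = []
    for _ in range(max_steps):
        off_act = offense_policy(obs, tactic_instruction)   # conditioned on the tactic sketch
        def_act = defense_policy(obs)                        # reacts only to the observed play
        obs, done = env.step(off_act, def_act)
        frames.append(obs[-1])
        if done:
            break
    return frames                                            # trajectory that can be visualized for coaches

In the actual system the environment would also have to handle ball movement, passes, screens, and shot events; this sketch only captures the observe-act-step loop that the abstracts describe.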
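Both abstracts state that the models are trained with Proximal Policy Optimization (Schulman et al., 2017, arXiv:1707.06347). For reference, PPO maximizes the standard clipped surrogate objective over a policy with parameters \theta, where \hat{A}_t is the advantage estimate produced by the critic and \epsilon is the clipping range; the thesis's particular hyperparameter choices are not restated here:

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[ \min\!\left( r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t \right) \right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}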
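The dense reward described in both abstracts sums the court area each player controls. One simple way to approximate that Voronoi occupancy is to rasterize the half court into a grid and assign every grid cell to its nearest player; the Python sketch below follows that idea and adds the moving-distance penalty. The court dimensions, the penalty weight, and the choice of rewarding the defenders' total area are assumptions made for illustration, not values taken from the thesis.

import numpy as np

COURT_LENGTH, COURT_WIDTH = 14.0, 15.0   # assumed half-court size in meters

def voronoi_occupancy(players, resolution=0.25):
    """Approximate each player's Voronoi cell area by nearest-player grid assignment.

    players: (N, 2) array of player positions on the court.
    Returns an (N,) array of occupied areas in square meters.
    """
    xs = np.arange(0.0, COURT_LENGTH, resolution)
    ys = np.arange(0.0, COURT_WIDTH, resolution)
    grid = np.stack(np.meshgrid(xs, ys), axis=-1).reshape(-1, 2)              # (G, 2) cell centers
    dists = np.linalg.norm(grid[:, None, :] - players[None, :, :], axis=-1)   # (G, N) distances
    owner = dists.argmin(axis=1)                                              # nearest player per cell
    cell_area = resolution * resolution
    return np.bincount(owner, minlength=len(players)) * cell_area

def dense_reward(defenders, offenders, prev_defenders, penalty_coef=0.05):
    """Dense per-step reward: court area the defenders control minus a movement cost.

    This is only one plausible reading of the abstracts; whose occupancy is
    rewarded and the penalty weight are assumptions for this sketch.
    """
    all_players = np.vstack([defenders, offenders])
    areas = voronoi_occupancy(all_players)
    defense_area = areas[:len(defenders)].sum()
    move_penalty = np.linalg.norm(defenders - prev_defenders, axis=1).sum()
    return defense_area - penalty_coef * move_penalty

Calling dense_reward at every simulation step gives the agents a signal throughout the possession instead of only at its sparse outcome, which is the training difficulty the abstracts say the Voronoi reward is meant to alleviate.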
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures v
List of Tables vi
1. Introduction 1
2. Related Work 3
2.1 Basketball Analysis 3
2.2 Reinforcement Learning 4
3. Method 6
3.1 Overview 6
3.2 Models 7
3.2.1 State Space 7
3.2.2 Action Space 7
3.2.3 Simulation Environment 8
3.2.4 Voronoi Reward 9
3.2.5 Defense Model 11
3.2.6 Offense Model 12
3.3 Data 13
3.4 Training 15
3.4.1 Proximal Policy Optimization (PPO) 15
3.4.2 Network Architecture 17
3.5 Implementation Details 18
4. Results and Analysis 19
4.1 Defense Model Results 19
4.2 Offensive Tactic Simulation 21
4.3 Quantitative Results 24
4.4 Shortcomings and Limitations 27
5. Conclusion and Future Work 28
References 29
Electronic full text (publicly available online from 2024-12-10)