National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: 王智文
Author (English): Wang, Jhin-Wun
Title (Chinese): Deep Q-network演算法應用於並行機台調度問題
Title (English): Parallel Machine Scheduling with Deep Q-network Algorithm
Advisor: 劉建良
Advisor (English): Liu, Chien-Liang
Committee Members: 巫木誠、陳勝一
Committee Members (English): Wu, Muh-Cherng; Chen, Sheng-I
Oral Defense Date: 2020-07-17
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Department of Industrial Engineering and Management
Discipline: Engineering
Field: Industrial Engineering
Thesis Type: Academic thesis
Publication Year: 2020
Graduation Academic Year: 108
Language: English
Pages: 40
Keywords (Chinese): 平行機台調度問題、機台負荷平衡、強化學習、DQN演算法
Keywords (English): parallel machine scheduling problems; machine load balancing; reinforcement learning; DQN algorithm
Usage statistics:
  • Times cited: 0
  • Views: 384
  • Downloads: 31
  • Bookmarked: 0
Abstract (Chinese):

Parallel machine scheduling is a classic problem that arises in many industrial domains, and it remains an active research topic. The parallel machine scheduling problem is a decision process that aims to complete all tasks, or as many as possible, within limited resources and time, so it can be viewed as a combinatorial optimization problem, and many studies solve problems of this type with mathematical programming. However, when mathematical programming is applied to large-scale instances, the problem becomes intractable and complex, and may even be unsolvable in practice. Consequently, most recent studies rely on approximate methods such as heuristic algorithms, dispatching rules, or reinforcement learning. Meanwhile, deep learning has achieved remarkable results in many fields in recent years. The parallel machine scheduling problem can be viewed as a Markov decision process, the same formulation that underlies deep reinforcement learning, which motivates us to apply deep reinforcement learning to this classic problem.

This study proposes a Deep Q-network (DQN) algorithm for the parallel machine scheduling problem with the objective of minimizing makespan. If all tasks are distributed evenly across the machines, the resulting schedule is optimal for this problem, so we design the actions and rewards around machine load balancing and scheduling domain knowledge, expecting the proposed algorithm to adapt flexibly to the problem. From the experimental results we also summarize which reinforcement learning designs are suitable for parallel machine scheduling, and these guidelines should benefit future related work. The experiments show that the proposed algorithm is efficient and solves the problem well, quickly producing competitive feasible solutions for instances of the same scale. To the best of our knowledge, this is the first study to use the DQN algorithm alone to solve the parallel machine scheduling problem, rather than as an auxiliary to another algorithm.
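To make the Markov decision process framing above concrete, the following is a minimal Python sketch, not the implementation reported in the thesis: the state is the vector of current machine loads plus the processing time of the next job, the action is the index of the machine that receives that job, and the reward is a simple load-balancing penalty. The class name ParallelMachineEnv, its attributes, and the exact reward formula are illustrative assumptions.

import numpy as np

class ParallelMachineEnv:
    """Hypothetical makespan-oriented parallel machine environment framed as an MDP."""

    def __init__(self, processing_times, n_machines):
        self.processing_times = list(processing_times)
        self.n_machines = n_machines
        self.reset()

    def reset(self):
        # State: accumulated load on each machine, plus the next job's processing time.
        self.loads = np.zeros(self.n_machines)
        self.next_job = 0
        return self._state()

    def _state(self):
        p = self.processing_times[self.next_job] if self.next_job < len(self.processing_times) else 0.0
        return np.concatenate([self.loads, [p]]).astype(np.float32)

    def step(self, machine):
        # Action: assign the next job to the chosen machine.
        self.loads[machine] += self.processing_times[self.next_job]
        self.next_job += 1
        done = self.next_job == len(self.processing_times)
        # Reward: penalize load imbalance, so balanced assignments score higher.
        reward = -float(self.loads.max() - self.loads.mean())
        return self._state(), reward, done

Penalizing imbalance in this way is one plausible proxy for minimizing makespan; the thesis may well use a different state encoding or reward shaping.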
Abstract (English):

The parallel machine scheduling problem (PMSP) concerns allocating limited time and resources to complete all, or the maximal number of, tasks, so it can be viewed as a combinatorial optimization problem. However, PMSP is NP-hard, which makes large-scale instances intractable. Thus, algorithms that obtain sub-optimal solutions have been devised over the decades, including meta-heuristics, dispatching rules, and reinforcement learning. Deep reinforcement learning (DRL), which combines deep learning and reinforcement learning, has achieved promising results in several domains. Meanwhile, PMSP with deterministic processing times can be formulated as a Markov decision process (MDP), inspiring us to use DRL to cope with PMSP. This work proposes a Deep Q-network (DQN) algorithm for makespan minimization in PMSP. If all tasks are assigned evenly across the machines, the resulting schedule is an optimal solution for PMSP; therefore, we incorporate machine load balancing and scheduling domain knowledge into the design of the actions and rewards. The goal is to show empirically that the proposed algorithm is competitive and effective. Moreover, we provide an in-depth analysis to summarize which designs are suitable for applying reinforcement learning to PMSP. To the best of our knowledge, this is the first work attempting to use the DQN algorithm alone to attack PMSP.
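As a companion sketch, the following shows one plausible way to train a DQN agent against such an environment with PyTorch: an epsilon-greedy policy over machine choices, an experience replay buffer, and a periodically synchronized target network. It reuses the ParallelMachineEnv class sketched above; the network size, hyperparameters, and helper names (QNetwork, train_dqn) are illustrative assumptions rather than the configuration reported in the thesis.

import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Small fully connected Q-network: state vector -> one Q-value per machine."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)

def train_dqn(env, n_machines, episodes=200, gamma=0.99, eps=0.1, batch=32):
    # State dimension matches ParallelMachineEnv above: machine loads + next job's time.
    q_net = QNetwork(n_machines + 1, n_machines)
    target_net = QNetwork(n_machines + 1, n_machines)
    target_net.load_state_dict(q_net.state_dict())
    opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
    replay = deque(maxlen=10000)

    for ep in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy choice of the machine for the next job.
            if random.random() < eps:
                action = random.randrange(n_machines)
            else:
                with torch.no_grad():
                    action = int(q_net(torch.tensor(state)).argmax())
            next_state, reward, done = env.step(action)
            replay.append((state, action, reward, next_state, done))
            state = next_state

            if len(replay) >= batch:
                s, a, r, s2, d = map(np.array, zip(*random.sample(list(replay), batch)))
                q = q_net(torch.tensor(s)).gather(1, torch.tensor(a).unsqueeze(1)).squeeze(1)
                with torch.no_grad():
                    # One-step Q-learning target computed with the frozen target network.
                    target = torch.tensor(r, dtype=torch.float32) \
                        + gamma * target_net(torch.tensor(s2)).max(1).values \
                        * (1 - torch.tensor(d, dtype=torch.float32))
                loss = nn.functional.mse_loss(q, target)
                opt.zero_grad()
                loss.backward()
                opt.step()
        if ep % 10 == 0:
            target_net.load_state_dict(q_net.state_dict())  # periodic target sync
    return q_net

For instance, one could call train_dqn(ParallelMachineEnv([4, 2, 7, 5, 3, 6], n_machines=3), n_machines=3) and then schedule greedily with the learned Q-network.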
Table of Contents:
摘要 i
ABSTRACT ii
Contents iii
List of Figures v
List of Tables vi
1 Introduction 1
1.1 Background & Motivation 1
1.2 Purpose 3
1.3 Contributions 4
2 Related Work 5
2.1 Scheduling Problems 5
2.1.1 Parallel Machine Scheduling Problems 5
2.1.2 Dispatching Rules 9
2.2 Reinforcement Learning 10
2.3 Deep Q-network 12
3 Proposed Method 15
3.1 Environment 16
3.2 Design of Reinforcement Learning 17
3.3 Training Algorithm 21
4 Experiments 23
4.1 Data sets 23
4.2 Network Architecture 24
4.3 Experimental Results 25
4.3.1 Comparison Approaches 25
4.3.2 Training Procedure 26
4.3.3 Experimental Results 27
4.3.4 Generalization of the Model 27
5 Discussion 29
5.1 Result Analysis 29
5.2 Early Stopping 31
5.3 The Design of Reward 32
5.4 Training Efficiency 33
6 Conclusion and Future Work 35