Author: 翁雨玄
Author (English): Yu-Syuan Weng
Title: 以加強式學習實現晶圓測試之路徑規劃
Title (English): Path planning of wafer probe by reinforcement learning
Advisor: 黃國勝
Advisor (English): Kao-Shing Hwang
Degree: Master's
Institution: 國立中山大學 (National Sun Yat-sen University)
Department: 電機工程學系研究所 (Department of Electrical Engineering)
Discipline: Engineering
Field: Electrical and information engineering
Document type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107 (2018-2019)
Language: Chinese
Number of pages: 67
Keywords (Chinese): 晶圓測試、Asynchronous advantage actor-critic 演算法、卷積神經網路、神經網路、加強式學習
Keywords (English): Wafer probing, Neural network, Asynchronous advantage actor-critic algorithm, Convolutional neural network, Reinforcement learning
Statistics:
  • Cited by: 3
  • Views: 323
  • Rating:
  • Downloads: 68
  • Bookmarked: 0
Abstract (Chinese, translated):
In current industry practice, wafer probing is mostly carried out automatically by test equipment, while the odd (partial) dice are instead tested manually, which lowers the utilization of the equipment. This thesis applies reinforcement learning: the whole wafer is treated as the environment and an agent moves the test probe. Through training, the agent learns to finish testing every die on the wafer in as few moves as possible. As the number of dice grows, however, the agent faces a state space that becomes too large. To overcome this, the thesis proposes a "spotlight" method that changes the environment state from the test status of all dice to the test status of only the dice near the probe, thereby reducing the impact of the growing die count. A convolutional neural network processes the environment state, and the asynchronous advantage actor-critic algorithm trains the agent to move both the probe and the spotlight. The agent is first trained in a smaller environment; when training then moves to a larger environment, the network loads the parameters learned in the smaller one, so the agent starts out already knowing how to move the probe. The empty-environment experiments show that the spotlight method works with several kinds of probes and with different environment sizes, and the odd-environment experiments show that it also works in odd environments.
Abstract (English):
At present, industry generally conducts wafer probing with automated test equipment and switches to manual processing when testing odd dice, which reduces the efficiency of the test equipment. This thesis uses reinforcement learning, treating the wafer as the environment and letting an agent move the testing probe. By training the agent, it becomes capable of testing all the dice on the wafer with the minimum number of actions. When the number of dice increases, however, the agent faces the problem of the state space becoming massive. To mitigate this problem, this thesis proposes a method called "spotlight", which changes the state of the environment from the testing status of all dice to only that of the dice surrounding the probe, in order to reduce the impact of the increased number of dice. A convolutional neural network is used to process the state of the environment, and the asynchronous advantage actor-critic algorithm trains the agent to move the probe and the spotlight. The agent is first trained in a smaller environment; when it is later trained in a bigger environment, the neural network loads the previously trained parameters, providing the agent with the experience gained in the smaller environment. The empty-environment experiments show that the spotlight method can be applied to several kinds of probes and to multiple environment sizes, and the odd-environment experiments show that it can also be applied to odd environments.
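The abstracts describe the spotlight observation only in prose. Below is a minimal sketch of how such a local state could be extracted, assuming the wafer is represented as a 2D array of per-die test flags and the spotlight is a square window centred on the probe; the names (spotlight_view, tested, probe_rc, radius) are illustrative and are not taken from the thesis.

```python
import numpy as np

def spotlight_view(tested, probe_rc, radius):
    """Return the local test-status window centred on the probe.

    `tested` is a 2D array over the wafer's die grid (1 = die already tested
    or outside the wafer, 0 = die still to be tested).  Padding with 1 at the
    wafer edge keeps the window shape fixed at (2*radius+1, 2*radius+1), so
    the convolutional network always sees an input of the same size.
    """
    padded = np.pad(tested, radius, mode="constant", constant_values=1)
    r, c = probe_rc
    return padded[r:r + 2 * radius + 1, c:c + 2 * radius + 1]

# Example: a 6x6 die grid, probe at row 2 / column 3, spotlight radius 1.
tested = np.zeros((6, 6), dtype=np.int8)
tested[2, 3] = 1                                  # die under the probe is tested
print(spotlight_view(tested, (2, 3), radius=1))   # 3x3 local state for the CNN
```

In the thesis this local window, rather than the full wafer map, is what the convolutional network receives, and the agent also learns actions that move the spotlight itself; the sketch covers only the state extraction, not the learning loop.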
Table of contents:
Thesis approval form i
Abstract (Chinese) iii
Abstract (English) iv
Table of contents v
List of figures vii
List of tables ix
Chapter 1 Introduction 1
1-1 Motivation 1
1-2 Thesis organization 2
Chapter 2 Background 3
2-1 Wafer probing 3
2-2 Reinforcement learning 3
2-3 Neural networks 7
2-4 Convolutional neural networks 9
2-5 Actor-critic 12
Chapter 3 Methodology 16
3-1 System architecture 16
3-1-1 States received by the agent and its actions 16
3-1-2 Operation of the neural network 17
3-1-3 Architecture of the neural network 18
3-1-4 Computing the cumulative reward 19
3-1-5 Updating the neural network 20
3-2 Proposed method 21
3-2-1 Restricting actions 21
3-2-2 Spotlight method 22
3-2-3 Loading network parameters 25
3-3 Experimental environment 26
Chapter 4 Simulation experiments 27
4-1 The simulation environment 27
4-1-1 Input to the neural network 29
4-1-2 Environment rewards 30
4-1-3 Experimental settings 31
4-2 Empty-environment experiments 33
4-2-1 Probe card 1 34
4-2-2 Probe card 2 37
4-2-3 Probe card 3 39
4-2-4 Probe card 2 with the new reward 42
4-3 Empty-environment experiments of different sizes 45
4-4 Odd-environment experiments 50
Chapter 5 Summary 55
5-1 Conclusions 55
5-2 Future work 55
References 56