
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 林士平
Author (English): Shih-Ping Lin
Title (Chinese): 循序貪婪早停架構搜尋:一種避免效能崩潰的可微分架構搜尋方法
Title (English): SGAS-es: Avoiding Performance Collapse by Sequential Greedy Architecture Search with the Early Stopping Indicator
Advisor: 王勝德
Advisor (English): Sheng-De Wang
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2022
Graduation Academic Year: 110 (2021–2022)
Language: English
Number of Pages: 46
Keywords: neural architecture search, differentiable architecture search, sequential greedy architecture search, flat minima, early stopping, image classification, deep learning
Abstract (Chinese):
Differentiable Architecture Search (DARTS) is one of the mainstream neural architecture search methods today; however, many studies have pointed out its performance collapse problem, which makes the searched results unstable. The main goal of Sequential Greedy Architecture Search (SGAS) is to reduce the discretization loss of DARTS, yet we point out that the searched results of SGAS are still unstable and may collapse in performance just as DARTS does; we call this problem cascade performance collapse. To address it, we propose Sequential Greedy Architecture Search with the Early Stopping Indicator (SGAS-es), which adds an early stopping mechanism to each phase of SGAS to stabilize its searched results and strengthen its search capability. The early stopping mechanism is based on the relation among flat minima, the largest eigenvalue of the Hessian matrix of the loss function, and performance collapse. We present a mathematical derivation explaining the relation between flat minima and the largest eigenvalue, and we use a moving average of the largest eigenvalue as the early stopping indicator. Finally, we evaluate the performance and stability of SGAS-es on NAS-Bench-201 and Fashion-MNIST, and verify the transferability of the searched results on EMNIST-Balanced. The experimental results show that SGAS-es finds a nearly optimal architecture on NAS-Bench-201 and achieves stable, good accuracy on both Fashion-MNIST and EMNIST-Balanced.
Abstract (English):
Differentiable Architecture Search (DARTS) is a popular neural architecture search method. However, many studies have pointed out the performance collapse issue in DARTS, which leads to unstable searched results. Sequential Greedy Architecture Search (SGAS) is a DARTS-based approach whose main purpose is to reduce the discretization loss of DARTS. Nevertheless, we observe that the searched results of SGAS are still unstable and may suffer performance collapse just as DARTS does; we refer to this problem as the cascade performance collapse issue. Therefore, we propose Sequential Greedy Architecture Search with the Early Stopping Indicator (SGAS-es). We add an early stopping mechanism to each phase of SGAS to stabilize the searched results and further improve the search capability of SGAS. The early stopping mechanism is based on the relation among flat minima, the largest eigenvalue of the Hessian matrix of the loss function, and performance collapse. We give a mathematical derivation showing the relation between flat minima and the largest eigenvalue, and we use a moving average of the largest eigenvalue as the early stopping indicator. Finally, we use NAS-Bench-201 and Fashion-MNIST to confirm the performance and stability of SGAS-es, and EMNIST-Balanced to verify the transferability of the searched results. These experiments show that SGAS-es can derive a nearly optimal architecture on NAS-Bench-201, as well as stable results with good accuracy on Fashion-MNIST and EMNIST-Balanced.
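The indicator described in the abstract can be pictured concretely. Near a local minimum α* of the validation loss L, a second-order Taylor expansion gives L(α* + δ) ≈ L(α*) + ½ δᵀHδ ≤ L(α*) + ½ λ^{max}_{α} ‖δ‖², where H is the Hessian of L with respect to the architecture parameters α; a small largest eigenvalue λ^{max}_{α} therefore bounds how fast the loss can rise around the minimum (the minimum is flat), while a rapidly growing λ^{max}_{α} signals sharpening and an approaching collapse. The sketch below is a minimal illustration of this kind of indicator, not the thesis's implementation: it assumes PyTorch, estimates λ^{max}_{α} by power iteration on Hessian-vector products, smooths it with a moving average, and flags a stop when the average rises well above its running minimum. The toy quadratic loss, window size, and stopping ratio are illustrative assumptions.

import torch
from collections import deque


def largest_hessian_eigenvalue(loss, alpha, iters=20, tol=1e-4):
    """Estimate the largest eigenvalue of the Hessian of `loss` w.r.t. `alpha`
    by power iteration on Hessian-vector products."""
    grad = torch.autograd.grad(loss, alpha, create_graph=True)[0]
    v = torch.randn_like(alpha)
    v = v / v.norm()
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. alpha.
        hv = torch.autograd.grad(grad, alpha, grad_outputs=v, retain_graph=True)[0]
        new_eig = torch.dot(hv.flatten(), v.flatten()).item()
        v = hv / (hv.norm() + 1e-12)
        if abs(new_eig - eig) < tol:
            return new_eig
        eig = new_eig
    return eig


class EarlyStoppingIndicator:
    """Moving average of the largest eigenvalue; signals a stop when the average
    rises well above its running minimum (window and ratio are assumed values)."""

    def __init__(self, window=5, ratio=1.3):
        self.history = deque(maxlen=window)
        self.ratio = ratio
        self.best_avg = float("inf")

    def update(self, eigenvalue):
        self.history.append(eigenvalue)
        avg = sum(self.history) / len(self.history)
        self.best_avg = min(self.best_avg, avg)
        full = len(self.history) == self.history.maxlen
        return full and avg > self.ratio * self.best_avg


# Toy usage: a quadratic "validation loss" over a 4-dimensional alpha stands in
# for the supernet's architecture parameters; its sharpness grows each "epoch".
alpha = torch.randn(4, requires_grad=True)
indicator = EarlyStoppingIndicator()
for epoch in range(30):
    loss = (1.0 + 0.1 * epoch) * (alpha ** 2).sum()
    lam_max = largest_hessian_eigenvalue(loss, alpha)
    if indicator.update(lam_max):
        print(f"early stop at epoch {epoch}: moving-average lambda_max has grown sharply")
        break

According to the abstract, SGAS-es applies such a check in every phase of SGAS, so an indicator of this kind would be evaluated repeatedly between edge decisions; the thesis's actual eigenvalue computation, window, and threshold are not reproduced here.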
Table of Contents

Oral Defense Committee Approval
Acknowledgements
Abstract (Chinese)
Abstract (English)
1 Introduction
1.1 Background of NAS and DARTS
1.2 Problems of DARTS and SGAS
1.2.1 Performance Collapse Issue
1.2.2 Cascade Performance Collapse Issue
1.3 Research Purposes and Main Contributions
2 Related Work
2.1 Neural Architecture Search (NAS)
2.2 Differentiable Architecture Search (DARTS)
2.2.1 Encoding: Continuous Relaxation
2.2.2 Decoding: Discretization
2.2.3 Formulation of the Optimization Problem of DARTS
2.3 Optimization Gap between NAS and DARTS
2.3.1 Mathematical Error
2.3.2 Supernet Sub-optima
2.3.3 Discretization Loss
2.3.4 Deployment Gap
2.4 Sequential Greedy Architecture Search (SGAS)
2.5 The Relation among Flat Minima, λ^{max}_{α}, and Performance Collapse
2.5.1 Flat Minima
2.5.2 The Role of λ^{max}_{α} in DARTS
3 Approach
3.1 Search Space
3.1.1 DARTS CNN Search Space
3.1.2 NAS-Bench-201 Search Space
3.2 Bilevel Optimization
3.3 Early Stopping Indicator
3.4 Edge Decision Epoch
3.4.1 Edge Decision Strategy
3.4.2 Fixing the Operation
3.5 Putting It All Together: SGAS-es
3.6 The Whole Searching and Retraining Pipeline
3.6.1 Searching Stage
3.6.2 Retraining Stage
4 Experiment
4.1 NAS-Bench-201
4.1.1 NAS-Bench-201 Description
4.1.2 Experiment Results
4.2 Fashion-MNIST Dataset
4.2.1 Dataset Description
4.2.2 Experiment Details
4.2.3 Experiment Results
4.3 EMNIST-Balanced Dataset
4.3.1 Dataset Description
4.3.2 Experiment Details
4.3.3 Experiment Results
5 Conclusion
Bibliography
Appendix
A Searched Cells of SGAS-es on NAS-Bench-201
B Searched Cells of SGAS-es on Fashion-MNIST