Student: Shih-Ping Lin
Thesis Title: SGAS-es: Avoiding Performance Collapse by Sequential Greedy Architecture Search with the Early Stopping Indicator
Advisor: Sheng-De Wang
Keywords: neural architecture search, differentiable architecture search, sequential greedy architecture search, flat minima, early stopping, image classification, deep learning
Differentiable Architecture Search (DARTS) is a popular Neural Architecture Search (NAS) method. However, many studies have pointed out that DARTS suffers from performance collapse, which makes its search results unstable. Sequential Greedy Architecture Search (SGAS) is a DARTS-based approach whose main purpose is to reduce the discretization loss of DARTS. Nevertheless, we observed that the search results of SGAS are still unstable and, like those of DARTS, may suffer from performance collapse. We refer to this problem as the cascade performance collapse issue. To address it, we propose Sequential Greedy Architecture Search with the Early Stopping Indicator (SGAS-es), which adopts an early stopping mechanism in each phase of SGAS to stabilize the search results and further improve the search ability of SGAS. The early stopping mechanism is based on the relation among flat minima, the largest eigenvalue of the Hessian matrix of the loss function, and performance collapse. We provide a mathematical derivation of the relation between flat minima and the largest eigenvalue, and we use the moving average of the largest eigenvalue as the early stopping indicator. Finally, we use NAS-Bench-201 and Fashion-MNIST to evaluate the performance and stability of SGAS-es, and EMNIST-Balanced to verify the transferability of the search results. These experiments show that SGAS-es derives a nearly optimal architecture on NAS-Bench-201 and stable results with good accuracy on Fashion-MNIST and EMNIST-Balanced.
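The early stopping indicator described in the abstract can be illustrated with a small sketch. It rests on a standard second-order argument, which is not necessarily the derivation used in this thesis: near a minimum α* where the gradient vanishes, L(α* + δ) ≈ L(α*) + ½ δᵀHδ ≤ L(α*) + ½ λ^{max}_{α} ‖δ‖², so a smaller largest eigenvalue of the Hessian H means the loss rises more slowly around α*, i.e., a flatter minimum. The Python below is a minimal sketch under several assumptions. It estimates the largest eigenvalue by power iteration on a small explicit toy matrix, whereas a real search would more likely use Hessian-vector products of the validation loss with respect to the architecture parameters. Power iteration returns the eigenvalue of largest magnitude, which equals λ^{max}_{α} only when the Hessian is positive semidefinite, as assumed near a minimum. The window size, look-back distance, and growth-ratio threshold of the stopping rule, the per-epoch eigenvalues, and the names `power_iteration`, `EigenvalueEarlyStopper`, and `first_stop_epoch` are all hypothetical choices for illustration, not the actual settings or code of SGAS-es.

```python
import math
from collections import deque


def power_iteration(hessian, iters=200, tol=1e-12):
    """Estimate the dominant eigenvalue of a symmetric matrix by power iteration.

    Returns the eigenvalue of largest magnitude, which equals the largest
    eigenvalue when the matrix is positive semidefinite (assumed near a minimum).
    """
    n = len(hessian)

    def matvec(v):
        # H v; a real search would compute a Hessian-vector product via autodiff
        return [sum(hessian[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0 / math.sqrt(n)] * n
    eigenvalue = 0.0
    for _ in range(iters):
        hv = matvec(v)
        norm = math.sqrt(sum(x * x for x in hv))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in hv]
        # Rayleigh quotient v^T H v for the unit vector v
        new_eigenvalue = sum(a * b for a, b in zip(v, matvec(v)))
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue
        eigenvalue = new_eigenvalue
    return eigenvalue


class EigenvalueEarlyStopper:
    """Hypothetical stopping rule on the moving-averaged largest eigenvalue.

    Fires when the current moving average exceeds the moving average from
    `lookback` epochs earlier by more than the factor `ratio`.
    """

    def __init__(self, window=3, lookback=3, ratio=1.3):
        self.lookback = lookback
        self.ratio = ratio
        self.recent = deque(maxlen=window)  # last `window` raw eigenvalues
        self.averages = []  # moving average recorded at every epoch

    def update(self, eigenvalue):
        """Record one epoch's eigenvalue; return True if the search should stop."""
        self.recent.append(eigenvalue)
        avg = sum(self.recent) / len(self.recent)
        self.averages.append(avg)
        if len(self.averages) <= self.lookback:
            return False  # not enough history to compare against yet
        past = self.averages[-1 - self.lookback]
        return past > 0 and avg / past > self.ratio


def first_stop_epoch(eigenvalues, **kwargs):
    """Return the first epoch at which the stopper fires, or None."""
    stopper = EigenvalueEarlyStopper(**kwargs)
    for epoch, lam in enumerate(eigenvalues):
        if stopper.update(lam):
            return epoch
    return None


if __name__ == "__main__":
    toy_hessian = [[4.0, 1.0], [1.0, 3.0]]  # eigenvalues (7 +/- sqrt(5)) / 2
    print(round(power_iteration(toy_hessian), 4))  # 4.618
    per_epoch = [0.10, 0.11, 0.10, 0.11, 0.12, 0.20, 0.30, 0.45]  # made-up values
    print(first_stop_epoch(per_epoch))  # 5
```

The moving average smooths out epoch-to-epoch noise in the eigenvalue estimate, so a single noisy epoch does not trigger a stop, while a sustained rise does. Following the abstract, SGAS-es applies its early stopping check within each phase of the greedy search rather than once for the whole run.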
Oral Examination Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
1.1 Background of NAS and DARTS
1.2 Problems of DARTS and SGAS
1.2.1 Performance Collapse Issue
1.2.2 Cascade Performance Collapse Issue
1.3 Research Purposes and Main Contributions
2 Related Work
2.1 Neural Architecture Search (NAS)
2.2 Differentiable Architecture Search (DARTS)
2.2.1 Encoding: Continuous Relaxation
2.2.2 Decoding: Discretization
2.2.3 Formulation of Optimization Problem of DARTS
2.3 Optimization Gap between NAS and DARTS
2.3.1 Mathematical Error
2.3.2 Supernet Sub-optima
2.3.3 Discretization Loss
2.3.4 Deployment Gap
2.4 Sequential Greedy Architecture Search (SGAS)
2.5 The Relation among Flat Minima, λ^{max}_{α} and Performance Collapse
2.5.1 Flat Minima
2.5.2 The Role of λ^{max}_{α} in DARTS
3 Approach
3.1 Search Space
3.1.1 DARTS CNN Search Space
3.1.2 NAS-Bench-201 Search Space
3.2 Bilevel Optimization
3.3 Early Stopping Indicator
3.4 Edge Decision Epoch
3.4.1 Edge Decision Strategy
3.4.2 Fixing the Operation
3.5 Put-It-All-Together: SGAS-es
3.6 The Whole Searching and Retraining Pipe
3.6.1 Searching Stage
3.6.2 Retraining Stage
4 Experiment
4.1 NAS-Bench-201
4.1.1 NAS-Bench-201 Description
4.1.2 Experiment Results
4.2 Fashion-MNIST Dataset
4.2.1 Dataset Description
4.2.2 Experiment Details
4.2.3 Experiment Results
4.3 EMNIST-Balanced Dataset
4.3.1 Dataset Description
4.3.2 Experiment Details
4.3.3 Experiment Results
5 Conclusion
Bibliography
Appendix
A Searched Cells of SGAS-es on NAS-Bench-201
B Searched Cells of SGAS-es on Fashion-MNIST
