
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Researcher: 李有權
Researcher (English): Lee, Eugene
Title: 自主學習:從架構搜索到轉導推理
Title (English): Autonomous Learning: From Architecture Search to Transductive Inference
Advisor: 李鎮宜
Advisor (English): Lee, Chen-Yi
Committee: 簡仁宗、郭峻因、王佑曾、何建明、林風、李鎮宜
Committee (English): Chien, Jen-Tzung; Guo, Jiun-In; Wong, Eugene; Ho, Jan-Ming; Lin, Phone; Lee, Chen-Yi
Defense Date: 2023-07-25
Degree: Doctoral
Institution: National Yang Ming Chiao Tung University
Department: Institute of Electronics
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Pages: 145
Keywords (Chinese): 神經適應、神經架構搜索、優化
Keywords (English): Neural Adaptation; Neural Architecture Search; Optimization; Transductive Inference
Usage statistics:
  • Cited: 0
  • Views: 164
  • Downloads: 25
  • Bookmarked: 0
Abstract:
This thesis delves into the intricate dynamics of neural networks, emphasizing the themes of optimization and adaptation. The research focuses on two core areas of investigation: neural architecture search and neural adaptation. The first part of the thesis presents NeuralScale, an approach designed to effectively determine the optimal configuration of neurons in deep neural networks. By leveraging iterative pruning and a novel concept termed 'architecture descent', we successfully scale architectures across diverse sizes while preserving, and often enhancing, performance. The second part introduces a transductive meta-learner, a model that incorporates self-supervised weight adjustment during testing or deployment. This mechanism allows rapid adaptation to unforeseen distributional changes in the data, enhancing the robustness and versatility of the learning system. The research also explores the Attentive Independent Mechanisms (AIM) model, which adapts rapidly in the face of new tasks, a characteristic paramount in few-shot learning scenarios. The results presented in this dissertation underscore the potential of neural adaptation and optimization for creating more flexible and efficient learning systems.
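
The two mechanisms named in the abstract can be sketched in code. First, a minimal Python/PyTorch sketch of an iterative-pruning loop in the spirit of the abstract's 'architecture descent'; the magnitude criterion, the rescaling rule, and the names `magnitude_prune_counts` and `architecture_descent` are illustrative assumptions here, not the thesis's actual algorithm:

```python
import torch
import torch.nn as nn

def magnitude_prune_counts(model: nn.Sequential, keep_ratio: float) -> list:
    """Count neurons per linear layer that survive global magnitude pruning."""
    weights = [m.weight.detach() for m in model if isinstance(m, nn.Linear)]
    scores = torch.cat([w.abs().flatten() for w in weights])
    threshold = torch.quantile(scores, 1.0 - keep_ratio)
    # A neuron survives if any of its incoming weights clears the threshold.
    return [max(int((w.abs() > threshold).any(dim=1).sum()), 1) for w in weights]

def architecture_descent(make_model, train_fn, widths, keep_ratio=0.5, steps=3):
    """Alternate between training, pruning, and rebuilding at a fixed budget."""
    budget = sum(widths)
    for _ in range(steps):
        model = make_model(widths)   # rebuild the network with current widths
        train_fn(model)              # short training pass before pruning
        kept = magnitude_prune_counts(model, keep_ratio)
        # Rescale the pruned per-layer profile back up to the neuron budget,
        # shifting capacity toward layers whose weights survived pruning.
        widths = [max(round(k / sum(kept) * budget), 1) for k in kept]
    return widths
```

Second, a sketch of what self-supervised weight adjustment at test time (the transductive step) might look like; entropy minimization is assumed as a stand-in for the thesis's actual self-supervised objective:

```python
def transductive_adapt(model, unlabeled_x, steps=5, lr=1e-4):
    """Adapt model weights on an unlabeled test batch before predicting."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(steps):
        opt.zero_grad()
        probs = model(unlabeled_x).softmax(dim=1)
        # Assumed self-supervised signal: minimize prediction entropy so the
        # model commits to consistent labels on the shifted test distribution.
        loss = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        loss.backward()
        opt.step()
    model.eval()
    with torch.no_grad():
        return model(unlabeled_x).argmax(dim=1)  # transductive predictions
```

In both sketches the design point matches the abstract: the architecture or the weights are treated as quantities to be optimized again at deployment time, rather than fixed once training ends.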
摘要 (Abstract in Chinese)
Abstract
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Background and Motivation
1.2 Problem Statement
1.3 Research Objectives and Questions
1.4 Significance of the Study
1.5 Dissertation Structure
2 Literature Review
2.1 Introduction to Deep Learning
2.1.1 History and Evolution
2.1.2 Fundamental Concepts
2.2 Neural Architecture Search: A Review
2.2.1 Existing Approaches
2.2.2 Challenges and Limitations
2.2.3 Future Directions
2.3 Neural Adaptation: A Review
2.3.1 Need for Neural Adaptation
2.3.2 Techniques for Neural Adaptation
2.3.3 Neural Adaptation in Biologically Inspired Models
2.3.4 Challenges and Future Directions
2.4 Optimization in Deep Learning
2.4.1 Role of Optimization
2.4.2 Optimization Techniques
2.4.3 Challenges and Future Directions
2.5 Transductive Inference and its Importance
2.5.1 Definition and Application of Transductive Inference
2.5.2 Role in Neural Adaptation
3 NeuralScale: An Approach for Optimal Configuration
3.1 Introduction to NeuralScale
3.1.1 Background and Motivation
3.2 Method
3.2.1 Parameter Tracking via Iterative Pruning
3.2.2 Efficient Scaling of Parameters
3.2.3 Architecture Descent: A Novel Technique for Model Refinement
3.3 Experiments
3.3.1 Analyzing the Significance of Architecture Descent
3.3.2 Benchmarking NeuralScale
3.3.3 Summary of Findings
3.3.4 Conclusion
4 Transductive Meta-Learner for Rapid Adaptation
4.1 Introduction to Transductive Meta-Learner
4.1.1 Background and Motivation
4.1.2 Design and Implementation of Transductive Meta-Learner
4.2 Method
4.2.1 Network Architecture
4.2.2 Adaptive Meta-Learning Strategy
4.2.3 Ordinal Regression as an Objective
4.2.4 Test-Time Transductive Inference
4.3 Experiments
4.3.1 Dataset and Experimental Configuration
4.3.2 Evaluation on MAHNOB-HCI and UBFC-rPPG
4.4 Discussion
4.4.1 Performance Comparison with Variable Adaptation Steps
4.4.2 Exploring Joint Adaptation of Feature Extractor and rPPG Estimator
4.4.3 Visualization of Feature Activation Map Using Various Methods
4.4.4 Real-Time Video Demonstration
4.4.5 Application on PPG-Based Smart Wearable Device
5 Attentive Independent Mechanisms (AIM)
5.1 AIM: A Rapid Adaptation Model
5.2 Introduction
5.3 Related Work
5.3.1 Meta-learning
5.3.2 Continual Learning
5.3.3 Cross-Domain Synergies
5.4 Method
5.4.1 Attentive Independent Mechanisms
5.4.2 Few-Shot Learning Using SIB
5.4.3 Continual Learning: Harmonizing Fast and Slow Learning
5.5 Experiments
5.5.1 Few-Shot Learning
5.5.2 Qualitative Study: Activation of AIM
5.5.3 Continual Learning
5.6 Discussion
5.6.1 Investigating the Evolution of Attention Weights Throughout Training
5.6.2 Investigating Attention Weights Across All Classes
5.6.3 Assessing the Impacts of Stochastic Sampling and the Quantity of Active Mechanisms
5.7 Continual Learning
5.7.1 Quantitative Analysis
5.7.2 Activation Analysis of AIM
6 Conclusions and Future Work
6.1 Summary of Findings
6.2 Concluding Remarks
6.2.1 Interpretation of the Study's Impact
6.2.2 Reflection on the Research Process
6.3 Recommendations for Future Research
6.4 Current Research
6.4.1 Marketing Strategy
6.4.2 Access to Our Application
Bibliography