Author: 孟繁淳
Author (English): Fan-Chun Meng
Title: 運用DDPG與PPO深度強化學習於資產配置最佳化的研究
Title (English): Using Deep Reinforcement Learning on Asset Allocation Optimization via Deep Deterministic Policy Gradient and Proximal Policy Optimization
Advisor: 許智誠
Degree: Master's
Institution: 國立中央大學 (National Central University)
Department: 資訊管理學系 (Information Management)
Discipline: Computing
Field: General Computing
Year of Publication: 2020
Graduation Academic Year: 108 (2019-2020)
Language: Chinese
Pages: 106
Keywords (Chinese): 深度強化學習、資產配置最佳化、CNN、DDPG、PPO
Keywords (English): Deep Reinforcement Learning, Asset Allocation Optimization, CNN, DDPG, PPO
Record statistics:
  • Cited by: 0
  • Views: 76
  • Downloads: 0
  • Bookmarked: 0
Asset allocation plays an important role in financial investment. Through a sound and effective asset allocation strategy, investors can adjust the proportion of capital across different financial instruments, maximizing returns while containing risk. Traditional asset allocation research commonly uses the Mean-Variance Model proposed by Markowitz; however, when handling the time-series data of financial products, it lacks sufficient nonlinear expressive power and copes poorly with the dynamics of financial markets.

In recent years, deep reinforcement learning has risen, and researchers have begun applying it to the asset allocation problem. In most related work to date, the reward function simply measures the change in investment return and does not yet account for risk, even though risk is an important aspect of any asset allocation strategy and matters considerably in practice.

This study applies deep reinforcement learning models to asset allocation optimization in two main stages. The first stage tunes the parameters of the CNN inside the deep reinforcement learning models; the second stage compares seven reward functions and four rebalancing frequencies and their impact on the models' trading performance.

In the first stage, this study pairs a CNN with the DDPG and PPO algorithms to design two models. By testing combinations of parameters, it examines how the CNN's parameters affect each model's trading performance.

In the second stage, using the best parameter combination found in the first stage, the study tests seven reward functions and four rebalancing frequencies, compares the models' trading performance, and searches for reward-function factors better suited to asset allocation.

Because the weight allocation models (the DDPG and PPO models) have many parameters to tune, this study adopts a local optimization approach. Although the reward function and rebalancing frequency may also affect the choice of optimal CNN parameters, re-tuning the CNN for every combination of reward function and rebalancing frequency would greatly multiply the number of experiments. To simplify the process, the experiments therefore proceed in the staged manner described above.

This study finds that both models favor shallow convolutional layers; the DDPG model works better with more convolution kernels per layer, while the PPO model works better with fewer. Both models also favor shallow fully connected layers with fewer neurons.

This study uses the average investment return change rate, volatility, Sharpe ratio, maximum drawdown, and compound annual growth rate as reward-function factors and compares the models' trading performance. It finds that the most suitable reward function for both the DDPG and PPO models is the average investment return change rate. It also finds that even with the most suitable reward function, neither model achieves good trading performance if its CNN parameters are poorly chosen. The study therefore concludes that both the CNN parameters and the reward function strongly influence the trading performance of deep reinforcement learning models.

Although volatility, the Sharpe ratio, maximum drawdown, and compound annual growth rate are important and appropriate indicators for evaluating a trading strategy's overall performance and risk, they do not let the deep reinforcement learning models learn well when used as reward functions. When optimizing asset allocation with deep reinforcement learning, the factors in the reward function must therefore be chosen carefully and identified through well-designed experiments.
Asset allocation is an important issue in the field of financial investment. Through a reasonable and effective asset allocation strategy, investors can adjust the proportion of funds across different financial instruments, maximizing returns while suppressing risk. Traditional asset allocation research often uses the Mean-Variance Model proposed by Markowitz; however, when dealing with the time-series data of financial products, it lacks sufficient nonlinear expressive power and handles the dynamics of financial markets poorly.
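For context, the Markowitz baseline mentioned above can be sketched in a few lines. This is a hypothetical minimal numpy illustration of the closed-form minimum-variance portfolio (risk term only, with no target-return constraint), not the thesis's implementation:

```python
import numpy as np

def min_variance_weights(returns: np.ndarray) -> np.ndarray:
    """Closed-form minimum-variance portfolio: w ∝ Σ^{-1} 1, normalized to sum to 1.
    `returns` is a (T, N) array of per-period asset returns."""
    cov = np.cov(returns, rowvar=False)   # N x N sample covariance Σ
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)      # solve Σ w = 1 instead of inverting Σ
    return raw / raw.sum()                # budget constraint: weights sum to 1

# Toy example: 250 periods of simulated returns for 4 assets
rng = np.random.default_rng(0)
w = min_variance_weights(rng.normal(0.001, 0.02, size=(250, 4)))
print(w, w.sum())
```

Note that the resulting weights may be negative (short positions); practical allocators add constraints such as w ≥ 0, which requires a quadratic-programming solver rather than this closed form.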

In recent years, deep reinforcement learning has emerged, and researchers have begun to apply it to asset allocation. In most related work to date, the reward function simply measures the change in investment return and does not consider risk, even though risk is an important aspect of any asset allocation strategy.

This study uses deep reinforcement learning models to study asset allocation optimization in two main stages. The first stage tunes the CNN parameters inside the deep reinforcement learning models; the second stage compares seven reward functions and four rebalancing frequencies and their impact on the models' trading performance.

In the first stage, this study pairs a CNN with the DDPG and PPO algorithms to design two models. By testing various parameter combinations, it examines how the CNN parameters inside deep reinforcement learning affect each model's trading performance.
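A common way for such an actor network to emit a valid allocation, whether DDPG or PPO drives the training, is to pass the final layer's per-asset scores through a softmax so the weights are non-negative and sum to one. The sketch below is a hypothetical numpy illustration of that output step only, not the thesis's CNN:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    z = x - x.max()           # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor_weights(scores: np.ndarray) -> np.ndarray:
    """Map raw per-asset scores (e.g. the actor CNN's last-layer outputs)
    to portfolio weights: non-negative and summing to 1."""
    return softmax(scores)

w = actor_weights(np.array([1.2, -0.3, 0.5, 0.0]))
print(w)  # non-negative weights that sum to 1; the highest score gets the largest weight
```

This design makes every action the agent outputs a feasible long-only portfolio, so no separate projection step is needed during training.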

In the second stage, the best parameter combination found in the first stage is used to test seven reward functions and four rebalancing frequencies, compare the models' trading performance, and search for reward-function factors better suited to asset allocation.

Because the weight allocation models (the DDPG and PPO models) have many parameters to tune, this study adopts a local optimization approach. Although the reward function and rebalancing frequency may also affect the choice of optimal CNN parameters, re-tuning the CNN for every combination of reward function and rebalancing frequency would greatly multiply the number of experiments. To simplify the experimental process, this study therefore proceeds in the staged manner described above.

This study found that both models favor shallow convolutional layers; the DDPG model works better with more convolution kernels per layer, while the PPO model works better with fewer. In addition, both models favor shallow fully connected layers with fewer neurons.

In this study, the average investment return change rate, volatility, Sharpe ratio, maximum drawdown, and compound annual growth rate are used as reward-function factors, and the models' trading performance is compared. The most suitable reward function for both the DDPG and PPO models is the average investment return change rate. The study also found that even with the most suitable reward function, good trading performance cannot be obtained if the CNN parameters in the DDPG and PPO models are poorly chosen. Both the CNN parameters and the reward function therefore have an important influence on the trading performance of deep reinforcement learning models.
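The five candidate reward factors can be computed from a portfolio value series roughly as follows. This is an illustrative numpy sketch using standard textbook formulas (risk-free rate assumed zero for the Sharpe ratio); the thesis's exact definitions may differ:

```python
import numpy as np

def reward_factors(values: np.ndarray, periods_per_year: int = 252) -> dict:
    """Candidate reward-function factors from a portfolio value series.
    Illustrative formulas only; not the thesis's exact definitions."""
    rets = np.diff(values) / values[:-1]                 # per-period simple returns
    avg_change = rets.mean()                             # average investment return change rate
    per_period_std = rets.std(ddof=1)
    vol = per_period_std * np.sqrt(periods_per_year)     # annualized volatility
    sharpe = avg_change / per_period_std * np.sqrt(periods_per_year)  # risk-free rate = 0
    peak = np.maximum.accumulate(values)                 # running maximum of portfolio value
    mdd = ((peak - values) / peak).max()                 # maximum drawdown
    years = (len(values) - 1) / periods_per_year
    cagr = (values[-1] / values[0]) ** (1 / years) - 1   # compound annual growth rate
    return {"avg_change": avg_change, "volatility": vol,
            "sharpe": sharpe, "max_drawdown": mdd, "cagr": cagr}

m = reward_factors(np.array([100.0, 102.0, 101.0, 105.0, 103.0, 108.0]))
print(m)
```

Any of these scalars could serve as the per-step reward; the study's finding is that only the average return change rate trains well, while the risk-adjusted metrics remain useful for evaluating the finished strategy.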

While volatility, the Sharpe ratio, maximum drawdown, and compound annual growth rate are important and appropriate indicators for measuring a trading strategy's overall performance and risk, they do not allow the deep reinforcement learning models to learn well when used as reward functions. When using deep reinforcement learning to optimize asset allocation, the factors in the reward function must therefore be chosen carefully through well-designed experiments.
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
Chapter 2: Literature Review
2.1 Trading Strategy Performance Evaluation
2.2 Asset Allocation
2.3 Capital Allocation Methods
2.4 Neural Networks
2.5 Deep Learning
2.5.1 Convolutional Neural Networks
2.5.2 Activation Functions
2.6 Reinforcement Learning
2.6.1 Value-Based Deep Reinforcement Learning
2.6.2 Policy-Based Deep Reinforcement Learning
2.6.3 Actor-Critic
Chapter 3: System Design and Implementation
3.1 System Flow and Architecture
3.2 Asset Allocation Model Design and Process
3.2.1 Datasets of the Traded Instruments
3.2.2 Buy-and-Hold Trading Strategy Design
3.2.3 Weight Allocation Model Design and Process
3.3 Tuning Reward-Function Factors and Rebalancing Frequency
3.4 Overall Evaluation of the Asset Allocation Model
Chapter 4: System Verification and Results
4.1 System Flow and Verification
4.2 Tuning Weight Allocation Model Parameters and Components (丁)
4.2.1 Tuning the Convolutional Layers and Kernels of the DDPG Model (丁-CL1)
4.2.2 Tuning the Fully Connected Layers and Neurons of the DDPG Model (丁-FL1)
4.2.3 Tuning the Convolutional Layers and Kernels of the PPO Model (丁-CL1)
4.2.4 Tuning the Fully Connected Layers and Neurons of the PPO Model (丁-FL1)
4.3 Tuning the Weight Allocation Models' Reward Function and Rebalancing Frequency (戊)
4.3.1 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF1)
4.3.2 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF2)
4.3.3 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF3)
4.3.4 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF4)
4.3.5 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF5)
4.3.6 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF6)
4.3.7 Tuning the Reward Function and Rebalancing Frequency (戊-Reward Function RF7)
4.3.8 Overall Evaluation of the Reward Functions and Rebalancing Frequencies (戊)
4.4 Tuning the Period of the Average Investment Return Change Rate (戊)
4.4.1 Tuning the Period of the Average Investment Return Change Rate (戊-DDPG Model)
4.4.2 Tuning the Period of the Average Investment Return Change Rate (戊-PPO Model)
4.5 Verifying and Evaluating Model Effectiveness (己)
4.5.1 Verifying and Evaluating Model Effectiveness (戊-DDPG Model, Taiwan 50)
4.5.2 Verifying and Evaluating Model Effectiveness (戊-DDPG Model, Mid-Cap 100)
4.5.3 Verifying and Evaluating Model Effectiveness (戊-PPO Model, Taiwan 50)
4.5.4 Verifying and Evaluating Model Effectiveness (戊-PPO Model, Mid-Cap 100)
4.5.5 Overall Evaluation of Model Effectiveness
Chapter 5: Conclusion
5.1 Conclusion
5.2 Research Limitations
5.3 Future Work
References
[1]. Markowitz, H. M., 1968, Portfolio Selection: Efficient Diversification of Investments, Yale University Press, Vol. 16
[2]. Akita, R., 2016, Deep learning for stock prediction using numerical and textual information, IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS), pp. 1-6
[3]. Jiang, Z., 2017, Cryptocurrency Portfolio Management with Deep Reinforcement Learning, from https://arxiv.org/abs/1612.01277
[4]. Liang, Z., 2018, Adversarial Deep Reinforcement Learning in Portfolio Management, from https://arxiv.org/abs/1808.09940
[5]. Narang, R. K., 2009, Inside the Black Box: The Simple Truth about Quantitative Trading, J. Wiley & Sons
[6]. Xiao, L., 2017, A Secure Mobile Crowdsensing Game with Deep Reinforcement Learning, IEEE Transactions on Information Forensics and Security, Vol. 13, pp. 35-47
[7]. Liu, Z., 2019, Towards Understanding Chinese Checkers with Heuristics, Monte Carlo Tree Search, and Deep Reinforcement Learning, from https://arxiv.org/abs/1903.01747
[8]. Sutton, R. S., and Barto, A. G., 2014, Reinforcement Learning: An Introduction, pp. 2-5
[9]. Sharpe, W. F., 1992, Asset Allocation: Management Style and Performance Measurement, Journal of Portfolio Management, pp. 7-19
[10]. Meucci, A., 2009, Risk and Asset Allocation
[11]. Chollet, F., 2017, The Fundamentals of Deep Learning, Deep Learning with Python
[12]. Cortes, C., 2012, L2 Regularization for Learning Kernels, from https://arxiv.org/abs/1205.2653
[13]. Mahmood, H., 2019, Gradient Descent, from https://towardsdatascience.com/gradient-descent-3a7db7520711
[14]. Pai, A., 2020, Analyzing 3 Types of Neural Networks in Deep Learning, from https://www.analyticsvidhya.com/blog/2020/02/cnn-vs-rnn-vs-mlp-analyzing-3-types-of-neural-networks-in-deep-learning
[15]. Zhan, R., 2017, CS221 Project Final Report: Deep Reinforcement Learning in Portfolio Management, from https://pdfs.semanticscholar.org/ec54/b8edf44070bc3166084f59ac9372176d7d86.pdf
[16]. Saha, S., 2018, A Comprehensive Guide to Convolutional Neural Networks, from https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
[17]. Shih, M., 2019, Convolutional Neural Networks, from https://shihs.github.io/blog/machine%20learning/2019/02/25/Machine-Learning-Covolutional-Neural-Networks(CNN)
[18]. Krizhevsky, A., Sutskever, I., and Hinton, G. E., 2012, ImageNet Classification with Deep Convolutional Neural Networks, Proceedings of the 25th International Conference on Neural Information Processing Systems (NIPS'12), Vol. 1, pp. 1097-1105
[19]. Perera, S., 2019, An introduction to Reinforcement Learning, from https://towardsdatascience.com/an-introduction-to-reinforcement-learning-1e7825c60bbe
[20]. Mnih, V., 2015, Human-level control through deep reinforcement learning, Nature, Vol. 518, pp. 529-533
[21]. Hasselt, H., Guez, A., and Silver, D., 2015, Deep Reinforcement Learning with Double Q-learning, from https://arxiv.org/abs/1509.06461
[22]. Wang, Z., 2016, Dueling Network Architectures for Deep Reinforcement Learning, from https://arxiv.org/abs/1511.06581
[23]. Sutton, R., 2000, Policy Gradient Methods for Reinforcement Learning with Function Approximation, Proceedings of the 12th International Conference on Neural Information Processing Systems, pp. 1057-1063
[24]. Rosenstein, M., 2004, Supervised Actor-Critic Reinforcement Learning, from https://www-anw.cs.umass.edu/pubs/2004/rosenstein_b_ADP04.pdf
[25]. Silver, D., 2014, Deterministic Policy Gradient Algorithms, Proceedings of the International Conference on Machine Learning, Vol. 32
[26]. Schulman, J., 2017, Proximal Policy Optimization Algorithms, from https://arxiv.org/abs/1707.06347
[27]. Jiang, Z., 2017, A Deep Reinforcement Learning Framework for the Financial Portfolio Management Problem, from https://arxiv.org/abs/1706.10059
[28]. TensorLayer, https://github.com/tensorlayer/tensorlayer/tree/master/examples/reinforcement_learning
[29]. Szegedy, C., 2015, Going deeper with convolutions, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-9
Electronic full text (publicly available online from 2023-07-01)