National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

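Graduate student: Chun-Yu Yeh (葉駿諭)
Thesis title: Stock Trading Strategies using Deep Reinforcement Learning and Self-Attention-based Imitation Learning
Thesis title (Chinese): 基於深度強化學習與自我注意力機制之模仿學習於股票交易策略研究
Advisor: Huan Chen (陳煥)
Oral defense committee: Chun-Wei Tsai (蔡崇煒), Chiao-En Chen (陳喬恩)
Oral defense date: 2023-07-20
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Computer Science and Engineering (資訊工程學系所)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2023
Graduation academic year: 111 (ROC calendar)
Language: Chinese
Number of pages: 88
Keywords: Stock Trading; Deep Reinforcement Learning; Imitation Learning; Inverse Reinforcement Learning; Self-Attention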
Stock prediction is an important research area in financial investment, since accurate price forecasts can yield significant financial gains. Methods for stock trading prediction have evolved from machine learning and deep learning to deep reinforcement learning, and imitation learning techniques have recently been applied frequently to financial markets. However, because the stock market is complex and dynamic, existing imitation learning methods such as Generative Adversarial Imitation Learning (GAIL) and Adversarial Inverse Reinforcement Learning (AIRL) remain limited in capturing the complex temporal relationships and dependencies in market data, so these models often struggle to predict stock prices accurately.
This study adds the self-attention mechanism as a feature extractor to the discriminator of the GAIL model. Self-attention has been shown to capture long-range dependencies in time-series data effectively. We hypothesize that, with self-attention as a feature extractor in the discriminator, the model can better capture the complex temporal relationships and dependencies in market data and thereby improve the returns of stock trading. We first evaluate the GAIL model on a historical stock price dataset, then compare the imitation learning model with self-attention against the original model. We also apply the self-attention mechanism to other classical imitation learning models such as AIRL and Behavior Cloning to confirm that the approach generalizes across imitation learning models.
The experimental results indicate that adding self-attention as a feature extractor to the discriminators of models such as GAIL and AIRL improves the performance of imitation learning models for stock trading. Compared with the original models, most of the modified models achieve higher investment returns.
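The central idea of the abstract, a GAIL discriminator that passes a window of market observations through self-attention before scoring it, can be illustrated with a minimal sketch. This record does not describe the thesis's actual architecture, so everything below is an assumption made for illustration: a single attention head, mean pooling over time, a linear output layer, randomly initialised (untrained) weights, made-up feature values, and the common implementation convention that the discriminator outputs the probability of "expert" and the policy receives the surrogate reward -log(1 - D).

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a (T x d) window."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = list(zip(*k))                       # transpose K to (d_k x T)
    scores = matmul(q, k_t)                   # (T x T) pairwise similarities
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def init_params(d_in, d_k, seed=0):
    """Random, untrained weights; a real model would learn these adversarially."""
    rng = random.Random(seed)
    mat = lambda r, c: [[rng.gauss(0.0, 0.1) for _ in range(c)] for _ in range(r)]
    return {"w_q": mat(d_in, d_k), "w_k": mat(d_in, d_k), "w_v": mat(d_in, d_k),
            "w_out": [rng.gauss(0.0, 0.1) for _ in range(d_k)], "b_out": 0.0}


def discriminator(window, params):
    """Return D in (0, 1): estimated probability that the window is expert data."""
    attended, _ = self_attention(window, params["w_q"], params["w_k"], params["w_v"])
    pooled = [sum(col) / len(attended) for col in zip(*attended)]  # mean over time
    logit = sum(p * w for p, w in zip(pooled, params["w_out"])) + params["b_out"]
    return 1.0 / (1.0 + math.exp(-logit))


def gail_reward(d_value, eps=1e-8):
    """Surrogate policy reward -log(1 - D) under the assumed convention."""
    return -math.log(1.0 - d_value + eps)


# Hypothetical window of 5 days; each row is [return, volume change, indicator, action].
window = [
    [0.01, 0.10, 0.55, 1.0],
    [-0.02, 0.05, 0.48, 0.0],
    [0.03, -0.20, 0.60, 1.0],
    [0.00, 0.00, 0.58, 0.0],
    [-0.01, 0.15, 0.52, -1.0],
]
params = init_params(d_in=4, d_k=3, seed=0)
attended, weights = self_attention(window, params["w_q"], params["w_k"], params["w_v"])
d_value = discriminator(window, params)
reward = gail_reward(d_value)
print(f"D = {d_value:.4f}, surrogate reward = {reward:.4f}")
```

Each row of `weights` shows how strongly one day in the window attends to every other day, which is the mechanism the abstract credits with capturing temporal dependencies. In an actual GAIL setup the discriminator would be trained to separate expert windows from policy-generated ones while the policy is optimised against the surrogate reward.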
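Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation and Objectives
1.3 Main Contributions
1.4 Thesis Organization
Chapter 2 Background and Related Work
2.1 Deep Reinforcement Learning
2.1.1 DQN
2.1.2 Policy Gradient
2.1.3 Stock Prediction
2.2 Imitation Learning
2.2.1 Behavior Cloning
2.2.2 Inverse Reinforcement Learning
2.2.3 Generative Adversarial Imitation Learning
2.2.4 Adversarial Inverse Reinforcement Learning
2.3 Self-Attention Mechanism
2.4 The Four Major Aspects of Stocks and Technical Indicators
2.4.1 Technical Aspect
2.4.2 Chip Aspect (籌碼面)
Chapter 3 Methodology
3.1 Imitation Learning Environment Design
3.1.1 State Space
3.1.2 Action Space
3.1.3 Expert Trajectories
3.2 Model Architecture
Chapter 4 Experimental Results and Discussion
4.1 Datasets and Experimental Environment
4.1.1 Stock Price Data
4.1.2 Expert Data
4.1.3 Training Parameters
4.2 Experimental Results
4.2.1 Comparison of Expert Trajectories
4.2.2 Comparison of Imitation Learning Models
4.2.3 Adding Attention
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References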
Electronic full text: publicly available online from 2025-07-20