National Digital Library of Theses and Dissertations in Taiwan

Author: 高博華
Author (English): KAO, PO-HUA
Title: 強化學習與總體經濟指標於股票市場交易之應用
Title (English): Reinforcement Learning with Macroeconomic Variables Applying to Stock Market Trading
Advisor: 尤信程
Advisor (English): YOU, SHIN-CHENG
Committee Members: 尤信程、黃文吉、吳牧恩
Committee Members (English): YOU, SHIN-CHENG; HUANG, WEN-CHI; WU, MU-EN
Oral Defense Date: 2022-08-01
Degree: Master's
Institution: National Taipei University of Technology (國立臺北科技大學)
Department/Program: 人工智慧與大數據高階管理雙聯碩士學位學程 (Dual Master's Degree Program in Artificial Intelligence and Big Data for Executives)
Discipline: Computing
Field: Computer Applications
Thesis Type: Academic thesis
Publication Year: 2022
Graduation Academic Year: 110 (ROC calendar, 2021-2022)
Language: Chinese
Number of Pages: 42
Keywords (Chinese): 深度學習、神經網路、強化學習、政策導向演算法、總體經濟指標
Keywords (English): Deep learning; Neural network; Reinforcement learning; Policy-based Algorithms; Macro Economic Variables
Artificial intelligence seeks to make computers think the way humans do, so that machines can simulate human decision-making from the information they collect and continually adjust and improve themselves. In recent years, successes such as DeepMind's AlphaGo and OpenAI Five have brought deep reinforcement learning to wide attention, and the related techniques have been applied broadly in fields such as financial fraud detection, retail purchasing forecasts, healthcare, defense, and energy.
Reinforcement learning can be applied to computer games: the agent first observes the game environment to gather information, then chooses an action and executes it. Depending on whether the reward returned by the game for that action is positive or negative, and on its magnitude, the agent adjusts its decision-making and takes the next action, again weighing the direction and size of the new reward and adjusting once more. By repeating this loop, the agent pursues the largest possible cumulative reward in the game.
Reinforcement learning can likewise be applied to stock trading in financial markets: like a player in a game, the agent operates in the stock-market environment and, for specific stocks, performs one of three actions [buy, sell, hold], trying to obtain the highest possible return.
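As a rough illustration only, such a trading environment can be framed as an OpenAI Gym `Env` with a three-way discrete action space. The class name, observation layout, and reward definition below are illustrative assumptions, not the environment actually used in the thesis:

```python
import gym
import numpy as np
from gym import spaces


class StockTradingEnv(gym.Env):
    """Toy single-stock trading environment with buy / sell / hold actions.

    `features` is a (T, F) array of daily observations (e.g. open, high, low,
    close, volume plus technical indicators); `prices` is the (T,) close series.
    Illustrative sketch only, not the environment used in the thesis.
    """

    def __init__(self, features, prices, initial_cash=10_000.0):
        super().__init__()
        self.features, self.prices = features, prices
        self.initial_cash = initial_cash
        self.action_space = spaces.Discrete(3)  # 0 = hold, 1 = buy, 2 = sell
        self.observation_space = spaces.Box(
            low=-np.inf, high=np.inf, shape=(features.shape[1],), dtype=np.float32)

    def reset(self):
        self.t, self.cash, self.shares = 0, self.initial_cash, 0
        return self.features[self.t].astype(np.float32)

    def step(self, action):
        price = self.prices[self.t]
        if action == 1 and self.cash >= price:      # buy one share
            self.shares += 1
            self.cash -= price
        elif action == 2 and self.shares > 0:       # sell one share
            self.shares -= 1
            self.cash += price
        self.t += 1
        done = self.t >= len(self.prices) - 1
        next_price = self.prices[self.t]
        # Reward: one-step change in portfolio value (cash + holdings).
        reward = self.shares * (next_price - price)
        obs = self.features[self.t].astype(np.float32)
        return obs, float(reward), done, {}
```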
This study uses OpenAI's open-source framework as its development platform and trains trading agents with the PPO2 algorithm to perform stock trades. For the market data, we sampled representative stocks from each major industry sector of the U.S. stock market and collected their daily opening, closing, high, and low prices and trading volume from 2006 through February 2022, adding technical indicators to form Dataset 1. Dataset 2 additionally includes market indices and macroeconomic indicators over the same period, such as the major U.S. market indices (e.g., the S&P 500), the money supply aggregates M1 and M2, the volatility index (VIX), the 10-year Treasury yield, and the U.S. Dollar Index. In both datasets, the last two years of data were reserved for testing and validation, and the remainder was used for training.
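A minimal training sketch with the Stable Baselines (TensorFlow 1.x) implementation of PPO2 is shown below; the file naming, the macro-column prefix, the split date, and the hyperparameters are assumptions for illustration, and StockTradingEnv refers to the sketch above rather than the thesis's actual environment:

```python
import pandas as pd
from stable_baselines import PPO2
from stable_baselines.common.vec_env import DummyVecEnv


def load_dataset(ticker, with_macro):
    """Hypothetical helper: load one stock's daily features and split off the
    last two years (roughly March 2020 onwards) for testing."""
    df = pd.read_csv(f"{ticker}.csv", index_col="Date", parse_dates=True)
    if not with_macro:
        # Assume macro/index columns carry a "MACRO_" prefix in this sketch.
        df = df.drop(columns=[c for c in df.columns if c.startswith("MACRO_")])
    return df[df.index < "2020-03-01"], df[df.index >= "2020-03-01"]


train_df, test_df = load_dataset("AAPL", with_macro=True)

# StockTradingEnv is the illustrative environment sketched above.
env = DummyVecEnv([lambda: StockTradingEnv(
    train_df.drop(columns=["Close"]).values, train_df["Close"].values)])

model = PPO2("MlpPolicy", env, verbose=1)   # default multilayer-perceptron policy
model.learn(total_timesteps=200_000)
model.save("ppo2_aapl_with_macro")
```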
We trained the agent separately on the dataset without macroeconomic indicators and on the dataset with them, ran three test episodes for each, and compared the arithmetic means of the test returns, using the difference in average annualized return (assets gained or lost by the end of the period divided by the assets at the start of the period) to analyze and evaluate how the added market indices and macroeconomic indicators changed the trading agent's returns.
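The comparison metric described above can be written in a few lines; the asset figures below are placeholders, not experimental results:

```python
def period_return(start_assets, end_assets):
    """Assets gained or lost over the test period divided by starting assets."""
    return (end_assets - start_assets) / start_assets


# Placeholder asset values from three hypothetical test runs of one agent.
runs = [(100_000, 118_000), (100_000, 121_500), (100_000, 109_000)]
average_return = sum(period_return(s, e) for s, e in runs) / len(runs)
print(f"average return over three test runs: {average_return:.2%}")
```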
For the stock universe, we selected 10 companies listed on the U.S. stock market with large trading volume or strong representativeness, spanning eight sectors: information technology, financial services, health care, consumer cyclicals (retailers, automobile and component manufacturers, restaurants, travel services, etc.), consumer defensives (manufacturers of household goods, food and beverages, packaging, tobacco, and personal products, education and training services, etc.), energy, industrials, and basic materials (including the exploration, development, processing, and refining of raw materials into finished goods): Apple Inc. (AAPL), Microsoft Corp. (MSFT), J.P. Morgan (JPM), Johnson & Johnson (JNJ), UnitedHealth Group Inc. (UNH), Home Depot Inc. (HD), Walmart Inc. (WMT), Exxon Mobil Corp. (XOM), Union Pacific Corp. (UNP), and BHP Group Ltd. (BHP).
The experimental results are as follows:
For D1, after the market indices and macroeconomic indicators were added, the agent's combined annual return on the 10 selected stocks was about 408.22%, compared with about 434.57% without them, a decrease of about 26.35 percentage points in total return, i.e., a loss of about 6.06% of the original return. For D2, after the indicators were added, the combined annual return was about 124.76%, compared with about 124.15% without them, an increase of about 0.61 percentage points, i.e., a gain of about 0.49% of the original return. Overall, under the environment, algorithm, and two datasets used in our experimental design, we observed that the added market indices and macroeconomic indicators did not improve the agent's stock-trading performance after training.
Possible reasons include the following: the correlation between stock prices and the various indicators, in terms of the direction, persistence, and magnitude of their changes, may be low; or the sensitivity of an individual stock's price and volume to the market indices and macroeconomic indicators may not be consistent across different time periods. In addition, trading with different policies such as MlpLstmPolicy, CnnPolicy, or CnnLstmPolicy, controlling position sizes during trading, the influence of the targets' value-at-risk and volatility, the choice and applicability of trading strategies, and the evolution and stability of cumulative profit and loss during investment are all directions worth exploring in future work; a small sketch of swapping in a different policy follows.
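As a hedged sketch of that last point, Stable Baselines allows PPO2 to be instantiated with a recurrent policy simply by naming it; env refers to the illustrative vectorized environment above, and the settings shown are assumptions rather than tested configurations:

```python
from stable_baselines import PPO2

# Recurrent policy: with an LSTM policy the number of parallel environments
# must be divisible by nminibatches, so a single-environment DummyVecEnv
# needs nminibatches=1.
lstm_model = PPO2("MlpLstmPolicy", env, nminibatches=1, verbose=1)
lstm_model.learn(total_timesteps=200_000)

# A "CnnPolicy" would instead expect image-like observations, e.g. a 2-D
# window of recent prices and indicators rather than a single flat vector.
```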
Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1  Introduction
1.1 Research Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2  Related Work and Background
2.1 Deep Learning
2.1.1 Artificial Neural Networks
2.2 Reinforcement Learning
2.2.1 Overview of Reinforcement Learning
2.2.2 On-Policy vs. Off-Policy
2.2.3 Proximal Policy Optimization (PPO) Algorithm
2.3 Machine Learning Frameworks
2.3.1 TensorFlow
2.3.2 Stable Baselines
Chapter 3  Methodology and Experiments
3.1 Overview of Methodology
3.1.1 Neural Network Modules
3.1.2 Neural Network Policies
3.2 Experiments
3.2.1 Reinforcement Learning Training Environment
3.2.2 Data Sources
3.2.3 Experimental Datasets
3.2.4 Introduction to the Macroeconomic Indicators
3.2.4.1 S&P 500 Index
3.2.4.2 NASDAQ Composite Index
3.2.4.3 Volatility Index (VIX)
3.2.4.4 U.S. 10-Year Treasury Yield
3.2.4.5 U.S. Money Supply M1 and M2: Totals and Growth Rates
3.2.4.6 U.S. Dollar Index (DXY)
3.2.5 Experimental Procedure
3.3 Analysis of Experimental Results
3.3.1 Apple Inc.
3.3.2 Microsoft Corp.
3.3.3 J.P. Morgan
3.3.4 Johnson & Johnson
3.3.5 UnitedHealth Group Inc.
3.3.6 Home Depot Inc.
3.3.7 Walmart Inc.
3.3.8 Exxon Mobil Corp.
3.3.9 Union Pacific Corp.
3.3.10 BHP Group Ltd.
3.4 Experimental Results
Chapter 4  Conclusions and Discussion
References
Appendix

List of Tables
Table 1  List of selected stocks
Table 2  Overall annualized returns
Table 3  Experimental data: annualized returns without macroeconomic indicators
Table 4  Experimental data: annualized returns with macroeconomic indicators

List of Figures
Figure 1  Structure of a single-layer perceptron
Figure 2  Structure of a multilayer perceptron (MLP)
Figure 3  Interaction between the agent and the environment
Figure 4  Objective function of the PPO algorithm
Figure 5  Adaptive KL penalty rule for the PPO coefficient β
Figure 6  Objective function of the PPO2 algorithm
Figure 7  The PPO2 clipping function
Figure 8  MLP neural network structure
Figure 9  MLP policy network
Figure 10  S&P 500 Index, monthly candlestick chart
Figure 11  NASDAQ Composite Index, monthly candlestick chart
Figure 12  Volatility Index (VIX)
Figure 13  U.S. 10-year Treasury yield
Figure 14  U.S. money supply M1
Figure 15  U.S. money supply M2
Figure 16  U.S. Dollar Index (DXY)
Figure 17  Component currencies of the U.S. Dollar Index
Figure 18  Apple Inc., monthly candlestick chart
Figure 19  Apple Inc., comparison of annualized returns
Figure 20  Microsoft Corp., monthly candlestick chart
Figure 21  Microsoft Corp., comparison of annualized returns
Figure 22  J.P. Morgan, monthly candlestick chart
Figure 23  J.P. Morgan, comparison of annualized returns
Figure 24  Johnson & Johnson, monthly candlestick chart
Figure 25  Johnson & Johnson, comparison of annualized returns
Figure 26  UnitedHealth Group Inc., monthly candlestick chart
Figure 27  UnitedHealth Group Inc., comparison of annualized returns
Figure 28  Home Depot Inc., monthly candlestick chart
Figure 29  Home Depot Inc., comparison of annualized returns
Figure 30  Walmart Inc., monthly candlestick chart
Figure 31  Walmart Inc., comparison of annualized returns
Figure 32  Exxon Mobil Corp., monthly candlestick chart
Figure 33  Exxon Mobil Corp., comparison of annualized returns
Figure 34  Union Pacific Corp., monthly candlestick chart
Figure 35  Union Pacific Corp., comparison of annualized returns
Figure 36  BHP Group Ltd., monthly candlestick chart
Figure 37  BHP Group Ltd., comparison of annualized returns
Figure 38  Comparison of annualized returns by stock, D1
Figure 39  Comparison of annualized returns by stock, D2
