National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 蔡永鴻
Author (English): Yung-Hung Tsai
Title: 深度強化學習法於序列預測之研究
Title (English): Sequence Forecasting using Deep Reinforcement Learning
Advisor: 李昇暾
Advisor (English): Sheng-Tun Li
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Information Management
Discipline: Computer Science
Subfield: General Computer Science
Thesis Type: Academic thesis
Publication Year: 2019
Graduation Academic Year: 107
Language: English
Pages: 57
Keywords (Chinese): 強化學習、主題機率模型、早期預測、時間序列、文字探勘
Keywords (English): Reinforcement Learning, Topic Model, Early Prediction, Time Series, Text Mining
Metrics:
  • Cited by: 0
  • Views: 136
  • Downloads: 0
  • Bookmarks: 0
Given the importance of early prediction, the analytical value of text-based time series, and the generality of reinforcement learning, this research investigates how to construct a reinforcement learning framework for the early prediction of text-based time series. The literature review discusses the development and network structure of long short-term memory (LSTM) within recurrent neural networks (RNNs), the evolution of topic models in text mining, the architectures and algorithms of reinforcement learning, and related research and applications in these fields. The research method combines topic modeling and reinforcement learning into an early prediction framework that uses latent Dirichlet allocation (LDA) to extract topics from the text-based time series and a Dueling DQN to perform the early prediction. In the experiments, the proposed model achieves prediction results equal to or better than those of existing methods while observing only part of each time series, and its hyper-parameters can be tuned to control the trade-off between the time used and the accuracy of the prediction. As a managerial implication, the framework can be adapted to different real-world scenarios by tuning these hyper-parameters. Future work may consider early prediction models for imbalanced data or anomaly detection in text-based time series, or transfer the framework to multimedia data such as video or audio signals.
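To make the described pipeline concrete, the sketch below illustrates one way such a framework fits together: LDA topic vectors of the texts observed so far serve as states, and a dueling Q-network chooses at each time step between waiting for more data and committing to a class label. This is a minimal sketch assuming gensim and PyTorch; the function names, network sizes, and reward scheme are illustrative assumptions, not the thesis's actual implementation (its pseudo code appears in Appendix A).

```python
# Illustrative sketch only: library choices (gensim, PyTorch), names, shapes,
# and reward values are assumptions, not the thesis's exact code.
import numpy as np
import torch
import torch.nn as nn
from gensim import corpora, models


def topic_features(docs_per_step, num_topics=10):
    """Map a text-based time series (one token list per step) to per-step
    LDA topic vectors; the state at step t summarizes all texts seen up to t."""
    dictionary = corpora.Dictionary(docs_per_step)
    corpus = [dictionary.doc2bow(d) for d in docs_per_step]
    lda = models.LdaModel(corpus, num_topics=num_topics, id2word=dictionary)
    feats = []
    for t in range(len(docs_per_step)):
        prefix = [w for d in docs_per_step[: t + 1] for w in d]
        vec = np.zeros(num_topics, dtype=np.float32)
        for k, p in lda.get_document_topics(dictionary.doc2bow(prefix),
                                            minimum_probability=0.0):
            vec[k] = p
        feats.append(vec)
    return feats


class DuelingQNet(nn.Module):
    """Dueling architecture (Wang et al., 2015): a shared trunk feeds separate
    state-value and advantage streams, recombined into Q-values."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, s):
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
        return v + a - a.mean(dim=-1, keepdim=True)


def run_episode(qnet, topics, label, wait_penalty=0.01, epsilon=0.1):
    """One episode: action 0 waits for the next time step, action k >= 1
    commits to class k - 1. wait_penalty is the hyper-parameter that trades
    earliness against accuracy."""
    n_actions = qnet.advantage.out_features
    transitions = []  # (state, action, reward, next_state, done) for replay
    for t, state in enumerate(topics):
        s = torch.as_tensor(state)
        if np.random.rand() < epsilon:                 # epsilon-greedy exploration
            action = int(np.random.randint(n_actions))
        else:
            with torch.no_grad():
                action = int(qnet(s).argmax())
        if action == 0:
            if t + 1 < len(topics):                    # wait: pay a small time cost
                transitions.append((state, 0, -wait_penalty, topics[t + 1], False))
                continue
            with torch.no_grad():                      # out of data: forced prediction
                action = 1 + int(qnet(s)[1:].argmax())
        reward = 1.0 if action - 1 == label else -1.0  # terminal classification reward
        transitions.append((state, action, reward, state, True))
        break
    return transitions
```

Transitions collected this way would feed a replay buffer for DQN-style updates (with double Q-learning and prioritized replay, as surveyed in Chapter 2). Raising wait_penalty pushes the agent toward earlier predictions, while lowering it favors accuracy, matching the time/accuracy trade-off the abstract describes.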
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Background and Research Motivation
1.2 Research Objectives
1.3 Research Process
Chapter 2. Literature Review
2.1 Long Short-Term Memory
2.2 Topic Model
2.2.1. Latent semantic analysis and PLSA
2.2.2. Latent Dirichlet allocation
2.3 Reinforcement Learning
2.3.1. The framework of reinforcement learning
2.3.2. Problem setup
2.3.3. Q-learning
2.3.4. Deep Q-learning network
2.3.5. Double DQN and prioritized experience replay
2.3.6. Dueling DQN
2.4 Time Series Early Prediction
Chapter 3. Research Method
3.1 Problem Definition
3.1.1. Notation
3.1.2. Early prediction of time series
3.2 Research Framework
3.2.1. Data preprocessing
3.2.2. Feature extraction
3.2.3. Reinforcement learning framework for text-based time series early prediction
Chapter 4. Experiment and Analysis
4.1 Experiment Architecture
4.2 Data Set Description
4.3 Hyper-parameter Setting
4.4 Performance Metrics
4.4.1. Accuracy
4.4.2. Precision, recall and F-score
4.4.3. Using time
4.5 Experiment Results and Analysis
4.5.1. Comparison with other early prediction methods
4.5.2. Results with modified hyper-parameters
Chapter 5. Conclusion and Future Work
5.1 Conclusion
5.2 Managerial Implications
5.3 Limitations and Future Work
References
Appendix A. Pseudo Code for Research Framework
Allahyari, M., Pouriyeh, S., Assefi, M., Safaei, S., Trippe, E. D., Gutierrez, J. B., & Kochut, K. (2017). A brief survey of text mining: Classification, clustering and extraction techniques. arXiv preprint arXiv:1707.02919.
Badea, I., & Trausan-Matu, S. (2013, October). Text analysis based on time series. In 2013 17th international conference on system theory, control and computing (ICSTCC) (pp. 37–41). doi: 10.1109/ICSTCC.2013.6688932
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Blei, D. M., & Lafferty, J. D. (2006). Dynamic topic models. In Proceedings of the 23rd international conference on machine learning (pp. 113–120).
Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan), 993–1022.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., … Keogh, E. (2018, October). The UCR time series archive. arXiv preprint arXiv:1810.07758. Retrieved from https://arxiv.org/abs/1810.07758
Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., & Harshman, R. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391–407.
Doya, K. (1993). Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1, 75–80.
Fawaz, H. I., Forestier, G., Weber, J., Idoumghar, L., & Muller, P.-A. (2018). Deep learning for time series classification: A review. arXiv preprint arXiv:1809.04356. Retrieved from https://arxiv.org/abs/1809.04356
Gao, X. (2018). Deep reinforcement learning for time series: playing idealized trading games. arXiv preprint arXiv:1803.03916.
Gers, F. A., Schmidhuber, J., & Cummins, F. (1999). Learning to forget: Continual prediction with LSTM. In 9th international conference on artificial neural networks: ICANN '99. IEE. doi: 10.1049/cp:19991218
Goertzel, B., & Pennachin, C. (2007). Artificial general intelligence (Vol. 2). Springer.
Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222–2232.
Grossberg, S., & Merrill, J. W. (1992). A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Cognitive Brain Research, 1(1), 3–38. Retrieved from http://www.sciencedirect.com/science/article/pii/092664109290003A doi: https://doi.org/10.1016/0926-6410(92)90003-A
Han, B., & Baldwin, T. (2011). Lexical normalisation of short text messages: Makn sens a #twitter. In Proceedings of the 49th annual meeting of the association for computational linguistics: Human language technologies-volume 1 (pp. 368–378).
Heinrich, G. (2009). Parameter estimation for text analysis (Technical report). University of Leipzig, Germany.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in neural information processing systems (pp. 856–864).
Hofmann, T. (1999). Probabilistic latent semantic analysis. In Proceedings of the fifteenth conference on uncertainty in artificial intelligence (pp. 289–296).
Hsu, C. K. (2018). A hidden topic model for prediction of crowdfunding campaigns (Master's thesis, National Cheng Kung University, Tainan, Republic of China (R.O.C.)). Retrieved from https://hdl.handle.net/11296/2nfg7u
Huang, B.-Q., Cao, G.-Y., & Guo, M. (2005, August). Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance. In 2005 international conference on machine learning and cybernetics (Vol. 1, pp. 85–89). doi: 10.1109/ICMLC.2005.1526924
Jaeger, H. (2003). Adaptive nonlinear system identification with echo state networks. In Advances in neural information processing systems (pp. 609–616).
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78–80.
Jiang, Z., & Liang, J. (2017). Cryptocurrency portfolio management with deep reinforcement learning. In Intelligent Systems Conference (IntelliSys), 2017 (pp. 905–913).
Jiang, Z., Xu, D., & Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
Jo, Y., Lee, L., & Palaskar, S. (2017). Combining LSTM and latent topic modeling for mortality prediction. arXiv preprint arXiv:1709.02842.
Jo, Y., Loghmanpour, N., & Rosé, C. P. (2015). Time series analysis of nursing notes for mortality prediction via a state transition topic model. In Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1171–1180).
Jo, Y., Maki, K., & Tomar, G. (2018). Time series analysis of clickstream logs from online courses. arXiv preprint arXiv:1809.04177.
Jozefowicz, R., Zaremba, W., & Sutskever, I. (2015). An empirical exploration of recurrent network architectures. In International conference on machine learning (pp. 2342–2350).
Kern, M. L., Park, G., Eichstaedt, J. C., Schwartz, H. A., Sap, M., Smith, L. K., & Ungar, L. H. (2016). Gaining insights from social media language: Methodologies and challenges. Psychological Methods, 21(4), 507.
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Kiss, T., & Strunk, J. (2006). Unsupervised multilingual sentence boundary detection. Computational Linguistics, 32(4), 485–525.
Lai, C. Y., Lo, P. C., & Hwang, S. Y. (2017). Incorporating comment text into success prediction of crowdfunding campaigns. In PACIS 2017 proceedings. Retrieved from https://aisel.aisnet.org/pacis2017/156
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274.
Lin, C.-T., & Jou, C.-P. (1999, July). Controlling chaos by GA-based reinforcement learning neural network. IEEE Transactions on Neural Networks, 10(4), 846–859. doi: 10.1109/72.774236
Lin, L. J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4), 293–321.
Maass, W., Natschläger, T., & Markram, H. (2002). Real-time computing without stable states: A new framework for neural computation based on perturbations. Neural Computation, 14(11), 2531–2560.
Marr, B. (2018, May). How much data do we create every day? The mind-blowing stats everyone should read. Retrieved from https://www.forbes.com/sites/bernardmarr/2018/05/21/how-much-data-do-we-create-every-day-the-mind-blowing-stats-everyone-should-read/#72e1011e60ba
Martinez, C., Perrin, G., Ramasso, E., & Rombaut, M. (2018). A deep reinforcement learning approach for early classification of time series. In EUSIPCO 2018. Retrieved from https://hal.archives-ouvertes.fr/hal-01825472/document
Metz, C. E. (1978, October). Basic principles of ROC analysis. Seminars in Nuclear Medicine, 8(4), 283–298. Retrieved from http://www.umich.edu/~ners580/ners-bioe_481/lectures/pdfs/1978-10-semNucMed_Metz-basicROC.pdf
Mikolov, T. (2012). Statistical language models based on neural networks. Presentation at Google, Mountain View, 2nd April.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., … Hassabis, D. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Norvig, P. (2007). How to write a spelling corrector. Retrieved from http://norvig.com/spell-correct.html
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. In International conference on machine learning (pp. 1310–1318).
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3), 130–137.
Schaul, T., Quan, J., Antonoglou, I., & Silver, D. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
Schultz, J. (2017, October). How much data is created on the internet each day? Retrieved from https://blog.microfocus.com/how-much-data-is-created-on-the-internet-each-day/
Silva, C., & Ribeiro, B. (2003). The importance of stop word removal on recall values in text categorization. In Proceedings of the international joint conference on neural networks, 2003 (Vol. 3, pp. 1661–1666).
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3(1), 9–44.
Sutton, R. S., & Barto, A. G. (2017). Reinforcement learning: An introduction (2nd Edition, in preparation). Cambridge, MA: MIT Press.
Tsitsiklis, J. N., & Van Roy, B. (1997). Analysis of temporal-difference learning with function approximation. In Advances in neural information processing systems (pp. 1075–1081).
Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. In AAAI (Vol. 2, p. 5).
Wang, Z., Schaul, T., Hessel, M., Van Hasselt, H., Lanctot, M., & De Freitas, N. (2015). Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581.
Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3-4), 279–292.
Webster, J. J., & Kit, C. (1992). Tokenization as the initial phase in NLP. In Proceedings of the 14th conference on computational linguistics-volume 4 (pp. 1106–1110).
Wu, Q., Zhang, C., Hong, Q., & Chen, L. (2014). Topic evolution based on LDA and HMM and its application in stem cell research. Journal of Information Science, 40(5), 611–620.
Xing, Z., Pei, J., & Yu, P. (2009, January). Early prediction on time series: A nearest neighbor approach. In IJCAI 2009, Proceedings of the 21st International Joint Conference on Artificial Intelligence, Pasadena, California, USA, July 11–17, 2009 (pp. 1297–1302).
Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent neural network regularization. arXiv preprint arXiv:1409.2329.