
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: Hsu, Chih-Ling (徐芷翎)
Thesis title: Heterogeneous Information Fusion Method for Significant Financial Trend Prediction (可預測顯著金融趨勢之異質性資訊融合方法)
Advisor: Tseng, Shin-Mu (曾新穆)
Oral defense committee: Sun, Hung-Min (孫宏民); Huang, Jiun-Long (黃俊龍); Peng, Wen-Chih (彭文志); Tseng, Shin-Mu (曾新穆)
Oral defense date: 2020-06-18
Degree: Master's
Institution: National Chiao Tung University (國立交通大學)
Department: Institute of Computer Science and Engineering (資訊科學與工程研究所)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2020
Academic year of graduation: 108
Language: Chinese
Number of pages: 60
Keywords: heterogeneous information fusion; opinion mining; machine learning; natural language processing
With the rapid growth of big data in volume, velocity, veracity, and variety, heterogeneous data fusion is becoming ubiquitous in solving practical problems. In particular, as social media becomes a key source of information in the investment process, forecasting trends of financial time series by exploiting not only time series data but also social media content has become a research hotspot. However, predicting the future trend of a financial time series has long been an extremely challenging task because of the chaotic nature of financial markets, the drastically varying discussion on social media platforms, and the imbalanced distribution of financial trend classes. To address these issues, we propose a heterogeneous data fusion framework that predicts significant financial trends by integrating multivariate time series with multi-dimensional public opinion. In experiments on real-world data, the proposed framework delivers a 49% improvement in Matthews correlation coefficient over existing methods and raises the sensitivity for the classes that indicate significant changes. To the best of our knowledge, this is the first work that predicts financial trends while considering significance thresholds and a multi-day trend window.
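The two evaluation measures named in the abstract, per-class sensitivity (recall) and the Matthews correlation coefficient, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the labels are made-up toy data, the three-class encoding (-1 for a significant fall, 0 for no significant change, 1 for a significant rise) is not taken from the thesis, and the multiclass formula is the standard generalization of the coefficient, which may differ from the exact variant the thesis reports.

```python
from collections import Counter
from math import sqrt


def per_class_sensitivity(y_true, y_pred):
    """Recall per class: correctly predicted samples / true samples of that class."""
    true_counts = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in true_counts.items()}


def matthews_corrcoef(y_true, y_pred):
    """Multiclass Matthews correlation coefficient computed from label counts."""
    s = len(y_true)                                   # total number of samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correctly classified samples
    t_k = Counter(y_true)                             # true occurrences per class
    p_k = Counter(y_pred)                             # predicted occurrences per class
    labels = set(t_k) | set(p_k)
    cov = c * s - sum(t_k[k] * p_k[k] for k in labels)
    denom = sqrt(
        (s * s - sum(v * v for v in p_k.values()))
        * (s * s - sum(v * v for v in t_k.values()))
    )
    # By convention the coefficient is 0 when the denominator vanishes,
    # e.g. when every sample is assigned to the same class.
    return cov / denom if denom else 0.0


# Hypothetical toy labels: -1 = significant fall, 0 = no significant change,
# 1 = significant rise. The "no change" class is deliberately the majority.
y_true = [1, 1, 0, 0, 0, 0, -1, -1]
y_pred = [1, 0, 0, 0, 0, 1, -1, 0]

print(per_class_sensitivity(y_true, y_pred))
print(matthews_corrcoef(y_true, y_pred))
```

Unlike plain accuracy, the coefficient drops to 0 for a classifier that always predicts the majority class, which is why it is a common choice when trend classes are imbalanced.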
1 Introduction
1.1 Background and Motivation
1.2 Problem Definition
1.3 Challenges
1.4 Research Aims
1.5 Contribution
1.6 Thesis Organization
2 Related Works
2.1 Impact of Social Opinion on Financial Markets
2.2 Time Series Prediction in Financial Markets
2.3 Detecting Significant Changes in Time Series
3 Proposed Method
3.1 Overview of Proposed Method
3.2 Numerical Feature Extraction
3.2.1 Technical Analysis
3.2.2 Multivariate Time Series Analysis
3.3 Sentiment-based Opinion Mining
3.3.1 Posts Filtration and Preprocessing
3.3.2 Personal Sentiment Extraction
3.3.3 Financial-oriented Sentiment Extraction
3.4 Opinion Aggregation
3.4.1 Topic Modeling
3.4.2 User Score Calculation
3.5 Opinion Fusion
3.6 Heterogeneous Time Series Fusion
3.7 Significant Trend Prediction
4 Experiment Evaluation
4.1 Datasets
4.1.1 Social Media Content Dataset
4.1.2 Financial Time Series & Related Time Series Dataset
4.2 Experiment Settings
4.2.1 Evaluation Metrics
4.2.2 Methods in Comparison
4.3 Experiment Results: External Evaluation
4.4 Experiment Results: Internal Evaluation
4.4.1 Different Length of Trend Windows
4.4.2 Different Scales of Significance Thresholds
4.4.3 Different Choice of Detection Thresholds
4.4.4 Heterogeneous Fusion for Noise Alleviation
4.4.5 Model Insights
4.4.6 Complementary Nature of Heterogeneous Information
5 Conclusion and Future Works
5.1 Conclusion
5.2 Future Works
References