National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

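Graduate student: 歐仁彬 (Jen-Bing Ou)
Thesis title: Applications of fuzzy support vector machines for sentiment analysis: stock prediction and live streaming analysis
Thesis title (original Chinese): 運用模糊支持向量機於情緒分析:以股價預測與直播分析為例
Advisors: 郝沛毅 (Pei-Yi Hao), 蘇明鴻 (Ming-Hung Shu)
Oral defense committee: 曾淑美 (Shu-Mei Tseng), 楊東震 (Dong-Jenn Yang), 黃天受 (Tien-Shou Huang), 郝沛毅 (Pei-Yi Hao), 蘇明鴻 (Ming-Hung Shu)
Oral defense date: 2019-01-09
Degree: Doctoral
Institution: National Kaohsiung University of Science and Technology (國立高雄科技大學)
Department: Department of Electronic Engineering (電子工程系)
Discipline: Engineering
Field: Electrical engineering and computer science
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar, 2018-2019)
Language: English
Number of pages: 106
Keywords: fuzzy set theory, support vector machine, stock price forecast, live streaming, sentiment analysis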
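Abstract (translated from the Chinese original)
The support vector machine (SVM) is an excellent machine learning technique grounded in Vapnik's statistical learning theory. When the model being built is full of uncertainty and complexity, fuzzy modeling must be considered; expressed in terms of the fuzzy systems proposed by Zadeh, fuzzy decision-making is conceptually closer to the way humans make decisions. In essence, fuzzy theory provides an effective way to capture real-world characteristics such as "approximately" and "imprecise". This thesis combines fuzzy theory with the support vector machine, aiming to retain both the strength of the SVM (good generalization ability guaranteed by statistical learning theory) and the strengths of fuzzy theory (being conceptually closer to human thinking and better able to represent complex and uncertain systems). Two applications are proposed.
Application 1: Successfully predicting the direction of stock prices has obvious benefits. According to the efficient-market hypothesis, the value of a company's stock is determined by all currently available information. In the big-data era the number of online news articles keeps growing, and in the face of such a volume of text more and more institutions rely on the high-speed processing of modern computers for text mining and machine learning in order to build more accurate stock trend prediction models. In this thesis, latent topic models and sentiment information are extracted from news articles, and a fuzzy support vector machine is developed to fuse the rich information contained in online news articles and predict whether stock prices will rise or fall. The highest prediction accuracy was 87% for food stocks, 71% for semiconductor stocks, and 69% for computer-peripheral stocks. By comparison, a traditional SVM that predicts the stock trend from keywords reaches an accuracy of only slightly over 50% (close to random guessing), so the proposed method clearly outperforms the traditional SVM prediction model.
Application 2: Live streaming video platforms are an emerging kind of social media that has become impossible to ignore in recent years. Whereas ordinary video platforms share content that is recorded in advance and then uploaded, live streaming adds two characteristics, immediacy and interactivity. During a stream the streamer can respond at once to viewers' reactions in the chat room and so create more differentiated content, but when viewers are posting heavily the streamer may miss some of their messages. The aim of this study is therefore to mine the chat room content with sentiment mining techniques and present what viewers want to say in a simpler form, so that the streamer can gauge audience reaction with less effort and use it as a reference for adjusting content. The proposed sentiment mining system has three main steps. First, a chat room crawler opens a socket connection to the Twitch IRC server and captures chat content automatically in real time, and the captured messages go through Internet-slang normalization. Next, a fuzzy support vector machine classifies sentiment as positive or negative; fuzzy theory allows the degree of membership in the positive (negative) class to be computed more precisely. Finally, the results are presented as a word cloud, sentiment fluctuation chart, sentiment radar chart, sentiment histogram, and sentiment box plot, so that decision makers can read them quickly and make the best decision.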
Abstract (author's English version)
Fuzziness must be considered in decision-making systems because human estimation is involved. This thesis incorporates fuzzy set theory into the decision model of the support vector machine (SVM) to tackle two real-life applications, combining the advantages of the SVM (good generalization ability) with those of fuzzy set theory (being closer to human thinking).
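For orientation, the standard soft-margin SVM of Cortes and Vapnik, which Chapter 2 of the thesis takes as its starting point, can be written as below. The symbols $w$, $b$, $\xi_i$, $C$, and $\Phi$ are conventional notation assumed here, not notation taken from this record, and the fuzzy-hyperplane variant used in the thesis is not reproduced.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i \\
\text{subject to}\quad & y_i\bigl(w^{\top}\Phi(x_i) + b\bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0,\quad i = 1,\dots,N, \\
\text{decision rule}\quad & f(x) = \operatorname{sign}\bigl(w^{\top}\Phi(x) + b\bigr).
\end{aligned}
```

Here $y_i \in \{-1, +1\}$ is the class label of training point $x_i$, and $C$ trades margin width against training errors.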
Application I: There are obvious incentives to forecast stock price trends. According to the efficient-market hypothesis, stock prices reflect all currently available information. In the big-data era, with the explosive growth of online news and other text, more and more institutions rely on powerful modern computers for text mining and machine learning in order to build more precise price trend forecasting models. In this study, we first extract latent topics and emotional information from news articles. A fuzzy SVM is then introduced to merge the abundant information in the online news and forecast the trend of stock prices. The highest forecast accuracy was 87% for food-related stocks, 71% for semiconductor-related stocks, and 69% for computer-accessory stocks. Since a traditional SVM reaches forecast accuracies barely above 50%, the proposed method performs significantly better than the traditional SVM forecasting model.
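As a rough illustration of how emotional and topic information from one article might be merged into a single input vector for a classifier, here is a minimal sketch. The lexicon, the category names, and the helper functions are hypothetical stand-ins, and the topic probabilities are taken as given rather than computed; the thesis's actual dictionary, topic model settings, and fuzzy SVM are not reproduced.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> emotion category); a real system would
# load a full emotion dictionary instead.
TOY_LEXICON = {
    "profit": "positive",
    "growth": "positive",
    "record": "positive",
    "loss": "negative",
    "recall": "negative",
    "lawsuit": "negative",
}
CATEGORIES = ("positive", "negative")


def emotion_distribution(tokens):
    """Return the share of tokens that fall in each emotion category."""
    counts = Counter(TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON)
    total = len(tokens) or 1  # avoid division by zero for empty articles
    return [counts[c] / total for c in CATEGORIES]


def article_features(tokens, topic_probs):
    """Concatenate emotion shares with a precomputed topic-probability vector."""
    return emotion_distribution(tokens) + list(topic_probs)


tokens = "company reports record profit despite lawsuit".split()
# Assumed input: per-article topic probabilities from a topic model.
features = article_features(tokens, topic_probs=[0.7, 0.2, 0.1])
print(features)
```

Each article then becomes one row of features, paired with an up or down label for the stock on the chosen horizon.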
Application II: Live streaming video is an emerging social media platform that cannot be ignored in recent years. Compared with traditional video platforms, where content is recorded in advance and then uploaded for sharing, live streaming has two additional characteristics: immediacy and interactivity. During a live stream the streamer can respond immediately to the audience's reactions in the chat room, creating more differentiated content. However, when the audience is very active the streamer may miss some of its messages. The purpose of this study is to mine the chat room content with emotion mining techniques and present what the audience wants to express in a relatively simple form, so that the streamer can understand audience reaction with less effort and use it as a reference for content adjustment. The proposed emotion mining system has three main steps. First, an Internet chat room crawler establishes a socket connection to the Twitch IRC server to retrieve chat room content automatically in real time, and Internet-term normalization is applied to the retrieved content. Second, a fuzzy support vector machine classifies the emotions as positive or negative; with fuzzy set theory, the membership degree of positive (negative) emotion can be computed more accurately. Finally, the results are presented as a word cloud, emotional fluctuation chart, emotional radar chart, emotional histogram, and emotional box plot.
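The first step above, reading chat lines and normalizing them, might look roughly like the following sketch. It assumes the plain (untagged) IRC line shape `:nick!nick@host PRIVMSG #channel :text`; the socket handshake is omitted, and collapsing repeated characters is only one illustrative normalization rule, not necessarily the thesis's procedure. All function names are hypothetical.

```python
import re

# Assumed shape of a relayed chat line:
#   :nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :message text
PRIVMSG_RE = re.compile(
    r"^:(?P<user>[^!\s]+)!\S+ PRIVMSG #(?P<channel>\S+) :(?P<text>.*)$"
)


def parse_chat_line(line):
    """Return user, channel, and text for a chat message, or None otherwise."""
    match = PRIVMSG_RE.match(line.rstrip("\r\n"))
    return match.groupdict() if match else None


def collapse_repeats(text, keep=2):
    """Shorten any run of one character longer than `keep` down to `keep`."""
    pattern = r"(.)\1{%d,}" % keep
    return re.sub(pattern, lambda m: m.group(1) * keep, text)


raw = ":alice!alice@alice.tmi.twitch.tv PRIVMSG #somechannel :niceeeee play!!!\r\n"
message = parse_chat_line(raw)
normalized = collapse_repeats(message["text"])
print(message["user"], normalized)
```

Lines that are not chat messages, such as server keep-alive lines, return `None` and can be skipped before classification.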
Table of Contents

Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.1.1 Forecasting Stock Price Trends Through News Articles
1.1.2 Emotion Mining from Internet Live Streaming Chat Room
1.2 Organization of Dissertation
Chapter 2 Support Vector Machine Frameworks
2.1 Support Vector Machine for Classification
2.1.1 An Optimal Margin Classifier
2.1.2 Training Support Vector Machine
2.1.3 Nonlinear Decision Surfaces
Chapter 3 Forecasting Stock Price Trends Through News Articles: Integrating Sentiment Analysis, Topic Model, and Support Vector Machine
3.1 Correlation Studies
3.1.1 Efficient-Market Hypothesis and Random Walk Hypothesis
3.1.2 Fundamental Analysis
3.1.3 Forecasting the Stock Price Trend from Financial News
3.1.4 Sentiment Analysis and Stock Price Trend Forecasting
3.2 Research Methodology
3.2.1 Data Collection
3.2.2 Feature Selection
3.2.2.1 The Characteristics of News Articles Emotional Dimension Intensity Distribution
3.2.2.2 Distribution Features of Topic Probability for News Articles
3.2.2.3 Particle Swarm Optimization Algorithm for Feature Selection
3.2.3 Using Fuzzy Support Vector Machine with Fuzzy Hyperplane as Core Classifier
3.3 Experiment Results
3.4 Conclusion
Chapter 4 Emotion Mining from Internet Live Streaming Chat Room: Using Fuzzy Support Vector Machine
4.1 Literature Discussion
4.1.1 Sentiment Analysis and Its Application
4.1.2 Internet Terms
4.1.3 Internet Terms Normalization
4.2 Research Methodology
4.2.1 The Normalization of Internet Terms
4.2.1.1 Special Terms Detecting
4.2.1.2 Deduplication
4.2.1.3 Emoticon Normalization
4.2.1.4 Special Terms Normalization
4.2.1.5 Normalization of Phonetic Symbol
4.2.2 Emotional Classification
4.2.2.1 Obtaining Emotional Character
4.2.2.2 Getting Semantic Characteristics (Embedding Cluster Characteristics)
4.2.2.3 Fuzzy Support Vector Machine
4.3 Experimental Results
4.4 Conclusion
Chapter 5 Conclusions and Future Directions
References

List of Tables

Table 2.1 Some commonly used kernel functions and their corresponding classifiers
Table 3.1 Categories and examples from the CLIWC dictionary
Table 3.2 Distribution of emotional words in news articles
Table 3.3 Finding the hidden topics by LDA
Table 3.4 Topic distribution of a news article
Table 3.5 Forecast accuracies of a traditional support vector machine using keywords, all 30 topics, and the topic features selected by the particle swarm algorithm
Table 3.6 Comparison of forecast accuracy rates of fuzzy SVM and traditional SVM with different kinds of features on different stocks and at two time points
Table 4.1 Categories of web terms
Table 4.2 Emoticon comparison table (example)
Table 4.3 Special terms comparison table (example)
Table 4.4 Example of phonetic symbols
Table 4.5 Example of string characteristic values selecting
Table 4.6 Example of string characteristic values
Table 4.7 Example of string characteristic values (cont.)
Table 4.8 Emotional vocabulary comparison table
Table 4.9 Rules of emotion transform
Table 4.10 Example of emotion character selecting
Table 4.11 Example of embedding cluster
Table 4.12 Phonetic symbol classification categories (partial)
Table 4.13 Phonetic symbol SVM classifier: experimental results (real-time data)
Table 4.14 Emotional SVM classifier: experimental results (real-time data)

List of Figures

Figure 1.1 How news generally affects market prices: (1) events happen; (2) the news is reported; (3) investors read news articles; (4) investors interpret the information according to their own knowledge and take action; (5) actions are converted into orders and reflected in stock price changes.
Figure 2.1 (a) A separating hyperplane with a small margin. (b) A separating hyperplane with a larger margin. Better generalization capability is expected from (b).
Figure 2.2 A generalized optimal margin hyperplane. The two sets of circles and triangles are not linearly separable. The solid line is the optimal margin hyperplane; the filled circles and triangles are the support vectors (margin vectors are shown in black, errors in gray).
Figure 2.3 The SVM maps the input space into a high-dimensional feature space and then constructs an optimal hyperplane in the feature space.
Figure 2.4 Mapping the training data nonlinearly into a higher-dimensional feature space and constructing a separating hyperplane with maximum margin.
Figure 3.1 System flow chart
Figure 3.2 Particle swarm optimization algorithm flow chart
Figure 4.1 Message from Twitch
Figure 4.2 Positive and negative appearing simultaneously
Figure 4.3 The system process
Figure 4.4 Internet terms normalization: program framework
Figure 4.5 The emotional statistic charts for all the messages, including (a) word cloud, (b) emotional fluctuation chart, and (c) emotional radar chart
Figure 4.6 The emotional statistic charts for all the users, including (a) histogram and (b) box-and-whisker plot
References
[1]Abe, S. and Lan, M.S., 1995. A method for fuzzy rules extraction directly from numerical data and its application to pattern classification. IEEE Transactions on Fuzzy Systems, 3(1), 353-361.
[2]Amari, S., Murata, N., Müller, K.R., Finke, M., and Yang, H., 1996. Statistical theory of overtraining: Is cross-validation asymptotically effective? Advances in Neural Information Processing Systems, 8, 176-182, Cambridge, MA: MIT Press.
[3]Bahrepour, M., Akbarzadeh, T.M.R., Yaghoobi, M., and Naghibi, S.M.B., 2011. An adaptive ordered fuzzy time series with application to FOREX. Expert Systems with Applications, 38, 475–485.
[4]Blei, D.M., Ng, A.Y., Jordan, M.I., 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993-1022.
[5]Bollen, J., Mao, H., and Zeng, X., 2012. Twitter mood predicts the stock market. Journal of Computational Science, 2, 1-8.
[6]Lin, C., 2015. Emotional analysis system using Google App reviews for word weight adjustment, Department of Information Management, Providence University, Master Thesis
[7]Cortes, C. and Vapnik, V., 1995. Support-vector networks. Machine Learning, 20, 273-297.
[8]Hongda, Q., 2010. Exploration of opinions in the application of Chinese film reviews, Institute of Information Science and Engineering, National Chiao Tung University, Master Thesis
[9]Ning, G., and Lai, K., 2010. Analysis of Emotional Opinions Based on Travel Experience and Corresponding Situations of Internet Community. Paper Published at the 2nd Conference on Natural Language and Speech Processing, Nantou County
[10]Huang, J., 2012. The establishment of a Chinese-language version of the Chinese Dictionary of Words and Computations. Chinese Journal of Psychology, 54, 185-201.
[11]Yang, H., Huang, Y., and Lin, Q., 2015. Internet-based terminology translation system based on decision tree and binary language model. Journal of Electronic Commerce, 17, 25-48.
[12]Liao, W., 2015. Sentimental analysis and trend development of popular Chinese pop music lyrics. Practice Department of Information Technology and Management, Master's thesis.
[13]Feldman, R., 2013. Techniques and applications for sentiment analysis. Communications of the ACM, 56, 82-89.
[14]Liu, B., 2012. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, 5, 1-167.
[15]Fama, E.F., 1991. Efficient capital markets: II. The Journal of Finance, 46(5), 1575–1617.
[16]Fung, G., Yu, J., and Lam, W., 2002. News sensitive stock trend prediction. Advances in Knowledge Discovery and Data Mining, 481-493.
[17]Hagenau, M., Liebmann, M., and Neumann, D., 2013. Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems, 55, 685-697.
[18]Hao, P.Y., 2016. Support vector classification with fuzzy hyperplane. Journal of Intelligent & Fuzzy Systems, 30(3), 1431-1443.
[19]Kaiser, K., and Miksch, S., 2005. Information extraction - a survey. Tech. Rep. Asgaard-TR-2005-6, Vienna University of Technology, Institute of Software Technology and Interactive Systems, Vienna.
[20]Kang, D., and Park, Y., 2013. Review-based measurement of customer satisfaction in mobile service: sentiment analysis and VIKOR approach. Expert Systems with Applications, 35-45.
[21]Ku, L.W. and Chen, H.H., 2007. Mining opinions from the Web: Beyond relevance retrieval. Journal of the American Society for Information Science and Technology, 58, 1838-1850.
[22]Kennedy, J., and Eberhart, R., 1995. Particle swarm optimization. Proceedings of the IEEE International Conference on Neural Networks, Australia, November, 1942-1948.
[23]Kennedy, J., and Eberhart, R.C., 1997. A discrete binary version of the particle swarm algorithm. 1997 IEEE International Conference on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, Orlando, USA, October 12, 4104-4108.
[24]Li, Y.M., and Li, T.Y., 2013. Deriving market intelligence from microblogs. Decision Support Systems, 55, 206-217.
[25]Liu, J.S. and Cheng, Y.W., 2006. 以字串特徵做為文本資料之錯誤偵測 (Textual Data Error Detection based on String Characteristics) [In Chinese], in ROCLING.
[26]Liu, B., and Zhang, L., 2012. A survey of opinion mining and sentiment analysis. In Mining text data (pp. 415-463). Springer.
[27]LeBaron, B., Arthur, W.B., and Palmer, R., 1999. Time series properties of an artificial stock market. Journal of Economic Dynamics and Control, 23(9-10), 1487-1516.
[28]Leigh, W., Purvis, R., and Ragusa, J.M., 2002. Forecasting the NYSE composite index with technical analysis, pattern recognizer, neural network, and genetic algorithm: a case study in romantic decision support. Decision Support Systems, 32, 361-377.
[29]Li, F., 2010. Textual analysis of corporate disclosures: a survey of the literature. Journal of Accounting Literature, 29, 143-165.
[30]Li, Y.M., and Li, T.Y., 2013. Deriving market intelligence from microblogs. Decision Support Systems, 55, 206-217.
[31]Li, X., Xie, H., and Chen, L., 2014. News impact on stock price return via sentiment analysis, Knowledge-Based Systems, 69, 14-23.
[32]Lu, C.J., Lee, T.S., and Chiu, C.C., 2009. Financial time series forecasting using independent component analysis and support vector regression. Decision Support Systems, 47, 115-125.
[33]Mabu, S., Hirasawa, K., Obayashi, M., and Kuremoto, T., 2013. Enhanced decision making mechanism of rule-based genetic network programming for creating stock trading signals. Expert Systems with Applications, 40, 6311-6320.
[34]Nassirtoussi, A.K., Aghabozorgi, S., Wah, T.Y., and Ngo, D.C.L., 2014. Text mining for market prediction: a systematic review. Expert Systems with Applications, 41(16), 7653-7670.
[35]Nizer, P.S.M., and Nievola, J.C., 2012. Forecasting published news effect in the Brazilian stock market. Expert Systems with Applications, 39, 10674-10680.
[36]Nikfarjam, A., Sarker, A., O'Connor, K., Ginn, R., and Gonzalez, G., 2014. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. Journal of the American Medical Informatics Association, 22(2), 35-45.
[37]Premanode, B., and Toumazou, C., 2013. Improving prediction of exchange rates using differential EMD. Expert Systems with Applications, 40, 377-384.
[38]Ravi, K., and Ravi, V., 2015. A survey on opinion mining and sentiment analysis: Tasks, approaches and applications. Knowledge-Based Systems, 89, 14-46.
[39]Rui, H., Liu, Y., and Whinston, A., 2013. Whose and what chatter matters? The effect of tweets on movie sales. Decision Support Systems, 55, 863-870.
[40]Ryan, P., and Taffler, R.J., 2004. Are economically significant stock returns and trading volumes driven by firm specific news releases? Journal of Business Finance & Accounting, 31, 49-82.
[41]Schumaker, R.P., and Chen, H., 2016. Textual analysis of stock market prediction using breaking financial news: the AZFin text system. ACM Transactions on Information Systems, 27(2), 238-245.
[42]Schumaker, R.P., Zhang, Y., Huang, C.N., Chen H., 2012. Evaluating sentiment in financial news articles. Decision Support Systems, 53, 458-464.
[43]Sermpinis, G., Laws, J., Karathanasopoulos, A., and Dunis, C.L., 2012. Forecasting and trading the EUR/USD exchange rate with gene expression and psi sigma neural networks. Expert Systems with Applications, 39, 8865-8877.
[44]Si, J., Mukherjee, A., Liu, B., Li, Q., Li, H., and Deng, X., 2013. Exploiting topic based twitter sentiment for stock prediction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Volume 2: Short Papers (pp. 24-29). Association for Computational Linguistics.
[45]Shi, Y., and Eberhart, R.C., 1998. A modified particle swarm optimizer. Proceedings of the IEEE Congress on Evolutionary Computation, IEEE Service Center, Anchorage, USA, May, 69-73.
[46]Tetlock, P.C., Saar-Tsechansky, M., and Macskassy, S., 2008. More than words: Quantifying language to measure firms’ fundamentals. Journal of Finance, 63(3), 1437-1467.
[47]Tetlock, P.C., 2011. All the news that's fit to reprint: do investors react to stale information? The Review of Financial Studies, 24(5), 1481-1512.
[48]Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
[49]Vu, T.T., Chang, S., Ha, Q.T., and Collier, N., 2012. An experiment in integrating sentiment features for tech stock prediction in Twitter. In 24th International Conference on Computational Linguistics, 23-38.
[50]Walczak, S., 2001. An empirical analysis of data requirements for financial forecasting with neural networks. Journal of Management Information Systems, 17(4), 203–222.
[51]Wu, D., Fung, G.P.C., Yu, J.X., and Pan, Q., 2009. Stock prediction: an event-driven approach based on bursty keywords. Frontiers of Computer Science in China, 3, 145-157.
[52]Yu, L.C., Wu, J.L., Chang, P.C., and Chu, H.S., 2013. Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news. Knowledge-Based Systems, 41, 89–97.
[53]Yu, H., Nartea, G.V., Gan, C., and Yao, L.J., 2013. Predictive ability and profitability of simple technical trading rules: Recent evidence from Southeast Asian stock markets. International Review of Economics and Finance, 25, 356-371.
[54]Zadeh, L.A., 1965. Fuzzy sets. Information and Control, 8, 338-353.