National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

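Graduate student: 劉品汝
Graduate student (romanized): LIU, PINRU
Thesis title: 以 Spark 之文字探勘工具探討財經新聞標題與股市波動度之關聯
Thesis title (English): Exploring the Relationship between Financial News Headlines and Stock Market Volatility using the Spark's Text Mining Tool
Advisor: 張揖平
Advisor (romanized): CHANG, YI-PING
Oral defense committee: 李沃牆, 吳錦全
Oral defense committee (romanized): LEE, WO-CHIANG; WU, CHIN-CHUAN
Oral defense date: 2017-06-16
Degree: Master's
Institution: Soochow University (東吳大學)
Department: Department of Financial Engineering and Actuarial Mathematics
Discipline: Mathematics and Statistics
Sub-discipline: Other Mathematics and Statistics
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar, i.e. 2016-2017)
Language: Chinese
Pages: 50
Keywords (Chinese): 波動度, Spark, 文字探勘, 機器學習
Keywords (English): volatility, Spark, text mining, machine learning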
Abstract: This thesis uses text mining and machine learning to examine the relationship between the financial news headlines of one newspaper and the volatility of the Taiwan Capitalization Weighted Stock Index (TAIEX). Converting headlines into features through text mining yields a high-dimensional feature space. Because Spark processes data with in-memory computing, it runs big-data programs much faster than Hadoop MapReduce, so the thesis adopts Spark as its analysis tool. The data are the newspaper's daily financial news headlines from 2010 to 2016 and the daily TAIEX values provided by the Taiwan Economic Journal (TEJ). The index values are converted to realized volatility, and the realized volatility values are divided into two classes. Spark's text mining tools convert the headlines into features, and Spark's machine learning tools are then used to build a classification model that predicts the realized volatility class from the headlines; the model's predictive performance is then evaluated.
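As a rough illustration of how headlines become numeric features, the sketch below computes TF-IDF weights in plain Python. It is not the thesis's code. The tokenized headlines are made-up examples, and the function names are hypothetical. The two IDF variants follow the formulas documented for Spark MLlib's `IDF` and for scikit-learn's default `smooth_idf=True` setting, which is the comparison that Appendix 2 of the thesis refers to.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def idf_spark(n_docs, df):
    """IDF as documented for Spark MLlib: ln((N + 1) / (df + 1))."""
    return math.log((n_docs + 1) / (df + 1))


def idf_sklearn(n_docs, df):
    """IDF as documented for scikit-learn with smooth_idf=True: ln((1 + N) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df)) + 1.0


def tfidf(docs, idf_fn):
    """Return one {term: raw count * idf} dictionary per document."""
    n_docs = len(docs)
    df = document_frequencies(docs)
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vectors.append({t: c * idf_fn(n_docs, df[t]) for t, c in tf.items()})
    return vectors


# Hypothetical, already-segmented headlines (assumption: not data from the thesis).
headlines = [
    ["bank", "rate", "hike", "stocks"],
    ["foreign", "buying", "stocks", "rally"],
    ["bank", "holds", "rate", "stocks"],
]

spark_style = tfidf(headlines, idf_spark)
sklearn_style = tfidf(headlines, idf_sklearn)
```

A term that appears in every headline, such as `stocks` here, gets weight 0 under the Spark formula but keeps weight 1 per occurrence under the scikit-learn formula. The sketch leaves out two things the real libraries do: Spark's `HashingTF` maps terms to hashed indices, and scikit-learn L2-normalizes each vector by default.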
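Table of contents
List of figures
List of tables
1. Research motivation
2. Research methods
2.1 Jieba Chinese word segmentation
2.2 Introduction to Spark
2.3 Spark data processing and machine learning tools
2.4 Spark text mining tools: TF-IDF and Word2Vec
2.5 Spark machine learning tool: decision trees
2.6 Cross-validation, accuracy, and AUC
3. Research results
4. Conclusion
References
Appendix 1. Spark ML Pipeline workflow
Appendix 2. Comparison of the TF-IDF formulas in Spark and scikit-learn
Appendix 3. Comparison of TF-IDF low-frequency term parameters
Appendix 4. Worked examples of TF-IDF and Word2Vec
Appendix 5. Worked example of decision-tree attribute selection measures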
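
Section 2.6 names accuracy and AUC as the measures used to evaluate the classifier. The sketch below shows, under stated assumptions, how the two metrics can be computed for a two-class problem. The labels and scores are made up, label 1 is assumed to mean the higher-volatility class, and the function names are hypothetical rather than taken from the thesis or from Spark.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted class matches the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive example scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and classifier scores (assumption: 1 = higher-volatility class).
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]

# Threshold the scores at 0.5 to obtain hard class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Accuracy depends on the chosen threshold (0.5 here), whereas AUC summarizes how well the scores rank the two classes across all thresholds. The pairwise loop is quadratic in the number of examples, so it suits small illustrations rather than large datasets.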


