Author (English name): Lai, Cheng-Yi
Thesis title (English): On the Construction and Analysis of Time-Series-Oriented Lexicons
Keywords (English): Financial lexicon, Time series, Stock movement prediction
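This thesis proposes a framework for building a time-series-oriented financial lexicon. The lexicon can cover different types of sources and is explicitly linked to the target of the prediction problem. The input to the framework consists of textual information, such as financial news, and a financial time series, such as a company's stock prices. We then use the Pearson product-moment correlation coefficient to measure how strongly the frequency time series of each word correlates with the company's stock price time series. Although the Pearson correlation coefficient is a reasonable measure of the correlation between two time series, its effectiveness is limited when one of the series is stretched or shifted. To overcome this limitation, we adopt dynamic time warping (DTW). Finally, we collect all words that are highly correlated with the stock price time series to build the time-series-oriented financial lexicon. In addition, we use the resulting lexicon to learn a simple model for predicting stock price movement. The experimental results show that these highly correlated words generally have good predictive power, which demonstrates the feasibility of the proposed idea of capturing a company's keywords through its historical stock prices.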
This thesis proposes a novel framework for building a time-series-oriented lexicon that can cover different types of sources and has explicit links to the targets of prediction problems. The input to the framework consists of a text stream, such as financial news, and a financial time series, such as the stock prices of a company. We first calculate the Pearson correlation between the frequency series of each word and the stock price series of the company. Although Pearson correlation gives a good indication of how strongly two time series are correlated, it is limited in capturing their similarity when one of the series is stretched or shifted. To overcome this limitation, we adopt dynamic time warping (DTW). Finally, the words with high correlations are extracted to build the time-series-oriented lexicon. Additionally, we use the learned lexicon to construct a model for stock price movement prediction. The experimental results demonstrate that the learned words generally have good predictive ability, which attests to the practicability of the proposed idea of capturing a company's keywords via its historical stock prices.
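The two measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the absolute-difference local cost in the DTW recurrence, the 0.5 selection threshold, and the toy series are all assumptions made for illustration, and the abstract does not state how the DTW result is combined with the correlation score.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (norm_x * norm_y)


def dtw_distance(x, y):
    """Textbook DTW distance with |a - b| as local cost and no window."""
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] = cost of the best warping path aligning x[:i] with y[:j]
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def build_lexicon(word_freqs, prices, threshold=0.5):
    """Keep words whose |Pearson r| with the price series reaches threshold.

    word_freqs maps each word to its frequency series; the threshold value
    is a hypothetical choice, not one taken from the thesis.
    """
    lexicon = {}
    for word, freqs in word_freqs.items():
        r = pearson(freqs, prices)
        if abs(r) >= threshold:
            lexicon[word] = r
    return lexicon


# Toy data (made up): the second series is the first shifted by one step.
series = [0, 0, 1, 2, 1, 0, 0]
shifted = [0, 1, 2, 1, 0, 0, 0]
print(pearson(series, shifted), dtw_distance(series, shifted))
```

On this toy pair the Pearson coefficient is 12/26 (about 0.46), while the DTW distance is 0 because the warping path absorbs the one-step shift. This is the kind of stretched or shifted case the abstract describes as a limitation of Pearson correlation.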
1 Introduction

2 Related Work
2.1 Soft and Hard Information
2.2 Text Mining in Finance
2.3 Incorporating Hard Information into Text Mining

3 Methodology
3.1 The Proposed Framework
3.2 Pearson Product-Moment Correlation Coefficient
3.3 Dynamic Time Warping (DTW)

4 Experiments on Time-Series-Oriented Lexicon Construction
4.1 Dataset
4.2 Data Preprocessing
4.2.1 Text Indexing
4.2.2 Dealing with Missing Data
4.3 Experimental Results: The Resulting Lexicons

5 Stock Price Movement Prediction via the Learned Lexicons
5.1 Dataset
5.2 Experimental Setting
5.2.1 Feature Representation
5.2.2 Parameter Setting
5.3 Experimental Results

6 Conclusion and Future Work
