National Digital Library of Theses and Dissertations in Taiwan

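Graduate student: Lai, Jhe-Ting (賴哲霆)
Thesis title: An Application of Information Extraction in Collecting Financial Data for Business Valuation
Thesis title (original): 應用資訊擷取技術於企業評價財務項資料之取得
Advisors: Lin, Woo-Tsong (林我聰); Seng, Jia-Lang (諶家蘭)
Degree: Master's
Institution: National Chengchi University
Department: Graduate Institute of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Type of thesis: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 97
Keywords (Chinese): 資訊擷取; 企業評價; 財務項資料
Keywords (English): Information Extraction; Business Valuation; Financial Data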
The rapid growth of electronic resources on the Internet in recent years gave rise to search engines, which made document retrieval far more convenient and efficient. As both online resources and the number of users continue to grow, however, keyword-based retrieval alone no longer meets users' specific needs. Information extraction (IE) addresses this gap: it extracts important, specific information from retrieved documents or identifies specific relations among pieces of information. IE not only filters out unnecessary content but also produces the specific messages and summaries that interest users.
Business valuation is the process of collecting, analyzing, and applying financial and non-financial information to appraise the value of a business; the results can serve as a basis for business decisions and for pricing intangible assets in transactions. In Taiwan, companies' financial statements, notes to financial statements, and financial news all contain information needed for business valuation, and they are published as web pages (HTML) and PDF files. This study therefore takes these three sources as input and builds an information extraction system for Chinese financial data grounded in business valuation concepts. From these heterogeneous sources, the system extracts the correct financial data items and the valuation models they correspond to, automating the collection of valuation data. Users can obtain relevant, valid valuation information and learn about valuation models in a short time, improving the accuracy and efficiency of information processing.
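As a rough illustration of what extracting financial data items from Chinese news text can look like, the sketch below pulls monetary amounts and percentages out of a sentence with regular expressions. It is not the thesis's implementation: the function names, the unit table, and the sample sentence are all hypothetical, and a rule-based regex approach is simply assumed here for the sake of a small example.

```python
import re
from decimal import Decimal

# Hypothetical unit table (an assumption for this sketch):
# multipliers that convert common Chinese money units to dollars.
UNIT_MULTIPLIER = {
    "元": Decimal(1),
    "萬元": Decimal(10_000),
    "億元": Decimal(100_000_000),
}

# A number (optionally with a decimal part) followed by a money unit.
# Longer units are listed first so "億元" is not cut short to "元".
MONEY_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(億元|萬元|元)")

# A number followed by a half-width or full-width percent sign.
PERCENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*[%％]")


def extract_money(text):
    """Return a list of (amount_in_dollars, matched_text) pairs."""
    results = []
    for match in MONEY_PATTERN.finditer(text):
        amount = Decimal(match.group(1)) * UNIT_MULTIPLIER[match.group(2)]
        results.append((amount, match.group(0)))
    return results


def extract_percent(text):
    """Return a list of percentage values as Decimal numbers."""
    return [Decimal(m.group(1)) for m in PERCENT_PATTERN.finditer(text)]


if __name__ == "__main__":
    # Made-up sample sentence, not taken from the thesis's data.
    sample = "公司第三季營收為12.5億元,年增8.3%,每股盈餘2元"
    print(extract_money(sample))
    print(extract_percent(sample))
```

A real system would also need to link each value to an account name, an organization, and a time expression, which this sketch does not attempt.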
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objective
1.3 Research Scope
1.4 Research Issue
1.5 Research Flow
1.6 Organization of Thesis
Chapter 2 Literature Review
2.1 Chinese Information Extraction
2.2 Chinese Word Segmentation
2.2.1 Chinese Segmentation Methods
2.2.2 Unknown Word Extraction
2.2.3 Part-of-Speech Tagging Models
2.2.4 Segmentation Models Based on Named Entities
2.3 Information Extraction Methods
2.3.1 Keyword Extraction for Pure-Text Data
2.3.2 Structural Extraction for Tabular Data
2.4 Business Valuation Models
2.4.1 Income-Based Approach
2.4.2 Market-Based Approach
2.4.3 Asset-Based Approach
2.5 Summary
Chapter 3 Research Model
3.1 Information Extraction on Financial Statements
3.1.1 Financial Statements
3.1.2 Keyword Extraction on Financial Statements
3.2 Information Extraction on Notes to Financial Statements
3.2.1 Notes to Financial Statements
3.2.2 PDF Converting Processing
3.2.3 Keyword Extraction on Notes to Financial Statements
3.3 Information Extraction on Financial News
3.3.1 Financial News
3.3.2 Chinese Word Segmentation System Model
3.3.3 Chinese Keyword Analyzing Model on Financial Data
3.3.3.1 Account Name Analyzing
3.3.3.2 Organization Name Analyzing
3.3.3.3 Time Analyzing
3.3.3.4 Money and Percent Analyzing
3.3.4 Keyword Extraction on Financial News
3.4 Valuation Model Analyzing Based on Concept Hierarchy
3.5 Summary
Chapter 4 Prototype Development
4.1 Prototype Platform and Architecture
4.2 Prototype System Design
4.2.1 Web Crawler Design
4.2.2 Domain Lexicon Tool
4.2.3 PDF Converting Tool
4.2.4 Knowledgebase Design
4.2.5 Information Extraction System Function Design
Chapter 5 Research Experiment
5.1 Experiment Design
5.2 Experiment Evaluation
5.3 Experiment Results
5.3.1 Experiment I: Financial Statements
5.3.2 Experiment II: Notes to Financial Statements
5.3.3 Experiment III: Financial News
Chapter 6 Research Implication and Discussion
6.1 Managerial Findings and Implications
6.2 Technological Findings and Implications
Chapter 7 Conclusion and Future Work
7.1 Conclusion
7.2 Future Work
References
Appendix A
Appendix B