National Digital Library of Theses and Dissertations in Taiwan

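Author: 洪大為
Author (English): Ta-Wei Hung
Thesis title (Chinese): 以事件框架粹取為基礎的中文財經新聞標題相似度計算
Thesis title (English): Semantic Similarity Measurement of Chinese Financial News Titles Based on Event Frame Extracting
Advisors: 李漢銘, 何建明
Advisors (English): Hahn-Ming Lee, Jan-Ming Ho
Degree: Master's
Institution: 國立臺灣科技大學 (National Taiwan University of Science and Technology)
Department: Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2004
Academic year of graduation: 92 (ROC calendar)
Language: English
Number of pages: 63
Keywords (Chinese): 語義相似度 (semantic similarity); 框架 (frame); 資訊粹取 (information extraction)
Keywords (English): semantic similarity; frame; information retrieval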
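Abstract (translated from the Chinese): Semantic similarity measurement is an important research area in information extraction and information integration. Because the documents of interest differ in nature, the semantic similarity measure should change accordingly. In the financial domain, a news title presents most of the information in the article body, and incorrect information rarely appears in titles. Computing the semantic similarity between titles can therefore replace computing similarity over full articles, which reduces both computation time and the number of keywords. When a document consists of only a few keywords, as a financial news title does, the difficulty in measuring semantic similarity is that different documents often yield similar vectors from their keyword sets. For this reason, this thesis proposes a frame-like structure, named the Event Frame, that records Chinese financial news titles together with their classification information, and uses Event Frames to compute the relation between two Chinese financial news titles.
The semantic similarity measurement for Chinese financial news titles is based on building an Event Frame as the template of each title. A semantic similarity function then combines the relation between the Event Frames of two titles with the relation between their keywords. It takes into account the relation between the basic meanings of the two titles and needs fewer keyword comparisons than comparing every keyword pair. The results show that Event Frame extraction reaches high accuracy, close to that of manual extraction, and that the proposed measure emphasizes the relation between the basic meanings of titles over the relation between keywords. The proposed measure also extracts information from the keywords, because people sometimes consider two titles similar as long as their keyword sets have a large intersection. The Event Frame based similarity measure can therefore be used to find financial news titles that describe the same event.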
Abstract (English): Semantic similarity measurement is an important research area in information retrieval and information integration. Because the documents being compared have different properties, the semantic similarity measure should be adapted to them. In the financial domain, news titles convey most of the information in the full articles, and statements that are not facts rarely appear in titles. The semantic similarity of financial news titles can therefore stand in for the semantic similarity of whole articles, which reduces computation time and the number of keywords. When documents consist of only a few keywords, as financial news titles do, the difficulty in measuring their semantic similarity is that different documents often produce similar vectors from their keyword sets. For this reason, this thesis proposes a frame-like structure, named “Event Frame”, to archive Chinese financial news titles together with their classification information, and computes the relation between two Chinese financial news titles based on the Event Frame structure.
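The keyword-vector difficulty described above can be illustrated with a minimal sketch. The two titles are hypothetical English stand-ins for segmented Chinese titles, and a plain bag-of-keywords cosine is assumed as the baseline; this record does not state which vector model the thesis compares against.

```python
from collections import Counter
from math import sqrt


def cosine(keywords_a, keywords_b):
    """Cosine similarity between two bag-of-keywords vectors."""
    ca, cb = Counter(keywords_a), Counter(keywords_b)
    dot = sum(ca[term] * cb[term] for term in ca)
    norm_sq = sum(v * v for v in ca.values()) * sum(v * v for v in cb.values())
    return dot / sqrt(norm_sq) if norm_sq else 0.0


# Hypothetical titles: the buyer and the target are swapped, so the
# events differ, yet the keyword sets are identical.
title_1 = ["BankA", "acquires", "BankB"]
title_2 = ["BankB", "acquires", "BankA"]

similarity = cosine(title_1, title_2)
print(similarity)
```

Because both titles contain exactly the same keywords, the cosine reaches its maximum even though the titles describe different events, which is the kind of collapse a frame-like structure is meant to avoid.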
The semantic similarity measurement for Chinese financial news titles is based on constructing an Event Frame as the template of each title. A semantic similarity function integrates the relation between the Event Frames of two titles with the relation between the keywords of those titles. It considers the relation between the basic meanings of the two titles and reduces the number of keyword comparisons. The results show that Event Frame extraction reaches a precision close to that of manual extraction, and that the proposed measure emphasizes the relation between the basic meanings of two titles over the relation between their keywords. The proposed measure also retains keyword information, because people sometimes judge two titles to be similar as long as the intersection of their keyword sets is large. The Event Frame based measure can therefore pick out, from all Chinese financial news titles, those that mention the same event.
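One way to picture such a combined measure is the sketch below. It is not the thesis's formula, which this record does not give: the `EventFrame` layout, the exact-match rule for event names and slot values, the Jaccard term overlap, and the weight `alpha` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EventFrame:
    # Hypothetical layout: an event name plus slot -> value pairs.
    event_name: str
    slots: dict = field(default_factory=dict)


def frame_similarity(f1, f2):
    """Share of slots whose values match; 0.0 when event names differ."""
    if f1.event_name != f2.event_name:
        return 0.0
    all_slots = set(f1.slots) | set(f2.slots)
    if not all_slots:
        return 1.0
    matches = sum(
        1 for s in all_slots
        if s in f1.slots and s in f2.slots and f1.slots[s] == f2.slots[s]
    )
    return matches / len(all_slots)


def term_overlap(keywords_1, keywords_2):
    """Jaccard overlap of two keyword sets."""
    s1, s2 = set(keywords_1), set(keywords_2)
    union = s1 | s2
    return len(s1 & s2) / len(union) if union else 0.0


def title_similarity(f1, k1, f2, k2, alpha=0.7):
    """Weighted mix; alpha is an assumed weight favouring the frame relation."""
    return alpha * frame_similarity(f1, f2) + (1 - alpha) * term_overlap(k1, k2)


# Hypothetical titles with identical keywords but swapped roles.
frame_a = EventFrame("acquisition", {"buyer": "BankA", "target": "BankB"})
frame_b = EventFrame("acquisition", {"buyer": "BankB", "target": "BankA"})
keywords_a = ["BankA", "acquires", "BankB"]
keywords_b = ["BankB", "acquires", "BankA"]

print(title_similarity(frame_a, keywords_a, frame_b, keywords_b))
print(title_similarity(frame_a, keywords_a, frame_a, keywords_a))
```

With the frame relation weighted more heavily than term overlap, the two swapped-role titles score low despite sharing every keyword, while the overlap term still contributes when keyword sets intersect.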
CHAPTER 1 INTRODUCTION
1.1 SEMANTIC SIMILARITY
1.2 MOTIVATION
1.3 OUTLINE
CHAPTER 2 BACKGROUND
2.1 SEMANTIC SIMILARITY MEASUREMENT
2.2 AN OVERVIEW OF HOWNET
2.3 FRAME STRUCTURE
2.4 THE BERKELEY FRAMENET PROJECT
CHAPTER 3 SYSTEM ARCHITECTURE
3.1 EVENT FRAME CONCEPT
3.2 EVENT FRAME EXTRACTING
3.3 SIMILARITY MEASUREMENT BASED ON EVENT FRAME
3.3.1 Related Event Names
3.3.2 Event Slot pairs finding
3.3.3 Similarity of the values in two Event Slots
3.3.4 Term overlapping
CHAPTER 4 EXPERIMENT AND DISCUSSION
4.1 DATASET DESCRIPTION
4.2 THE ACCURACY OF EVENT FRAME EXTRACTING
4.3 STRATEGY FOR PARAMETER DETERMINATION
4.4 THE INFLUENCE OF TERM OVERLAPPING DETECTION
CHAPTER 5
5.1 CONCLUSION
5.2 FUTURE WORK
REFERENCES