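Author: 柯志杰
Author (English): Chih-Chieh Ko
Thesis title: 從全球資訊網語料自動萃取中文詞彙定義之系統的研究
Thesis title (English): An Automated Term Definition Extraction System using the Web Corpus in Chinese Language
Advisor: 呂芳懌
Advisor (English): Fang-Yie Leu
Degree: Master's
Institution: Tunghai University (東海大學)
Department: Department of Information Engineering and Science (資訊工程與科學系)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2008
Graduation academic year: 96 (ROC calendar, the 2007-2008 academic year)
Language: English
Number of pages: 45
Keywords (Chinese): 定義; 全球資訊網; 資訊萃取; 中文; 文字探勘; 知識系統
Keywords (English): Definitions; Web corpus; Information Extraction; Chinese Language; Text Mining; Knowledge System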
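This study proposes a system called DefExplorer that automatically collects and extracts the definitions of a Chinese term from the World Wide Web, automatically identifies the type of the term, and removes noise. DefExplorer filters out unsuitable information using a semantic approach. The study proposes two kinds of candidate sets, a "general candidate set" and a "domain candidate set", groups the many candidate definition sentences by their similarity, and estimates the importance of each candidate sentence for selecting the final answer. Experiments show that DefExplorer can effectively extract term definitions from the Web, and that it works especially well for proper nouns that general dictionaries do not include.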
This thesis proposes a system named DefExplorer that automatically extracts term definitions from the Web, determines the type of each Chinese question term, and selects answers from noisy Web pages. DefExplorer filters out invalid data with a semantic approach. It uses two types of candidate sets, common and domain-specific, groups similar candidates, and estimates each candidate's importance when selecting the final answers. Experimental results show that DefExplorer extracts term definitions from the Web effectively, especially for out-of-vocabulary terms.
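The grouping step described in the abstract can be illustrated with a minimal sketch. It is not DefExplorer's actual method: this record does not give the thesis's similarity measure, grouping algorithm, or threshold. The sketch assumes TF-IDF weights over character bigrams, cosine similarity, a greedy single-pass grouping, and group size as a rough stand-in for candidate importance. All function names and the `threshold` value are hypothetical.

```python
import math
from collections import Counter


def char_bigrams(sentence):
    """Split a sentence into overlapping character bigrams (no word segmenter needed)."""
    text = "".join(sentence.split())
    return [text[i:i + 2] for i in range(len(text) - 1)]


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    token_lists = [char_bigrams(s) for s in sentences]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    n = len(sentences)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight positive.
        vectors.append({t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
                        for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def group_candidates(sentences, threshold=0.5):
    """Greedy grouping: a sentence joins the first group whose seed is similar enough."""
    vectors = tfidf_vectors(sentences)
    groups = []  # lists of sentence indices; the first index in each list is the seed
    for i, vec in enumerate(vectors):
        for group in groups:
            if cosine(vec, vectors[group[0]]) >= threshold:
                group.append(i)
                break
        else:
            groups.append([i])
    # Hypothetical importance estimate: larger groups rank first.
    groups.sort(key=len, reverse=True)
    return [[sentences[i] for i in g] for g in groups]


candidates = [
    "a blog is a website of dated entries",
    "a blog is a website with dated entries",
    "buy cheap tickets online today",
]
for rank, group in enumerate(group_candidates(candidates), start=1):
    print(rank, group)
```

Character bigrams are used here only because they avoid the need for a Chinese word segmenter; a real system would more likely combine segmentation with a lexicon.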
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Extracting Term Definitions
3.1 Question Analysis
3.2 Document Retrieval
3.3 Semantics Selection
3.4 Similarity Scoring
3.5 Candidate Grouping
3.6 Answer Generation
Chapter 4 Experimental Results
4.1 Grouping Threshold
4.2 Performance of Dynamic Way
4.3 Performance of Static Way
4.4 Integrating with Existing Lexicon
4.5 Performance on Two Sorted Bases
4.6 Comparison with Other Systems
Chapter 5 Conclusions and Future Research
References
Appendix A Definition Sentence Patterns
Appendix B Question Term Sets for Experiments
Appendix C Examples of an Extracting Process
Appendix D Examples of Extracting Results and Its Outputs