National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

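Author: 邱彥賓
Author (romanized): Yen-Pin Chiu
Thesis title: 為加速知識圖譜建立之中文關連樣式探勘
Thesis title (English): Chinese Relation Patterns Mining for Knowledge Base Acceleration
Advisor: 陳信希
Oral defense committee: 鄭卜壬, 張嘉惠, 蔡銘峰
Oral defense date: 2015-07-30
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Networking and Multimedia
Discipline: Computer Science
Field: Networking
Thesis type: Academic thesis
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: Chinese
Number of pages: 86
Keywords (Chinese): 知識庫, 知識庫加速, 關連樣式, 資料探勘, 關連萃取
Keywords (English): knowledge base, knowledge base acceleration, relation extraction, data mining, relation pattern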
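In recent years, with the emergence of many kinds of knowledge bases, the demand for applications built on structured knowledge has grown rapidly. Manually updating a knowledge base, however, can no longer keep up with the ever-increasing volume of data produced per unit of time. How to automatically extract the useful parts from large amounts of information and perform knowledge base acceleration has therefore become an important research topic.
Among the resources that support knowledge base acceleration, a relation patterns taxonomy stores a large number of relation patterns capable of describing the relation between one entity and another. It helps a computer system understand how a relation is expressed in human language, and thereby serves the goal of knowledge base acceleration. Although fairly mature relation patterns taxonomies are available for English, such resources remain very scarce for Chinese, so many excellent studies cannot be carried out on Chinese.
This study proposes a method for building a Chinese relation patterns taxonomy, with the aim of providing such a research resource for Chinese and advancing research on knowledge base acceleration in Chinese. It discusses how, in a Chinese setting, to collect a corpus, extract property instances, extract relation patterns, and define the confidence and entity types of relation patterns.
The resulting Chinese relation patterns taxonomy classifies relation patterns by 25 properties from the YAGO knowledge base, and each relation pattern has three different forms, each carrying a different meaning. Finally, this study evaluates the performance of the taxonomy and conducts an error analysis to examine the phenomena observed during construction and possible directions for improvement. With the Chinese relation patterns taxonomy in place, many studies that could not previously be realized in Chinese for lack of resources become feasible, which in turn improves the completeness and usability of Chinese knowledge bases.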

In recent years, structured knowledge bases have attracted much attention in information retrieval and natural language processing. People rely on structured data to build many applications that benefit everyday life. With the growth of the Internet, it has become almost impossible for human editors to keep a knowledge base up to date with all the knowledge generated around the world every day. How to accelerate the construction of knowledge bases has therefore become a critical issue, known as knowledge base acceleration.
Relation extraction, which identifies the relations between entities, can be a key technique for accelerating knowledge base construction. Resources such as relation patterns, which denote a binary relation between two entities, are useful for this task. However, few relation pattern resources are available for Chinese information extraction.
In this work, we present a Chinese relation patterns taxonomy for knowledge base acceleration. Each pattern in this taxonomy is semantically typed with a YAGO property and has its own confidence and entity types defined. We describe the complete method for collecting the Chinese corpora and extracting the Chinese relation patterns from them. Finally, we examine the correctness of those patterns to evaluate the performance of the proposed pattern extraction method, and analyze the errors that occurred during the experiment. With the Chinese relation patterns taxonomy, many related works can be transferred from English to Chinese environments, further improving the usability and scale of Chinese knowledge bases.
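As a rough illustration of what extracting patterns from known facts and attaching a confidence can look like, the sketch below uses a simple distant-supervision heuristic. It is not the thesis's actual procedure: the abstract does not say how patterns are matched or how confidence is computed, so the function name `extract_patterns`, the `<SUBJ>`/`<OBJ>` placeholders, the toy sentences and facts, and the confidence definition (the share of a pattern's matches that support each property) are all assumptions made for this example. The property names `wasBornIn` and `diedIn` exist in YAGO, but whether they are among the 25 properties used in the thesis is not stated.

```python
from collections import Counter, defaultdict


def extract_patterns(sentences, facts):
    """Toy distant-supervision pattern extraction (illustrative assumption only).

    sentences: iterable of raw sentence strings.
    facts: iterable of (subject, property, object) triples from a knowledge base.
    Returns {pattern: {property: confidence}}, where confidence is the fraction
    of the pattern's matches that support that property.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for subj, prop, obj in facts:
            s = sentence.find(subj)
            o = sentence.find(obj)
            if s == -1 or o == -1:
                continue  # the sentence does not mention both entities
            if s < o and s + len(subj) <= o:
                # subject comes first: keep the text between the two mentions
                pattern = "<SUBJ>" + sentence[s + len(subj):o] + "<OBJ>"
            elif o < s and o + len(obj) <= s:
                # object comes first
                pattern = "<OBJ>" + sentence[o + len(obj):s] + "<SUBJ>"
            else:
                continue  # overlapping mentions, skip
            counts[pattern][prop] += 1
    return {
        pattern: {prop: n / sum(by_prop.values()) for prop, n in by_prop.items()}
        for pattern, by_prop in counts.items()
    }


# Hypothetical toy data, invented for this example.
facts = [
    ("小明", "wasBornIn", "台北"),
    ("小華", "wasBornIn", "高雄"),
    ("小華", "diedIn", "台南"),
    ("小美", "wasBornIn", "台中"),
    ("小美", "diedIn", "台中"),
]
sentences = ["小明出生於台北", "小華出生於高雄", "小華逝世於台南", "小美出生於台中"]

confidence = extract_patterns(sentences, facts)
for pattern, by_prop in confidence.items():
    print(pattern, by_prop)
```

In this toy data the entity pair (小美, 台中) carries two properties, so the pattern `<SUBJ>出生於<OBJ>` also picks up a spurious `diedIn` match. A per-property confidence is one simple way to down-weight that kind of noise.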

Oral Defense Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Knowledge Base Resources
2.2 Knowledge Base Acceleration
2.3 Knowledge Base Applications
2.4 Relation Extraction
2.5 Relation Patterns
Chapter 3 Methodology
3.1 Relation Pattern Classification
3.2 Experimental Workflow
3.3 Knowledge Base Usage
3.4 Corpus Collection
3.5 Data Processing
3.6 Cross-Knowledge-Base Property Mapping Table
3.7 Chinese Entity Alias Dictionary
3.8 Property Instance Extraction
3.9 Relation Pattern Extraction
Chapter 4 Evaluation of Experimental Results
4.1 Chinese Relation Patterns Taxonomy
4.2 Relation Pattern Extraction Performance
4.3 Fact Extraction Performance
Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work
References
Appendix 1
Appendix 2


