
National Digital Library of Theses and Dissertations in Taiwan


Author: 魏聖倫 (Sheng-Lun Wei)
Title: 高覆蓋率中文關連樣式探勘以加速及完備知識圖譜之建立
Title (English): Chinese Relation Patterns Mining with High Coverage for Knowledge Base Acceleration and Completion
Advisor: 陳信希 (Hsin-Hsi Chen)
Oral examination committee: 鄭卜壬, 蔡宗翰, 蔡銘峰
Oral defense date: 2016-07-26
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Networking and Multimedia
Field: Computing
Discipline: Networking
Thesis type: Academic thesis
Year of publication: 2016
Graduation academic year: 104 (ROC calendar, i.e. 2015-2016)
Language: English
Pages: 42
Keywords: knowledge base; knowledge base acceleration; knowledge base completion; relation extraction; relation pattern; data mining
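In recent years, with the rapid growth of the Internet, people can obtain large amounts of information through many channels, such as online news, social networks, blogs, and forums. People produce a great deal of information online every day, and some of it, once collected, organized, and summarized, is worth storing and reusing. A knowledge base is one common way to store such useful information; the knowledge is usually stored in a structured form so that it can be used more conveniently later.
However, most knowledge bases are compiled by human editors, and in this age of information explosion the volume of new information far exceeds what volunteer editors can handle. As a result, there is a noticeable delay between the time an event occurs and the time it is added to a knowledge base. Effectively accelerating knowledge base construction is therefore an important problem. Relation patterns are a common means of accelerating knowledge base construction, but apart from English, few languages have relation pattern resources available.
This study proposes a method for building a high-coverage library of Chinese relation patterns to accelerate knowledge base construction and support knowledge base applications. Taking the entity properties defined in DBpedia as the basis, we mine each property to find its corresponding Chinese relation patterns. We describe the implementation of each step in detail, in three parts: corpus pre-processing, instance retrieval, and relation pattern extraction. We also discuss problems that may arise along the way, their impact, and how to address them. Finally, we use human annotators to evaluate the Chinese relation patterns and discuss how different factors affect pattern quality.
Previously, researchers could build on English relation pattern resources, while work in other languages was held back by the lack of comparably complete resources. The high-coverage Chinese relation pattern library produced in this study makes the same kind of research possible in Chinese, so that knowledge base research can advance not only in English but also in Chinese. In addition, although the proposed method targets Chinese relation patterns, we believe other languages with similar characteristics, such as Japanese and Korean, could use it to build their own relation pattern libraries.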


With the rapid development of the Internet in recent years, people can obtain information from many sources, such as online news, social networks, and forums. A great deal of information is created every day, and some of it can be collected, understood, and turned into knowledge. A knowledge base stores such knowledge in a structured format. However, it is hard to keep a knowledge base up to date because of the wide gap between the small number of editors and the vast amount of information about entities. Knowledge base acceleration, which focuses on speeding up knowledge base construction, is therefore a critical issue. Relation patterns are useful for knowledge base acceleration; however, few such resources are available for languages other than English.
In this study, we present a workflow for building a high-coverage relation pattern extraction system for knowledge base acceleration and knowledge base completion. The target properties are taken from the DBpedia knowledge base. We discuss the details of our method, including corpus pre-processing, instance retrieval, and pattern extraction. Finally, we evaluate our relation patterns with human annotators and discuss features that may affect their performance.
With Chinese relation patterns, much related work developed for English can be transferred to Chinese. Other languages may also use our method to build their own relation pattern resources.
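The abstracts describe a three-step workflow: pre-process the corpus, retrieve sentences that mention an entity pair linked by a DBpedia property, and extract the text between the two entities as a candidate relation pattern. The sketch below illustrates that general idea, in the spirit of distant supervision, under simplifying assumptions. Sentences are already word-segmented, the alias dictionary is a plain Python dict, and a pattern is simply the token sequence between subject and object. The function names, the toy entities (張三, 李四, 王五), and the `max_gap` cutoff are hypothetical and are not taken from the thesis; only the property names follow DBpedia's `birthPlace` and `spouse`.

```python
from collections import Counter, defaultdict
from typing import Iterable, Iterator

# Hypothetical sketch of pattern mining from known triples; NOT the thesis's implementation.


def build_alias_index(aliases: dict[str, list[str]]) -> dict[str, str]:
    """Map each surface form (canonical name or alias) to its canonical entity."""
    index: dict[str, str] = {}
    for entity, names in aliases.items():
        index[entity] = entity
        for name in names:
            index[name] = entity
    return index


def retrieve_instances(
    sentences: Iterable[list[str]],
    triples: Iterable[tuple[str, str, str]],
    alias_index: dict[str, str],
) -> Iterator[tuple[str, list[str], int, int]]:
    """Yield (property, tokens, subject_pos, object_pos) for every sentence that
    mentions both the subject and the object of a known triple."""
    props_by_pair: dict[tuple[str, str], set[str]] = defaultdict(set)
    for subj, prop, obj in triples:
        props_by_pair[(subj, obj)].add(prop)
    for tokens in sentences:
        # First position of each canonical entity mentioned in this sentence
        # (single-token matching only; multi-token names are not handled).
        positions: dict[str, int] = {}
        for i, tok in enumerate(tokens):
            entity = alias_index.get(tok)
            if entity is not None:
                positions.setdefault(entity, i)
        for (subj, obj), props in props_by_pair.items():
            if subj in positions and obj in positions:
                for prop in sorted(props):
                    yield prop, tokens, positions[subj], positions[obj]


def extract_patterns(instances, max_gap: int = 4) -> dict[str, Counter]:
    """Count the tokens between subject and object as a pattern per property.
    The <S> and <O> placeholders record argument order (e.g. passive phrasing)."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for prop, tokens, s_pos, o_pos in instances:
        lo, hi = sorted((s_pos, o_pos))
        middle = tokens[lo + 1 : hi]
        if not middle or len(middle) > max_gap:
            continue  # skip adjacent mentions and pairs that are too far apart
        left, right = ("<S>", "<O>") if s_pos < o_pos else ("<O>", "<S>")
        counts[prop][" ".join([left, *middle, right])] += 1
    return counts


# Toy, hand-made data: hypothetical entities, sentences already word-segmented.
triples = [("張三", "birthPlace", "台南"), ("王五", "birthPlace", "台北"), ("張三", "spouse", "李四")]
aliases = {"張三": ["小張"], "王五": [], "李四": [], "台南": ["臺南"], "台北": ["臺北"]}
sentences = [
    ["小張", "出生", "於", "臺南"],
    ["王五", "出生", "於", "台北"],
    ["張三", "的", "妻子", "是", "李四"],
    ["李四", "嫁給", "了", "張三"],
    ["張三", "今天", "去", "台北"],  # no known triple links 張三 and 台北, so it is ignored
]

patterns = extract_patterns(retrieve_instances(sentences, triples, build_alias_index(aliases)))
print(patterns["birthPlace"])  # Counter({'<S> 出生 於 <O>': 2})
```

Aliases let 小張 and 臺南 match their canonical entries. The placeholder order lets the passive-style sentence 李四 嫁給 了 張三 still count toward `spouse`. A real system would also need proper word segmentation, multi-token entity matching, and filtering of generic or low-frequency patterns. This sketch leaves all of that out.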


Acknowledgements
Chinese Abstract
Abstract
Contents
List of Figures
List of Tables
1. Introduction
1.1 Background
1.2 Motivation
1.3 Goals
1.4 Structure
2. Related Work
2.1 Knowledge Bases
2.2 Knowledge Base Application
2.3 Knowledge Base Acceleration
2.4 Relation Extraction
2.5 Relation Patterns
3. Methods
3.1 Overview
3.2 Usage of Knowledge Base
3.3 Corpus Pre-processing
3.3.1 Corpus Selection
3.3.2 Corpus Cleaning and Segmentation
3.4 Entity Aliases Dictionary
3.5 Instance Retrieval
3.6 Pattern Extraction
4. Experiments and Analysis
4.1 Properties Selection
4.2 Annotator Judgment
4.3 Patterns Evaluations
5. Conclusion and Future Work
References

