National Digital Library of Theses and Dissertations in Taiwan

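Author: 楊捷扉
Author (English): Jie-Fei Yang
Thesis title: 人物搜尋之資訊擷取與分類
Thesis title (English): Information Extraction and Classification for Person Search
Advisor: 張俊盛
Advisor (English): Jason S. Chang
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Information Systems and Applications
Discipline: Computer Science
Field: Systems Design
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94
Language: English
Pages: 50
Keywords (Chinese): 人名檢索 (person name search), 資訊擷取 (information extraction), 文件分類 (text categorization)
Keywords (English): person search, information extraction, text categorization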
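This thesis proposes a Web-based method for automatically collecting the career information and professional domains of people with Chinese names. Extracting personal career information and classifying professional domains effectively addresses the problem of personal name disambiguation. Domain classification also allows personal information to be presented to users in a systematic and consistent way.
In the training phase, we use linguistic knowledge and statistical techniques to collect surface patterns of career information from the Web. These patterns serve as the basis for gathering information about a name from the Web and for extracting personal information. We also apply the bootstrapping method of Yarowsky (1995) to train a text classifier from Web resources. At runtime, career information for the input name is collected with the help of the surface patterns, and the information belonging to different people who share the same name is separated using the career information and the domain classification.
We also describe a system implementation of the method. Experimental results show that our method effectively extracts the career information associated with a name and separates same-name individuals in different domains, making the collection of personal information from the Web more effective.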
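The bootstrapped domain classifier mentioned above can be pictured with a small decision list in the spirit of Yarowsky (1994, 1995). The following is only a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the smoothing constant `alpha`, the confidence `threshold`, and the example words are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_decision_list(labeled_docs, alpha=0.1):
    """Rank (word, domain) rules by a smoothed log-likelihood ratio.

    labeled_docs is a list of (words, domain) pairs. alpha is an assumed
    smoothing constant, not a value taken from the thesis.
    """
    counts = defaultdict(Counter)  # word -> domain -> number of documents
    domains = set()
    for words, domain in labeled_docs:
        domains.add(domain)
        for word in set(words):
            counts[word][domain] += 1

    rules = []
    for word, by_domain in counts.items():
        total = sum(by_domain.values())
        for domain in domains:
            inside = by_domain[domain] + alpha
            outside = (total - by_domain[domain]) + alpha
            score = math.log(inside / outside)
            if score > 0:  # keep only rules that favour this domain
                rules.append((score, word, domain))
    # Strongest evidence first; ties broken alphabetically for determinism.
    rules.sort(key=lambda rule: (-rule[0], rule[1], rule[2]))
    return rules


def classify(rules, words, threshold=0.0):
    """Return (domain, score) from the highest-ranked matching rule, or None."""
    present = set(words)
    for score, word, domain in rules:
        if score < threshold:
            break
        if word in present:
            return domain, score
    return None


def bootstrap(seed_docs, unlabeled_docs, rounds=3, threshold=1.0):
    """Grow the labeled set from a few seeds, retraining after each round."""
    labeled = list(seed_docs)
    pool = list(unlabeled_docs)
    rules = train_decision_list(labeled)
    for _ in range(rounds):
        still_unlabeled = []
        for words in pool:
            guess = classify(rules, words, threshold)
            if guess is None:
                still_unlabeled.append(words)
            else:
                labeled.append((words, guess[0]))
        if len(still_unlabeled) == len(pool):
            break  # nothing new was labeled, so stop early
        pool = still_unlabeled
        rules = train_decision_list(labeled)
    return rules
```

Starting from one seed document per domain, each round labels only the documents that a sufficiently confident rule matches, so new cue words enter the list gradually through the documents they co-occur with.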
We introduce a method for automatically collecting a person's personal information and professional domain. In our approach, personal information is extracted and the domain is identified from Web data, which in turn supports personal name disambiguation.
In the training phase, the method involves generating surface patterns for personal information extraction based on linguistic and statistical information from the Web, and applying an unsupervised algorithm to construct a Web-based text categorizer. At runtime, a person name is submitted to a search engine, personal information is extracted, and the domain of each retrieved passage is identified with respect to the queried name. Finally, the referents are sorted by domain, personal information, and degree of popularity.
We also describe an implementation of the proposed method. Blind evaluation on a set of names shows that our method is effective at extracting personal information and cleanly classifying each individual's domain-specific knowledge. The method can help users quickly find information about a person by displaying personal information in a systematic and consistent way.
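The runtime flow described above (match surface patterns against retrieved passages, assign a domain, then group and rank the referents) can be sketched as follows. This is an illustrative sketch only: the two pattern templates, the `DOMAIN_CUES` lookup that stands in for the trained classifier, and the use of passage counts as a popularity proxy are all assumptions, not the patterns or ranking used in the thesis.

```python
import re
from collections import defaultdict

# Hypothetical surface patterns; {name} is replaced by the queried name.
PATTERN_TEMPLATES = [
    r"{name}, (?P<title>[^,.;]+)",
    r"{name} is (?:a|an|the) (?P<title>[^,.;]+)",
]

# Hypothetical cue words standing in for a trained domain classifier.
DOMAIN_CUES = {"professor": "academia", "singer": "entertainment"}


def extract_title(name, passage):
    """Return the first title-like phrase a pattern finds for name, or None."""
    for template in PATTERN_TEMPLATES:
        pattern = template.format(name=re.escape(name))
        match = re.search(pattern, passage)
        if match:
            return match.group("title").strip()
    return None


def guess_domain(title):
    """Map an extracted title to a domain using the cue-word lookup."""
    lowered = title.lower()
    for cue, domain in DOMAIN_CUES.items():
        if cue in lowered:
            return domain
    return "unknown"


def group_referents(name, passages):
    """Group extracted titles by domain, most frequent domain first."""
    groups = defaultdict(list)
    for passage in passages:
        title = extract_title(name, passage)
        if title is not None:
            groups[guess_domain(title)].append(title)
    # Passage count is used here as a rough stand-in for popularity.
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))


passages = [
    "Jason Chang, professor of computer science, gave a talk.",
    "Jason Chang is a singer from Taipei.",
    "Jason Chang, a professor at the university, wrote this.",
    "Nothing about him here.",
]
result = group_referents("Jason Chang", passages)
```

Each domain group corresponds to one candidate referent of the ambiguous name, so people who share a name but work in different domains end up in separate groups.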
Abstract (Chinese)
Abstract
Acknowledgement
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 The PeopleSea System
3.1 Problem Statement
3.2 People Search on the Web
3.2.1 Full Title Extraction
3.2.2 Domain Classifier
3.2.3 Runtime Process
Chapter 4 Experiments and Results
4.1 Experimental Setting
4.2 Evaluation Metrics
4.3 Experimental Results
Chapter 5 Discussion
5.1 Evaluation on Full Title Extraction
5.2 Evaluation on Domain Classification
5.3 Limitation in Our Current Research
Chapter 6 Conclusion and Future Work
References
Appendix A – Domain Decision List
Appendix B – Full Titles Extraction
Al-Kamha, R. and Embley, D. W. Grouping Search-Engine Returned Citations for Person-Name Queries. In WIDM'04, pp. 96-103, Washington, DC, USA, 2004.
Bagga, A. and Baldwin, B. Entity-Based Cross-Document Coreferencing Using the Vector Space Model. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pp. 79-85, Montreal, Canada, 1998.
Bekkerman, R. and McCallum, A. Disambiguating Web Appearances of People in a Social Network. In Proceedings of the 14th World Wide Web Conference (WWW 2005), ACM Press, pp. 463-470, Chiba, Japan, 2005.
Bollegala, D., Matsuo, Y., and Ishizuka, M. Extracting Key Phrases to Disambiguate Personal Names on the Web. In Proceedings of CICLing, 2006.
Fleischman, M. B. and Hovy, E. Multi-document Person Name Resolution. In Proceedings of the Workshop on Reference Resolution, Barcelona, Spain, 2004.
Googlism. 2003. <http://www.googlism.com> (1 July 2006).
Guha, R. and Garg, A. Disambiguating People in Search. In Proceedings of the 13th World Wide Web Conference (WWW 2004), ACM Press, 2004.
Lloyd, L., Bhagwan, V., Gruhl, D., and Tomkins, A. Disambiguation of References to Individuals. Technical Report RJ10364 (A0510-011), IBM Research, 2005.
Malin, B. Unsupervised Name Disambiguation via Social Network Similarity. In Proceedings of the Workshop on Link Analysis, Counterterrorism, and Security, in conjunction with the SIAM International Conference on Data Mining, pp. 93-102, Newport Beach, CA, 2005.
Mann, G. S. and Yarowsky, D. Unsupervised Personal Name Disambiguation. In Proceedings of 7th Conference on Computational Natural Language Learning (CoNLL-2003), pp. 33-40, Edmonton, Canada, 2003.
Manning, C. D. and Schütze, H. Foundations of Statistical Natural Language Processing (London, England, 1999), pp. 232, 249-252, 494, 575.
Peng, F., Weischedel, R., Licuanan, A., Xu, J. Combining Deep Linguistics Analysis and Surface Pattern Learning: A Hybrid Approach to Chinese Definitional Question Answering, 2005. Retrieved June 2, 2006, from http://www.cs.umass.edu/fuchun/publication/HLT-EMNLP2005.pdf.
Soubbotin, M. M. Patterns of Potential Answer Expressions as Clues to the Right Answer. In Proceedings of the TREC-10 Conference, NIST, pp. 175-182, Gaithersburg, MD, 2001.
Vivisimo Inc. 2000. <http://www.vivisimo.com> (1 July 2006).
Voorhees, E. M. Overview of the TREC 2003 Question Answering Track. In Proceedings of the 12th Text Retrieval Conference (TREC 2003), pp. 54-68, Gaithersburg, MD, 2004.
Wan, X., Gao, J., Li, M., and Ding, B. Person Resolution in Person Search Results: WebHawk. In Proceedings of ACM 14th Conference on Information and Knowledge Management (CIKM 2005), pp. 163-170, Bremen, Germany, 2005.
Yarowsky, D. Decision Lists for Lexical Ambiguity Resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pp. 88-95, Las Cruces, NM, 1994.
Yarowsky, D. Unsupervised Word Sense Disambiguation Rivaling Supervised Methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189-196, 1995.