National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

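Graduate student: 張立典
Graduate student (English name): Li-Tien Chang
Thesis title (Chinese): 以知識表徵為基之文件分群法
Thesis title (English): An Ontology-based Document Clustering Methodology
Advisor: 張瑞芬
Advisor (English name): Amy J.C. Trappey
Degree: Master's
Institution: National Tsing Hua University
Department: Department of Industrial Engineering and Engineering Management
Discipline: Engineering
Field: Industrial Engineering
Thesis type: Academic thesis
Year of publication: 2005
Academic year of graduation: 93 (ROC calendar)
Language: English
Number of pages: 79
Chinese keywords: 知識表徵 (ontology); 文件分群 (document clustering); 模糊推論 (fuzzy inference)
English keywords: Ontology; Fuzzy inference control; Document clustering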
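Abstract (translated from the Chinese): This thesis proposes a methodology for analyzing and clustering knowledge documents. Many current methods for analyzing knowledge documents are keyword-based, but keywords are fragmentary and carry little meaning for either people or computers. We therefore propose an ontology-based method for analyzing knowledge documents, with the aim of letting computers understand the content of these documents to some degree. The method consists of several major steps. First, experts build the knowledge for a specific domain and supply training data to train the system's terminology. Once training is complete, knowledge documents can be clustered. In the clustering stage, each knowledge document first goes through natural language processing; the trained terminology is then used to find the ontology that represents the document; fuzzy inference over these ontologies then derives the relationship values between knowledge documents; finally, a hierarchical clustering method groups the documents. At the end of the study, we evaluate the effectiveness of the method and compare it with keyword-based methods.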
Abstract (English): This thesis presents a novel method for analyzing, synthesizing, and managing knowledge documents. Most existing methodologies for synthesizing and managing patents use key phrases as indices of knowledge documents, but key phrases extracted from patents carry little meaning for computers. This thesis therefore develops an ontology-based methodology for analyzing and managing knowledge documents, which enables computers to understand the documents to some degree through ontology rather than key phrases. The methodology is divided into several steps. First, experts construct the ontology schema for a specific domain and supply training data to train the system. Then a method for learning from natural language texts is adapted to infer the principal ontology of each knowledge document. Next, fuzzy logic control (FLC) is used to infer, via the ontology, the relationship between the knowledge documents and a suitable document cluster. Finally, we evaluate the effectiveness of the methodology and compare it with knowledge document clustering based on key phrases.
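The last two steps described in the abstracts (fuzzy inference of a relationship value between documents, followed by hierarchical clustering) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: documents are represented as sets of (subject, predicate, object) statements, the fuzzy input is the Jaccard overlap of those sets, the membership breakpoints and output levels are made up, defuzzification is a simple weighted average, and the clustering uses average linkage with a hypothetical stopping threshold. The thesis's own rule table and membership functions may differ.

```python
from itertools import combinations

def tri(x, a, b, c):
    """Triangular membership peaking at b; a == b or b == c gives a shoulder."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical output levels for the concepts "low", "mediate", "high".
LEVELS = {"low": 0.1, "mediate": 0.5, "high": 0.9}

def relationship(stmts_a, stmts_b):
    """Fuzzy relationship value between two documents (illustrative only)."""
    union = stmts_a | stmts_b
    overlap = len(stmts_a & stmts_b) / len(union) if union else 0.0
    # Rules: few shared statements -> low, mediate -> mediate, many -> high.
    fired = {
        "low": tri(overlap, 0.0, 0.0, 0.5),      # input concept "few"
        "mediate": tri(overlap, 0.0, 0.5, 1.0),  # input concept "mediate"
        "high": tri(overlap, 0.5, 1.0, 1.0),     # input concept "many"
    }
    total = sum(fired.values())
    # Weighted-average defuzzification, a simplification chosen for brevity.
    return sum(LEVELS[k] * w for k, w in fired.items()) / total

def cluster(docs, threshold):
    """Agglomerative clustering with average linkage on relationship values."""
    rel = {frozenset(p): relationship(docs[p[0]], docs[p[1]])
           for p in combinations(docs, 2)}

    def linkage(c1, c2):
        return sum(rel[frozenset((x, y))] for x in c1 for y in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in docs]
    while len(clusters) > 1:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        if linkage(clusters[i], clusters[j]) < threshold:
            break  # remaining clusters are too weakly related to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Toy documents; the statements are invented for this example.
docs = {
    "news1": {("CompanyA", "sues", "CompanyB"), ("CompanyA", "owns", "Patent")},
    "news2": {("CompanyA", "sues", "CompanyB"), ("CompanyA", "owns", "Patent"),
              ("Court", "rules", "Case")},
    "cmp1": {("Slurry", "polishes", "Wafer")},
    "cmp2": {("Slurry", "polishes", "Wafer"), ("Pad", "contacts", "Wafer")},
}
print(cluster(docs, threshold=0.4))
```

With these toy inputs the two news documents and the two CMP documents end up in separate clusters, because no statement is shared across the two groups. Lowering the threshold far enough would merge everything into one cluster, which is why the stopping criterion matters in a hierarchical scheme.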
Table of Contents
Chinese Abstract
Abstract
List of Figures
List of Tables
1. Introduction
1.1 Background
1.2 Motivation
1.3 Thesis objective
2. Literature Review
2.1 Text mining
2.2 Ontology
2.3 Fuzzy logic control
2.4 Clustering methodology
3. Methodology Architecture and Functional Detail
3.1 Experts construct domain ontology schema
3.2 Terminology training
3.3 Natural language processing
3.4 Terminological analyzer
3.5 Knowledge extraction
3.6 Relationship generator of knowledge documents
3.7 Clustering of knowledge documents
4. Results and Evaluations of the Experiment
4.1 Data collection
4.2 Ontology construction
4.3 Result of terminology training
4.4 Results of knowledge extraction and clustering
4.5 Evaluation
5. Conclusions
Appendix 1. Schema of Business News
Appendix 2. Ontology of Business News in Protégé
Appendix 3. Schema of CMP Patent
Appendix 4. Ontology of CMP Patent in Protégé
Appendix 5. Clustering Result for Documents of Business News
Appendix 6. Clustering Result for Documents of CMP Patent

List of Figures
Figure 1. Two approaches (keyword-based and ontology-based) in knowledge document management on a computer platform
Figure 2. A graphic example of a statement in ontology
Figure 3. Architecture of fuzzy logic control
Figure 4. Tree structure of hierarchical clustering algorithm
Figure 5. Structure of neural network
Figure 6. Functional flow for the operation of FODM methodology
Figure 7. User interface of Protégé
Figure 8. An example of training data
Figure 9. Tagging process of a sentence
Figure 10. A chunked example
Figure 11. Process of filtering statement in ontology
Figure 12. Ontological comparison of two documents
Figure 13. Membership function of concept “many”
Figure 14. Membership function of concept “mediate”
Figure 15. Membership function of concept “few”
Figure 16. Inference process
Figure 17. Rules and concepts of fuzzy inference model
Figure 18. Membership function of concept “high”
Figure 19. Membership function of concept “mediate”
Figure 20. Membership function of concept “low”
Figure 21. Ontology of patent infringement, trade and application
Figure 22. Ontology of CMP
Figure 23. An ontological translation in business news
Figure 24. An ontological translation in CMP patent
Figure 25. Comparison of the results between two clustering methodologies


List of Tables
Table 1. RDF concepts
Table 2. Meaning of the tags
Table 3. Probability of lemma to concept
Table 4. An example of terminological analyzing
Table 5. Rules of fuzzy logic control for patent document analysis
Table 6. Profile of knowledge documents
Table 7. Terminology of business news (1)
Table 8. Terminology of business news (2)
Table 9. Terminology of CMP patent (1)
Table 10. Terminology of CMP patent (2)
Table 11. Terminology of CMP patent (3)
Table 12. Clustering result of business news
Table 13. Clustering result of CMP patent
Table 14. K-means clustering result of CMP patent based on key phrases
Table 15. Relevant and retrieved sets
Table 16. Evaluation of knowledge extraction
Table 17. Precision and recall comparison between this research and TF*IDF
Table 18. Comparison between fuzzy logic control clustering based on ontology and K-means clustering based on key phrases