
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

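Author: 林于荏
Author (English): Yu Jen Lin
Title: 多重概念與文章結構為基礎之文件推薦機制
Title (English): The Study of Generating Recommended Documents Based on Multiple Concepts and Document Structure
Advisor: 翁頌舜
Advisor (English): Sung Shun Weng
Degree: Master's
Institution: Fu Jen Catholic University
Department: Department of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: English
Number of pages: 61
Keywords (Chinese): 文字探勘; 多重概念; 文件結構; 相似文件尋找
Keywords (English): text mining; multiple concepts; document structure; similarity searching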
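Abstract (translated from the Chinese): In the real world, a large amount of information appears in textual, unstructured form, and for many people a great deal of valuable information is hidden within it. How to extract that information has therefore become an important research problem. Techniques originally applied in data mining have been carried over to text mining, and the search for similar documents is one of the important topics in text mining research.
In document management, similarity searching is an indispensable task. Most previous work on it has focused on comparing different classification algorithms based on document content, or on improving the performance of those algorithms. We propose a multiple-concept mechanism that combines different algorithms to solve the similarity searching problem. We also consider a factor that earlier studies often ignored: the distribution of structure. According to our experimental results, the proposed multiple-concept technique outperforms the traditional approaches. As for the distribution of structure, the experimental data did not account for this factor, so its effectiveness could not be confirmed directly by experiment, although it does help considerably in understanding document content.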
Abstract (English): In practice, a large portion of the available information appears in textual, unstructured form and is valuable to people. Techniques designed specifically for analyzing textual data are therefore needed to extract information from such datasets. Searching for similar documents also plays an important role in text mining research.
Similarity searching is an essential task in document management. Most previous research focused on comparing different classification algorithms based on document content, or on improving the performance of those algorithms. We propose a multiple-concept mechanism composed of different algorithms to solve the similarity searching problem. Furthermore, this study considers another factor ignored by previous research: the “distribution of structure”. According to the empirical evaluation, the proposed technique was more effective than the traditional approaches. The effectiveness of the “distribution of structure” was thus demonstrated.
CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 Introduction
1.1 Background
1.2 Research Motivation
1.3 Research Objective
1.4 Organization of the Thesis
CHAPTER 2 Literature Review
2.1 Textual Data Mining Architecture
2.2 Dimensionality Reduction
2.3 Vector Space Model
2.4 Categorization
2.5 Structure factor
CHAPTER 3 Generating Recommended Document Based on Multiple Concepts and Document Structure
3.1 Research Architecture
3.2 Assumption and the Research Environment
3.3 Problem Description
3.4 Background Theories
3.5 System Prototype
CHAPTER 4 Evaluation of Information Retrieval Based on Multiple Concepts and Document Structure
4.1 Experimental Design
4.2 Parameter Tuning
4.3 Test Data Set
4.4 Evaluation Measure
4.5 Effectiveness of Multiple Concepts
4.6 Effectiveness of Structure Variance
CHAPTER 5 Conclusion
5.1 Conclusion
5.2 Future Work
References