
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

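Graduate student: Yen-Ting Lin (林彥廷)
Thesis title: Cross-Lingual Text Categorization (跨語言文件自動分類之研究)
Advisor: Chih-Ping Wei (魏志平)
Degree: Master's
Institution: National Sun Yat-sen University (國立中山大學)
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2004
Graduation academic year: 92 (ROC calendar, i.e. 2003-2004)
Language: English
Number of pages: 51
Keywords: Document management; Cross-lingual text categorization; Text categorization; Text mining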
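Abstract (translated from the Chinese): With the rapid development and spread of Internet services and e-commerce applications, a large amount of information has become available on the Internet, usually in the form of textual documents. To support subsequent access and increase the usefulness of these documents, developing efficient and effective techniques for managing this growing body of text has become an important task for organizations and individuals. In document management, people have traditionally organized their files or documents by category; however, existing text categorization techniques focus mainly on monolingual documents. Because of the globalization of the business environment and major advances in Internet technology, organizations and individuals often need to retrieve and organize documents in different languages, so the need for cross-lingual text categorization keeps growing. Given the importance of and demand for cross-lingual text categorization, this study designs two category assignment methods: the individual-based method and the cluster-based method. The experimental results show that the proposed cross-lingual text categorization technique performs very well, and that the cluster-based method yields better cross-lingual categorization results than the individual-based method.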
With the emergence and proliferation of Internet services and e-commerce applications, a tremendous amount of information is accessible online, typically as textual documents. To facilitate subsequent access to and use of this information, the efficient and effective management of the ever-increasing volume of textual documents, specifically through text categorization, is essential to organizations and individuals. Existing text categorization techniques focus mainly on categorizing monolingual documents. However, with the globalization of business environments and advances in Internet technology, an organization or individual often retrieves and archives documents in different languages, which creates the need for cross-lingual text categorization. Motivated by the significance of and need for such a technique, this thesis designs a cross-lingual text categorization technique with two different category assignment methods, namely, individual-based and cluster-based. The empirical evaluation results show that the cross-lingual text categorization technique performs well and that the cluster-based method outperforms the individual-based method.
CHAPTER 1. INTRODUCTION
1.1 BACKGROUND
1.2 RESEARCH MOTIVATION AND OBJECTIVES
1.3 ORGANIZATION OF THE THESIS
CHAPTER 2. LITERATURE REVIEW
2.1 CROSS-LINGUAL INFORMATION RETRIEVAL
2.2 THESAURUS CONSTRUCTION TECHNIQUES
2.3 TEXT CATEGORIZATION TECHNIQUES
2.4 EXISTING TECHNIQUES FOR CROSS-LINGUAL TEXT CATEGORIZATION
CHAPTER 3. DESIGN OF A CROSS-LINGUAL TEXT CATEGORIZATION TECHNIQUE
3.1 CROSS-LINGUAL THESAURUS CONSTRUCTION PHASE
3.2 TEXT CATEGORIZATION LEARNING PHASE
3.3 CATEGORY ASSIGNMENT PHASE OF TEXT CATEGORIZATION
3.3.1 Individual-based category assignment method
3.3.2 Cluster-based category assignment method
CHAPTER 4. EMPIRICAL EVALUATION OF CROSS-LINGUAL TEXT CATEGORIZATION
4.1 EVALUATION DESIGN
4.1.1 Data collection
4.1.2 Evaluation criteria
4.1.3 Evaluation procedure
4.2 EVALUATION RESULTS
4.2.1 Effects of monolingual text categorization
4.2.2 Parameter tuning experiments for cross-lingual text categorization
4.2.2.1 Parameter tuning for individual-based category assignment method
4.2.2.2 Parameter tuning for cluster-based category assignment method
4.2.3 Comparative evaluations
4.2.3.1 Classifying Chinese documents into English categorization
4.2.3.2 Classifying English documents into Chinese categorization
CHAPTER 5. CONCLUSION AND FUTURE RESEARCH DIRECTIONS
REFERENCES