National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

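Graduate student: Ding-Wen Chen (陳丁溫)
Thesis title: A Multilingual Information Retrieval Approach Based on Growing Hierarchical Self-Organizing Maps (應用增長層級式自我組織映射圖於多國語言資訊檢索)
Advisor: Mei-Yi Wu (吳美宜)
Degree: Master's
Institution: Chang Jung Christian University (長榮大學)
Department: Graduate Institute of Information Management (資訊管理研究所)
Discipline: Computer Science (電算機學門)
Field: General Computer Science (電算機一般學類)
Thesis type: Academic thesis
Year of publication: 2007
Academic year of graduation: 95 (ROC calendar)
Language: Chinese
Number of pages: 57
Keywords: multilingual information retrieval; text clustering; neural network; growing hierarchical self-organizing maps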
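Abstract (translated from the Chinese): As the number of multilingual documents on the Internet grows, applying multilingual information retrieval techniques has become an important research topic. This thesis describes a method we developed for discovering knowledge in multilingual documents. We collected Chinese and English news articles from Taiwan Panorama magazine (光華雜誌); the test corpus contains 976 documents in each language, forming Chinese-English bilingual pairs.
In this study we adopt a neural-network document clustering method, the growing hierarchical self-organizing map, to help discover associations among multilingual documents. We built our experiments on a Chinese-English parallel corpus to uncover relationships among documents. The experiments show that our method can capture relationships between documents written in different languages.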
Abstract (English): With the increasing amount of multilingual text on the Internet, multilingual information retrieval has become an important research issue. This thesis describes our work on developing a method for discovering knowledge in multilingual documents. We collected English and Chinese news articles from Taiwan Panorama magazine. Our test corpus includes 976 pairs of Chinese-English parallel documents.
In this study, we adopt a text clustering approach based on a neural network, namely the growing hierarchical self-organizing map (GHSOM), to help us discover relationships among multilingual documents. We conducted experiments on a Chinese-English bilingual parallel corpus to uncover relationships among documents. The experimental results show that our multilingual text mining approach can capture conceptual relationships among documents written in different languages.
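The GHSOM named in the abstract builds on the basic self-organizing map (SOM), adding map growth and a layered hierarchy on top of it. The following is a minimal sketch of training a single small SOM, not the thesis's implementation: the function names, the 2x2 map size, the learning-rate and neighborhood settings, and the four toy term-weight vectors are all assumptions made for illustration, and this record does not state how the thesis encodes the two languages.

```python
import math


def best_matching_unit(weights, x):
    """Return the grid position of the unit whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)),
    )


def train_som(data, rows, cols, epochs=20, lr0=0.5, sigma0=0.5):
    """Train a small rectangular SOM and return {(row, col): weight_vector}."""
    units = [(r, c) for r in range(rows) for c in range(cols)]
    # Deterministic initialisation (an assumption): seed units with data vectors.
    weights = {u: list(data[i % len(data)]) for i, u in enumerate(units)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # linear decay schedule
        lr = lr0 * frac                      # learning rate shrinks over time
        sigma = max(sigma0 * frac, 0.1)      # neighborhood radius, kept positive
        for x in data:
            bmu = best_matching_unit(weights, x)
            for u in units:
                grid_d2 = (u[0] - bmu[0]) ** 2 + (u[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * sigma ** 2))  # Gaussian neighborhood
                w = weights[u]
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
    return weights


# Hypothetical toy term-weight vectors: two documents per topic.
docs = [
    [1.0, 0.9, 0.0, 0.0],  # topic A
    [0.9, 1.0, 0.0, 0.0],  # topic A
    [0.0, 0.0, 1.0, 0.9],  # topic B
    [0.0, 0.0, 0.9, 1.0],  # topic B
]

som = train_som(docs, rows=2, cols=2)
topic_a_units = {best_matching_unit(som, d) for d in docs[:2]}
topic_b_units = {best_matching_unit(som, d) for d in docs[2:]}
print(topic_a_units, topic_b_units)
```

On this toy input the two topics land on disjoint sets of map units, which is the property a clustering-based retrieval approach relies on. A GHSOM would additionally decide, from each map's quantization error, whether to add rows or columns to a map or to expand a unit into a child map.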
Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Problem Domain
1.4 Thesis Organization
Chapter 2 Related Work
2.1 Multilingual Information Retrieval
2.1.1 Dictionary-Based Approaches
2.1.2 Thesaurus-Based Approaches
2.1.3 Corpus-Based Approaches
2.2 Document Clustering Algorithms
2.2.1 k-Nearest Neighbor Classifier
2.2.2 Naïve Bayesian Classifier
2.2.3 Neural Network Classifier
Chapter 3 Research Method and Experimental Design
3.1 Document Preprocessing
3.1.1 Word Segmentation
3.1.2 Feature Selection
3.1.3 Vector Space Model
3.2 Document Clustering
3.2.1 Self-Organizing Map (SOM)
3.2.2 Growing Hierarchical Self-Organizing Map (GHSOM)
Chapter 4 Experimental Results
4.1 Experimental Procedure
4.1.1 Preprocessing
4.1.2 Document Clustering
4.1.3 Retrieval Interface
4.2 Experimental Evaluation
Chapter 5 Conclusions and Future Work
5.1 Chapter Review
5.2 Conclusions
5.3 Future Research Directions
References