
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

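Author: Heng-tzu Tsai (蔡恆慈)
Title: A Multilingual Hierarchy Generation and Mapping Method Based on Self-Organizing Maps (運用自我組織圖於多語言階層產生與比對之研究)
Advisor: Hsin-Chang Yang (楊新章)
Degree: Master's
Institution: National University of Kaohsiung (國立高雄大學)
Department: Department of Information Management, master's program
Discipline: Computer science
Subfield: General computer science
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC academic year)
Language: Chinese
Number of pages: 81
Keywords: multilingual information retrieval; text clustering; neural network; self-organizing maps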
With the rapid growth of multilingual text on the Internet, multilingual information retrieval has become an important research issue. This thesis describes a method for discovering knowledge from multilingual documents. We collected Chinese and English news articles from Taiwan Panorama magazine (光華雜誌); the test corpus comprises 1,000 pairs of Chinese-English parallel documents. We adopt a neural-network text clustering approach, the self-organizing map (SOM), to cluster the multilingual documents and reveal the hierarchical structure among them. We then discover relationships among documents in different languages by mapping the resulting hierarchies onto each other. Experiments on the Chinese-English bilingual parallel corpus show that this multilingual text mining approach can capture conceptual relationships among documents written in different languages.
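The abstract names the self-organizing map as the clustering step but gives no training details, so the following is only a minimal sketch of the generic SOM idea, not the thesis's implementation. The function names (`train_som`, `best_matching_unit`, `cluster`), the 2x2 map size, the linear decay schedules, and the toy keyword vectors are all assumptions made for illustration. Documents that select the same best-matching neuron are treated as one cluster.

```python
import math
import random


def best_matching_unit(weights, vector):
    """Return the index of the neuron whose weights are closest to `vector`."""
    return min(
        range(len(weights)),
        key=lambda j: sum((w - x) ** 2 for w, x in zip(weights[j], vector)),
    )


def train_som(vectors, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a rows x cols SOM on `vectors`; returns one weight list per neuron."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # assumed linear learning-rate decay
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # assumed shrinking neighbourhood
        for x in rng.sample(vectors, len(vectors)):  # shuffled pass over the data
            br, bc = divmod(best_matching_unit(weights, x), cols)
            for j, w in enumerate(weights):
                r, c = divmod(j, cols)
                grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-grid_dist2 / (2.0 * sigma ** 2))
                for k in range(dim):
                    # Pull the neuron toward the input, scaled by grid proximity.
                    w[k] += lr * h * (x[k] - w[k])
    return weights


def cluster(vectors, weights):
    """Group document indices by their best-matching neuron."""
    clusters = {}
    for i, x in enumerate(vectors):
        clusters.setdefault(best_matching_unit(weights, x), []).append(i)
    return clusters


# Hypothetical keyword-occurrence vectors: docs 0-2 share one topic, docs 3-5 another.
docs = [
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
]
weights = train_som(docs, rows=2, cols=2)
print(cluster(docs, weights))
```

In a bilingual setting one would presumably train one such map per language and then compare the resulting clusters; how the thesis builds and maps the hierarchies is not described in this record, so that step is left out here.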
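Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Multilingual Information Retrieval and Multilingual Text Mining
2.1.1 Dictionary-Based Approaches
2.1.2 Thesaurus-Based Approaches
2.1.3 Corpus-Based Approaches
2.2 Document Clustering Algorithms
2.2.1 K-NN Classification
2.2.2 Bayesian Classification
2.2.3 Neural Network Classification
2.3 Hierarchy Generation
2.3.1 Growing Hierarchical Self-Organizing Maps
2.3.2 Probabilistic Approaches
2.3.3 Automatic Category Hierarchy Generation
2.4 Hierarchy Mapping and Multilingual Applications
Chapter 3 Research Method and Experimental Design
3.1 Document Preprocessing
3.1.1 Word Segmentation
3.1.2 Keyword Selection
3.1.3 Document Vector Transformation
3.2 Document Clustering
3.2.1 SOM Training and Feature Map Generation
3.2.2 Automatic Category Hierarchy Generation
3.2.3 Association Discovery
3.2.4 Application to Multilingual Information Retrieval
Chapter 4 Experimental Results
4.1 Experimental Procedure
4.1.1 Preprocessing
4.1.2 Document Clustering and Hierarchy Generation
4.1.3 Hierarchy Mapping
4.2 Experimental Evaluation
Chapter 5 Conclusions and Future Research Directions
5.1 Chapter Review
5.2 Conclusions
5.3 Future Research Directions
References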
Electronic full text: access is restricted to the campus systems and IP range of the author's university.