National Digital Library of Theses and Dissertations in Taiwan


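Student: 蔡恆慈
Student (English name): Heng-tzu Tsai
Thesis title: 運用自我組織圖於多語言階層產生與比對之研究
Thesis title (English): A Multilingual Hierarchy Generation and Mapping Method Based on Self-Organizing Maps
Advisor: 楊新章
Advisor (English name): Hsin-Chang Yang
Degree: Master's
Institution: National University of Kaohsiung (國立高雄大學)
Department: Department of Information Management, Master's Program
Discipline: Computing
Field: General Computing
Thesis type: Academic thesis
Publication year: 2009
Academic year of graduation:
Language: Chinese
Number of pages:
Keywords (Chinese, translated): multilingual information retrieval, neural networks, document clustering, self-organizing maps
Keywords (English): multilingual information retrieval, text clustering, neural network, self-organizing maps

Usage statistics:
Cited by: 1
Views: 348
Rating:
Downloads: 0
Bookmarked: 1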

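As the Internet grows, the number of documents written in multiple languages is increasing rapidly, and multilingual information retrieval has become an important research topic. This thesis describes a method we developed for discovering knowledge in documents written in multiple languages. We collected Chinese and English news articles from Taiwan Panorama (光華雜誌) magazine; the test corpus contains 1,000 Chinese-English parallel document pairs. We adopt a neural-network approach to document clustering, the self-organizing map, to cluster the multilingual documents and generate document hierarchies, and then use a hierarchy-mapping method to find associations between documents. We built our experiments on a Chinese-English bilingual parallel corpus to discover relationships among documents. The experimental results show that our method can capture relationships between documents written in different languages.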

With the increasing amount of multilingual text on the Internet, multilingual information retrieval has become an important research issue. This paper describes our work on developing a method for discovering knowledge from multilingual documents. We collected English and Chinese news articles from Taiwan Panorama magazine. Our test corpus includes 1,000 pairs of Chinese-English parallel documents. In this study, we adopt a text clustering approach that applies a neural network model, namely the self-organizing map (SOM), to cluster multilingual documents and reveal the hierarchical structure among them. Finally, we discover relationships among multilingual documents by mapping the multilingual hierarchies. We conducted experiments on a Chinese-English bilingual parallel corpus to uncover relationships between documents. The experimental results show that our multilingual text mining approach may capture conceptual relationships among documents written in different languages.
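The abstract describes training a SOM on document vectors and then relating clusters across the Chinese and English collections. The sketch below is a minimal, standard Kohonen map in the spirit of [6] (Gaussian neighbourhood, learning rate and radius both decaying linearly), plus a deliberately simple cross-map correspondence that counts where each parallel document pair lands. All names (`train_som`, `best_matching_unit`, `map_clusters`), the parameter defaults, the toy vectors, and the majority-vote mapping are illustrative assumptions; this record does not describe the thesis's actual hierarchy-generation or hierarchy-mapping procedure, and the sketch does not reproduce it.

```python
import math
import random
from collections import Counter


def best_matching_unit(weights, x):
    """Return the (row, col) of the map unit whose weight vector is closest to x."""
    best, best_dist = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = sum((wk - xk) ** 2 for wk, xk in zip(w, x))  # squared Euclidean distance
            if dist < best_dist:
                best, best_dist = (i, j), dist
    return best


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols Kohonen map on equal-length vectors and return its weight grid."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Random initial weights in [0, 1); the input vectors are assumed to lie in [0, 1] as well.
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)] for _ in range(rows)]
    sigma0 = max(rows, cols) / 2.0
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        order = list(range(len(vectors)))
        rng.shuffle(order)  # present the inputs in a new random order each epoch
        for idx in order:
            x = vectors[idx]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly towards 0
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood radius shrinks, floored at 0.5
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    grid_d2 = (i - bi) ** 2 + (j - bj) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood on the grid
                    w = weights[i][j]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])  # pull this unit towards the input
            step += 1
    return weights


def map_clusters(units_a, units_b):
    """Hypothetical cross-map correspondence (not the thesis's method).

    units_a[k] and units_b[k] are the winning units of the k-th parallel document
    pair on map A and map B. Each unit of A is linked to the unit of B that
    receives the most of its documents' translations.
    """
    counts = {}
    for ua, ub in zip(units_a, units_b):
        counts.setdefault(ua, Counter())[ub] += 1
    return {ua: c.most_common(1)[0][0] for ua, c in counts.items()}


if __name__ == "__main__":
    # Made-up keyword vectors for four parallel pairs; each language has its own vocabulary.
    zh_docs = [[1.0, 0.8, 0.0], [0.9, 1.0, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
    en_docs = [[0.0, 1.0, 0.7, 0.0], [0.1, 0.9, 1.0, 0.0], [1.0, 0.0, 0.0, 0.8], [0.9, 0.1, 0.0, 1.0]]
    zh_map = train_som(zh_docs, 2, 2)
    en_map = train_som(en_docs, 2, 2)
    zh_units = [best_matching_unit(zh_map, v) for v in zh_docs]
    en_units = [best_matching_unit(en_map, v) for v in en_docs]
    print(map_clusters(zh_units, en_units))
```

Running the file trains one map per language and prints, for each unit on the Chinese map, the English-map unit where most of its documents' translations fall. A majority vote over parallel pairs is only the simplest correspondence to illustrate; it ignores the multi-level hierarchies the thesis generates, and real inputs would be the keyword vectors produced during preprocessing rather than toy lists.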

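Acknowledgements
Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research Objectives
1.4 Research Framework
Chapter 2 Literature Review
2.1 Multilingual Information Retrieval and Multilingual Text Mining
2.1.1 Dictionary-Based Approaches
2.1.2 Thesaurus-Based Approaches
2.1.3 Corpus-Based Approaches
2.2 Document Clustering Algorithms
2.2.1 K-NN Classification
2.2.2 Bayesian Classification
2.2.3 Neural Network Classification
2.3 Hierarchy Generation
2.3.1 Growing Hierarchical Self-Organizing Maps
2.3.2 Probabilistic Methods
2.3.3 Automatic Category Hierarchy Generation
2.4 Hierarchy Mapping and Multilingual Applications
Chapter 3 Research Method and Experimental Design
3.1 Document Preprocessing
3.1.1 Word Segmentation
3.1.2 Keyword Selection
3.1.3 Document Vector Conversion
3.2 Document Clustering
3.2.1 SOM Training and Feature Map Generation
3.2.2 Automatic Category Hierarchy Generation
3.2.3 Association Discovery
3.2.4 Multilingual Information Retrieval Applications
Chapter 4 Experimental Results
4.1 Experimental Procedure
4.1.1 Preprocessing
4.1.2 Document Clustering and Hierarchy Generation
4.1.3 Hierarchy Mapping
4.2 Experimental Evaluation
Chapter 5 Conclusions and Future Research Directions
5.1 Chapter Review
5.2 Conclusions
5.3 Future Research Directions
References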

[1]Gordon, Jr. R. G. (2009) Ethnologue: Languages of the World, 16th edition, SIL International, Dallas.
[2]Weber, G. (1997) “Top Languages: The World’s 10 Most Influential Languages,” Language Today, Vol. 2.
[3]Internet World Stats, Top Ten Languages Used in the Web (2009). Available at: http://internetworldstats.com/stats7.htm.
[4]Feldman, R., Dagan, I., and Hirsh, H. (1998) “Mining text using keyword distributions,” Journal of Intelligent Information Systems, Vol. 10, pp. 281-300.
[5]Kaski, S., Honkela, T., Lagus, K. and Kohonen, T. (1998) “WEBSOM-self-organizing maps of document collections,” Neurocomputing, Vol. 21, pp. 101-117.
[6]Kohonen, T. (1982) “Self-organized formation of topologically correct feature maps,” Biological Cybernetics, Vol. 43, pp. 59-69.
[7]Oard, D. W. and Dorr, B. J. (1996) “A Survey of Multilingual Text Retrieval,” Technical Report UMIACS-TR-96-19, University of Maryland, Institute for Advanced Computer Studies.
[8]Ballesteros, L. and Croft, W. B. (1996) “Dictionary-based Methods for Cross-Lingual Information Retrieval,” Proceedings of the 7th International DEXA Conference on Database and Expert Systems Applications, pp. 791-801.
[9]Chen, H. H., Lin, C. C., and Lin, W. C. (2000) “Construction of a Chinese-English WordNet and Its Application to CLIR,” Proceedings of 5th International Workshop on Information Retrieval with Asian Languages, Hong Kong, pp. 189-196.
[10]Fluhr, C. (1995) “Survey of the State of the Art in Human Language Technology,” Center for Spoken Language Understanding, Oregon Graduate Institute, pp. 291-305.
[11]Tallving, M. and Nelson, P. (1990) “A question of international accessibility to Japanese databases,” in D. I. Raitt (Ed.), 14th International Online Information Meeting Proceedings, Oxford: Learned Information, pp. 423-437.
[12]McCarley, J. S. (1999) “Should we Translate the Documents or the Queries in Cross-Language Information Retrieval?” in Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pp. 208-214.
[13]Hull, D. A. and Grefenstette, G. (1996) “Querying Across Languages: A Dictionary-based Approach to Multilingual Information Retrieval,” Proceedings of the 19th International Conference on Research and Development in Information Retrieval, pp. 49-57.
[14]Davis, M. W. (1997) “New Experiments in Cross-Language Text Retrieval at NMSU’s Computing Research Lab,” Proceedings of TREC 5.
[15]Thompson, P. and Dozier, C. (1997) “Name Searching and Information Retrieval,” Proceedings of Second Conference on Empirical Methods in Natural Language Processing, Providence, Rhode Island.
[16]Chen, H. H., Huang, S. J., Ding, Y. W., and Tsai, S. C. (1998) “Proper Name Translation in Cross-Language Information Retrieval,” Proceedings of 17th International Conference on Computational Linguistics and 36th Annual Meeting of the Association for Computational Linguistics, Montreal, Quebec, Canada, pp. 232-236.
[17]Fellbaum, C. (1999) “WordNet,” MIT Press.
[18]Suarez, A., Saiz-Noeda, M., and Palomar, M. (1999) “A Method of Restricted Knowledge Acquisition from WordNet,” IEEE Third International Conference on Knowledge-Based Intelligent Information Engineering Systems, Adelaide, Australia.
[19]蔡明月 (1991) 「線上資訊檢索：理論與實務」 [Online Information Retrieval: Theory and Practice], 台灣學生書局, p. 177.
[20]Salton, G. (1970) “Automatic Processing of Foreign Language Documents,” Journal of the American Society for Information Science, pp. 187-194.
[21]Chen, H. H., Kuo, J. J., and Su, T. C. (2003) “Clustering and Visualization in a Multi-Lingual Multi-Document Summarization System,” Proceedings of 25th European Conference on Information Retrieval Research, Lecture Notes in Computer Science, LNCS 2633, April 14-16, Pisa, Italy, pp. 266-280.
[22]Brown, R. D. (1996) “Example-Based Machine Translation in the Pangloss System,” Proceedings of the 16th International Conference on Computational Linguistics.
[23]Oard, D. W. and Dorr, B. J. (1996) “Evaluating Cross-Language Text Filtering Effectiveness,” In Proceedings of the Cross-Linguistic Multilingual Information Retrieval Workshop, Zurich, Switzerland, pp. 8-14.
[24]Salton, G. (1989) “Automatic Text Processing: the Transformation, Analysis, and Retrieval of Information by Computer,” Reading, MA: Addison-Wesley.
[25]Croft, W. B., Broglio, J., and Fujii, H. (1995) “Applications of multilingual text retrieval,” In Proceedings of the Twenty-Ninth Annual Hawaii International Conference on System Sciences, pp. 98-107.
[26]Chen, K. H. and Chen, H. H. (1994) “A Part-of-Speech-Based Alignment Algorithm,” Proceedings of 15th International Conference on Computational Linguistics, Kyoto, Japan, pp. 166-171.
[27]Davis, M. W. and Dunning, T. (1996) “A TREC Evaluation of Query Translation Methods for Multi-lingual Text Retrieval,” Proceedings of TREC-4.
[28]Sheridan, P. and Ballerini, J. P. (1996) “Experiments in Multilingual Information Retrieval Using the SPIDER System,” Proceedings of the 19th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 58-65.
[29]Yeh, C. H. and Chau, R. (2004) “Filtering Multilingual Web Content Using Fuzzy Logic and Self-Organizing Maps,” Neural Computing & Applications, Vol. 13, No. 2, pp. 140-148.
[30]Yeh, C. H. and Chau, R. (2004) “A multilingual text mining approach to web cross-lingual text retrieval,” Knowledge-Based Systems, Vol. 17, No. 5-6, pp. 219-227.
[31]Rauber, A., Dittenbach, M., and Merkl, D. (2001) “Towards Automatic Content-Based Organization of Multilingual Digital Libraries: An English, French and German View of the Russian Information Agency Nowosti News,” In: Proceedings of the Third All-Russian Scientific Conference "Digital Libraries: Advanced Methods And Technologies, Digital Collections" (RCDL01), Russia, pp. 11-13.
[32]Lee, C. H. and Yang, H. C. (2000) “Towards multilingual information discovery through a SOM based text mining approach,” in Proceedings of the International Workshop on Text and Web Mining, The Sixth Pacific Rim International Conference on Artificial Intelligence (PRICAI 2000), Melbourne, Australia, pp. 81-87.
[33]Lee, C. H. and Yang, H. C. (2003) “A Multilingual Text Mining Approach Based on Self-Organizing Maps,” Applied Intelligence, Vol. 18, No. 3, pp. 295-310.
[34]Yang, Y. and Liu, X. (1999) “A Re-Examination of Text Categorization Methods,” in Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval (Berkeley, CA), pp. 42-49.
[35]Joachims, T. (1997) “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization,” Proceedings of the 14th International Conference on Machine Learning ICML97, pp. 143-151.
[36]Pazzani, M. and Billsus, D. (1997) “Learning and Revising User Profiles: The Identification of Interesting Web Sites,” Machine Learning, Vol. 27, Kluwer Academic Publishers, pp. 313-331.
[37]Merkl, D. (1998) “Text classification with self-organizing maps: some lessons learned,” Neurocomputing, Vol. 21, No. 1-3, pp. 61-77.
[38]Kohonen, T., Kaski, S., Lagus, K., Salojärvi, J., Honkela, J., Paatero, V., and Saarela, A. (2000) “Self Organization of a Massive Document Collection,” IEEE Transactions on Neural Networks, Special Issue on Neural Networks for Data Mining and Knowledge Discovery, Vol. 11, No. 3, pp. 574-585.
[39]Ultsch, A. (1992) “Self-organizing neural networks for visualization and classification,” in Information and Classification: Concepts, Methods and Applications, O. Opitz, B. Lausen, and R. Klar (Eds.), Studies in Classification, Data Analysis, and Knowledge Organization, Springer, Dortmund, Germany, pp. 307-313.
[40]Merkl, D. and Rauber, A. (1997) “Alternative ways for cluster visualization in self-organizing maps,” in Proceedings of the Workshop on Self-Organizing Maps (WSOM97), T. Kohonen, Ed., Espoo, Finland, pp. 106-111.
[41]Rauber, A. (1999) “LabelSOM: On the labeling of self-organizing maps,” in Proceedings of the International Joint Conference on Neural Networks (IJCNN'99), Washington, DC, pp. 10-16.
[42]Merkl, D. and Rauber, A. (1999) “Automatic labeling of self-organizing maps for information retrieval,” in Proceedings of the 6th International Conference on Neural Information Processing (ICONIP99), Perth, Australia, pp. 16-20.
[43]Miikkulainen, R. (1990) “Script recognition with hierarchical feature maps,” Connection Science, Vol. 2, pp. 83-101.
[44]Alahakoon, D., Halgamuge, S. K., and Srinivasan, B. (2000) “Dynamic self-organizing maps with controlled growth for knowledge discovery,” IEEE Transactions on Neural Networks, Vol. 11, No. 3, pp. 601-614.
[45]Rauber, A., Merkl, D., and Dittenbach, M. (2002) “The growing hierarchical self-organizing map: exploratory analysis of high-dimensional data,” IEEE Transactions on Neural Networks, Vol. 13, pp. 1331-1341.
[46]Yang, H. C., Lee, C. H., and Chen, D. W. (2008) “A Method for Multilingual Text Mining and Retrieval Using Growing Hierarchical Self-Organizing Maps,” accepted by Journal of Information Science.
[47]Hofmann, T. (1999) “The Cluster-Abstraction Model: Unsupervised Learning of Topic Hierarchies from Text Data,” in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 99), pp. 682-687.
[48]Yang, H. C. and Lee, C. H. (2005) “Automatic Category Theme Identification and Hierarchy Generation for Chinese Text Categorization,” Journal of Intelligent Information Systems, Vol. 25, No. 1, pp. 47-67.
[49]Daudé, J., Padró, L., and Rigau, G. (1999) “Mapping Multilingual Hierarchies Using Relaxation Labeling,” in Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora.
[50]曾元顯 (1997) 「關鍵詞自動擷取技術之探討」 [An Exploration of Automatic Keyword Extraction Techniques], 中國圖書館學會會訊, No. 106, pp. 26-29.
[51]許中川 and 陳景揆 (2001) 「探勘中文新聞文件」 [Mining Chinese News Documents], 中華民國資訊管理學會會報, Vol. 14, No. 2, pp. 103-122.
[52]Chen, K. J. and Bai, M. H. (1998) “Unknown Word Detection for Chinese by a Corpus-based Learning Method,” International Journal of Computational Linguistics and Chinese Language Processing, Vol. 3, No. 1, pp. 27-44.
[53]Porter, M. F. (1980) “An algorithm for suffix stripping,” Program, Vol. 14, No. 3, pp. 130-137.
[54]Salton, G., Wong, A., and Yang, C. S. (1975) “A Vector Space Model for Automatic Indexing,” Communications of the ACM, Vol. 18, No. 11, pp. 613-620.
[55]Weigend, A. S., Wiener, E. D., and Pedersen, J. O. (1999) “Exploiting Hierarchy in Text Categorization,” Information Retrieval, Vol. 1, No. 3, pp. 193-216.

Electronic full text: available only on the campus systems and within the IP range of the student's university.
