National Digital Library of Theses and Dissertations in Taiwan

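Author: Cheng-wei Hwang (黃政偉)
Thesis title: A Neural Network Document Classifier with Linguistic Feature Selection (具語句特徵選取能力的類神經網路文件分類器)
Advisors: Hahn-Ming Lee (李漢銘), Sanko H. Lan (藍信彰)
Degree: Master's
Institution: National Taiwan University of Science and Technology
Department: Department of Electronic Engineering
Discipline: Engineering
Subfield: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 1999
Academic year of graduation: 87 (ROC calendar)
Language: English
Pages: 85
Keywords: Neural network, Information retrieval, Feature selection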
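Abstract (translated from Chinese): This thesis proposes a classification system with linguistic feature selection that handles multiple output categories, aimed at unstructured documents written as natural-language (linguistic) descriptions. The system consists of a front-end feature selection unit and a back-end neural network classification unit. In the feature selection unit, we first extract terms from the original documents by text processing. We then use the entropy function, which measures information content, to analyze the conformity and uniformity of each term, and select terms with high information content as the basis for classification. To reduce the input dimension, we add a synonym-merging mechanism: based on the uniformity analysis, a fuzzy relation is used to compute a term-to-term similarity matrix, from which a synonym thesaurus is built, and terms with similar meanings are then merged. In the back-end neural network classification unit, we adopt the well-established back-propagation model for learning and testing. Because the data involve multi-category output and the categories are hierarchically related, we build a corresponding hierarchical classifier out of multiple back-propagation modules. Our experiments use product description documents from an industry electronic catalog. The test results reach a considerable level of accuracy, so the system can effectively assist manual classification and substantially reduce manpower and working time.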
Abstract (English): This thesis presents a neural network document classifier with linguistic feature selection and multi-category output. The classifier handles unstructured documents that contain linguistic descriptions. It consists of a feature selection unit and a hierarchical neural network classification unit. In the feature selection unit, we extract terms from the original documents by text processing, then analyze the conformity and uniformity of each term with an entropy function, which measures term significance. Terms with high significance are selected as input features for the classifiers that follow. To reduce the input dimension, we merge synonyms: based on the uniformity analysis, we obtain a term similarity matrix by a fuzzy relation operation and then construct a synonym thesaurus, so that synonyms can be grouped. In the hierarchical neural network classification unit, we adopt the well-known back-propagation model to build the hierarchical classifier. Our experiment uses a product description database from an electronic commerce company. The classification results are accurate enough to aid manual classification effectively, saving considerable manpower and working time.
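The entropy-based term scoring mentioned in the abstract can be illustrated with a small sketch. This record does not give the thesis's actual definitions of conformity and uniformity, so the score below is an assumption made only for illustration: a term is scored as one minus the normalized Shannon entropy of its document counts across categories, so a term found in a single category scores 1 and a term spread evenly over all categories scores 0. The function names, the 0.5 threshold, and the toy product descriptions are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to the range [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))


def term_concentration(docs):
    """Score each term by how concentrated it is in few categories.

    docs is a list of (tokens, category) pairs. The score is an
    illustrative stand-in, NOT the thesis's own conformity measure:
    1 - normalized entropy of the term's document counts per category.
    """
    categories = sorted({category for _, category in docs})
    per_term = defaultdict(Counter)
    for tokens, category in docs:
        for term in set(tokens):  # count each term once per document
            per_term[term][category] += 1
    return {
        term: 1.0 - normalized_entropy([counts[c] for c in categories])
        for term, counts in per_term.items()
    }


def select_features(docs, threshold=0.5):
    """Keep terms whose concentration score reaches the (assumed) threshold."""
    scores = term_concentration(docs)
    return sorted(term for term, score in scores.items() if score >= threshold)


# Hypothetical product descriptions, loosely in the spirit of a catalog.
docs = [
    ("laser printer with duplex unit".split(), "printers"),
    ("inkjet printer with color cartridge".split(), "printers"),
    ("flat panel monitor with stand".split(), "monitors"),
    ("color monitor with speakers".split(), "monitors"),
]
print(select_features(docs))
```

In this toy data, "with" and "color" appear equally often in both categories and are filtered out, while category-specific terms such as "printer" and "monitor" are kept. The selected terms would then serve as input features for a classifier; synonym merging and the hierarchical back-propagation networks are not sketched here.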
Abstract in Chinese
Abstract in English
Acknowledgements in Chinese
Contents
Index of Figures and Tables
Chapter 1 Introduction
1.1 Motivation
1.2 Problem Definition
1.3 Background
1.3.1 Back-Propagation learning algorithm
1.3.2 Text processing and term significance measuring
1.4 Overview of this Thesis
Chapter 2 System Architecture
2.1 Feature Selection Unit
2.1.1 Text processing
2.1.2 Conformity and uniformity
2.1.3 Synonym thesaurus
2.1.4 Selecting features and transferring input data
2.2 Hierarchical Neural Network Classification Unit
2.2.1 Hierarchical classifier
2.2.2 Back-propagation classifier
2.2.3 The learning procedure
2.3 Output Evaluation
Chapter 3 Experimental Results
3.1 Dimensionality Reduction by Feature Selection
3.2 Accuracy Measurements
3.3 Classification Results
3.3.1 Training phase
3.3.2 Testing phase
Chapter 4 Discussion
4.1 Learning Performance and Memory Requirement
4.2 Thresholds for Feature Selection
4.3 Incremental Learning
4.4 Noise Detection
4.5 Negative word
Chapter 5 Related Applications
5.1 Neural Network
5.2 Latent Semantic Indexing
5.3 Genetic Algorithm
5.4 Mining Association Rules
Chapter 6 Conclusion
References
Appendix
Appendix A Product Description Document and Code Book List
Appendix B Francis and Kucera's Stop-list