National Digital Library of Theses and Dissertations in Taiwan

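Author: 呂靜婷 (Ching-Ting Lu)
Thesis title: 一個以卡方為基礎的文件多重分類方法
Thesis title (English): Multi-label Text Categorization Using a Chi-Square Based Method
Advisor: 陸承志
Degree: Master's
Institution: 元智大學 (Yuan Ze University)
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: Chinese
Number of pages: 53
Keywords: Multi-label Text Categorization; Correlated Coefficient; Weighted Matrix; Inverse Chi-square Classifier; Conformity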
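This study proposes a method based on the inverse chi-square classifier. The method includes a procedure that selects feature terms for each category and builds a term-category weighted matrix, from which the feature weights of a test document with respect to each category are looked up. The inverse chi-square classifier then computes an indicator value for the document in each category, and these values serve as the basis for classification. Three thresholds, DF (Document Frequency), CC (Correlated Coefficient), and ICF (Inverted Conformity Frequency), are used to select different feature terms for different categories. Experiments on the 10 categories with the most documents in the Reuters-21578 collection show that the method reaches a Precision, Recall, and F1-measure of about 87%, 98%, and 92%, respectively, which is comparable to the performance of Boostexter, a well-known multi-label categorization method.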
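As a rough illustration of the inverse chi-square step, the sketch below follows the general formulation in Robinson's "A Statistical Approach to the Spam Problem" (listed in the references), which combines per-term probabilities using Fisher's method. This record does not state how the thesis converts the term-category matrix into per-term values, so the `term_probs` input, the function names, the 0.9 threshold, and the sample scores are all assumptions made for illustration, not the author's implementation.

```python
import math
from typing import Dict, List, Sequence


def chi2q(x2: float, dof: int) -> float:
    """Upper-tail probability of a chi-square variable (even dof only)."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    # Closed-form series for even degrees of freedom.
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def indicator(term_probs: Sequence[float]) -> float:
    """Combine per-term probabilities for one category into a value in [0, 1].

    Values near 1 suggest the document belongs to the category,
    values near 0 suggest it does not, and 0.5 is uninformative.
    """
    if not term_probs:
        return 0.5
    eps = 1e-6  # assumed clamp to keep the logarithms finite
    probs = [min(max(p, eps), 1.0 - eps) for p in term_probs]
    n = len(probs)
    h = chi2q(-2.0 * sum(math.log(p) for p in probs), 2 * n)
    s = chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), 2 * n)
    return (1.0 + h - s) / 2.0


def assign_labels(scores: Dict[str, float], threshold: float) -> List[str]:
    """Multi-label decision: keep every category whose indicator passes the threshold."""
    return [category for category, value in scores.items() if value >= threshold]


# Hypothetical indicator values for two categories of one test document.
print(assign_labels({"earn": 0.97, "grain": 0.12}, threshold=0.9))
```

Because each category is scored independently, a document can receive zero, one, or several labels, which is what makes the scheme multi-label rather than single-label.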
This study presents a method for multi-label text categorization based on a term-category weighted matrix. The method uses an inverse chi-square classifier to calculate an indicator value for each category under consideration, based on the test document's feature weights, which are represented by correlation coefficients.
We use three thresholds, DF (Document Frequency), CC (Correlated Coefficient), and ICF (Inverted Conformity Frequency), to extract the relevant terms for each category. Finally, we conduct experiments on the top 10 categories of Reuters-21578. The experimental results show that Precision, Recall, and F1-measure reach 87%, 98%, and 92%, respectively. The method is shown to be comparable to Boostexter, a well-known multi-label method.
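The three reported figures can be checked against one another, since the F1-measure is the harmonic mean of precision and recall. The short sketch below assumes the rounded values from the abstract as inputs; the function name is chosen here for illustration.

```python
def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values taken from the abstract.
print(round(f1_measure(0.87, 0.98), 4))  # prints 0.9217
```

A value of about 0.92 agrees with the 92% F1-measure stated in the abstract.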
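Title page
Thesis oral defense committee approval form
Authorization form
Chinese abstract
English abstract
Acknowledgments
Table of contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Research motivation
1.2 Research objectives
1.3 Thesis organization
Chapter 2 Literature Review
2.1 Text Categorization
2.1.1 Introduction to text categorization
2.1.2 Definition of the text categorization problem
2.2 Multi-label categorization methods
2.2.1 Support Vector Machines (SVM)
2.2.2 K-nearest Neighbor Classifier (KNN)
2.2.3 Neural Networks (NN)
2.2.4 Decision Tree (DT)
2.2.5 Boosting algorithm
2.2.6 Other multi-label categorization methods
2.3 Evaluation measures for multi-label categorization
Chapter 3 Multi-label Categorization Process
3.1 Data Preprocessing
3.2 Training process
3.3 Classification process
3.4 Performance measures for multi-label categorization
Chapter 4 Experimental Evaluation
4.1 Experimental data set
4.2 Experimental design
4.3 Experimental results and analysis
4.3.1 Experiments on nCC and ICF thresholds
4.3.2 Experiments on different i_value thresholds
4.3.3 Experiments on cross-category overlap
4.4 Comparison with Boostexter
Chapter 5 Conclusions and Future Work
References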
1.葉怡成, "應用類神經網路" (Applied Neural Networks), 儒林圖書有限公司, Taipei, 1997.
2.Chang, Y. C., Chen, S. M. and Liau, C. J. “Multilabel Text Categorization Based on a new Linear Classifier Learning Method and a Category-Sensitive Refinement Method,” Expert Systems with Applications(34:3) 2008, pp:1948-1953.
3.Chen, Y. L., Hsu, C. L. and Chou, S. C. “Constructing a Multi-valued and Multi-labeled Decision Tree,” Expert Systems with Applications (25:2) 2003, pp:199-209.
4.Chou, S. and Hsu, C. L. “MMDT: a Multi-valued and Multi-labeled Decision Tree Classifier for Data Mining,” Expert Systems with Applications (28:4) 2005, pp: 799-812.
5.Dalton, J. and Deshmane, A. “Artificial neural networks,” IEEE Potentials (10:2) 1991, pp: 33-36.
6.Freund, Y. and Schapire, R. E. “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of Computer and System Sciences (55:1) 1997, pp: 119-139.
7.Freund, Y. and Schapire, R. E. “Experiments with a new boosting algorithm,” In Machine Learning: Proceedings of the Thirteenth International Conference, 1996, pp: 148-156.
8.Han, J., & Kamber, M. Data mining: Concepts and techniques. San Francisco, CA: Morgan Kaufmann. 2001
9.Lee, L. H. and Luh, C. J. “Classifying Pornographic Web Pages Using a Chi-Square Based Statistics Method,” Journal of Information Management (14:2), 2007, pp: 225 -246.
10.Lewis, D. D. and Ringuette, M. “A comparison of two learning algorithms for text categorization,” In Proceedings of the Third Annual Symposium on Document Analysis and Information Retrieval (SDAIR’94), 1994, pp: 81-93.
11.Lin, J. H. and Hu, T. F. “Fuzzy Correlation and Support Vector Learning Approach to Multi-Categorization of Documents,” IEEE International Conference on Systems, Man, and Cybernetics 2004, pp: 3735-3740.
12.Liu, Y. Yang, Y. & Carbonell, J. “Boosting to Correct the Inductive Bias for Text Classification,” ACM International Conference on Information and Knowledge Management (CIKM’02), 2002, pp: 348-355.
13.Petrovskiy, M. “Paired Comparisons Method for Solving Multi-label Learning Problem,” In Proceedings of the 6th International Conference on Hybrid Intelligent Systems 2006, pp: 42-47.
14.Popa, I. S., Zeitouni, K., Gardarin, G., Nakache, D. and Metais, E. “Text Categorization for Multi-label Documents and Many Categories,” In Proceedings of the 12th IEEE Symposium on Computer-Based Medical Systems 2007, pp: 421-426.
15.Quinlan, J. R. “Induction of decision trees,” Machine Learning (1:1) 1986, pp: 81-106.
16.Robinson, G. “A Statistical Approach to the Spam Problem,” Linux Journal, (2003:107) 2003, pp: 1-9.
17.Santini, M. “Zero, single, or multi? Genre of web pages through the users' perspective,” Information Processing & Management (44:2) 2008, pp: 702-737.
18.Schapire, R. E. and Singer, Y. “BoosTexter: A Boosting-based System for Text Categorization,” Machine Learning (39:2-3) 2000, pp: 135-168.
19.Schapire, R. E. “The boosting approach to machine learning: An overview,” In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.
20.Sebastiani, F. “Machine Learning in Automated Text Categorization,” ACM Computing Surveys (34:1) 2002, pp: 1-47.
21.Shannon, C. E. “Communication Theory of Secrecy Systems,” Bell System Technical Journal (28:4) 1949, pp: 656-715.
22.Tseng, Y. H., Lin, C. J. and Lin, Y. I. “Text mining techniques for patent analysis,” Information Processing & Management (43:5) 2007, pp: 1216-1247.
23.Wang, T. Y. and Chiang, H. M. “Fuzzy Support Vector Machine for Multi-class Text Categorization,” Information Processing & Management (43:4) 2007, pp: 914-929.
24.Yang, Y. “Expert network: Effective and efficient learning from human decisions in text categorization and retrieval,” In 17th Ann Int ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’94), 1994, pp: 13-22.
25.Yang, Y. and Liu, X. “A re-examination of text categorization methods.” In Proc. of the 22nd Annual International ACM SIGIR Conference 1999, pp: 42–49.
26.Zhang, M. L. and Zhou, Z. H. “Multi-Label Neural Networks with Applications to Functional Genomics and Text Categorization,” IEEE Transactions on Knowledge and Data Engineering (18:10) 2006, pp: 1338-1351.
27.Zhang, M.-L., and Zhou, Z.-H. “A K-Nearest Neighbor Based Algorithm for Multi-label Classification,” IEEE International Conference on Granular Computing, July 2005, pp: 718-721.
28.Zheng, Z. and Srihari, R. “Optimally combining positive and negative features for text categorization,” In Proceedings of the ICML'03 Workshop on Learning from Imbalanced Data Sets, 2003.