National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

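Graduate student: Chen Jia-liang (陳佳良)
Thesis title: Hierarchical Multi-class Text Classification Using Support Vector Machines (以SVM為基礎的文件階層式多元分類)
Advisor: Cheng-Jye Luh (陸承志)
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Information Management
Discipline: Computer Science
Subfield: General Computer Science
Thesis type: Academic thesis
Year of publication: 2008
Academic year of graduation: 96
Language: Chinese
Pages: 42
Keywords (Chinese): hierarchical classification (階層分類); multi-class classification (多元分類); SVM; feature selection (特徵詞挑選)
Keywords (English): Multi-class Classification; Hierarchical Classification; Support Vector Machines; Feature Selection; Text Categorization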
This study uses the category hierarchy of enterprise documents to build a hierarchical classification model composed of multiple multi-class classifiers, so that each document is classified top-down, one level of the hierarchy at a time. Each multi-class classifier is a Support Vector Machine (SVM) combined with the one-against-one approach. For each classifier, features are selected by applying thresholds on two measures, DF (Document Frequency) and CC (Correlation Coefficient). We ran experiments on two datasets of different character: a set of technical documents from a local IC equipment manufacturer and a set of mainland China news articles. The results show that the proposed hierarchical classifier performs well on both datasets and takes less classification time than a non-hierarchical approach.
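The feature selection step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes that CC is the signed square root of the chi-square statistic, as the measure is commonly defined in the text categorization literature, and that a term is kept when both its DF and its CC reach a threshold. The function name `select_features`, the parameters `min_df` and `min_cc`, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def select_features(docs, labels, target, min_df=2, min_cc=1.0):
    """Keep terms with DF >= min_df and CC(term, target) >= min_cc.

    Assumed definition (not taken from the thesis):
    CC = sqrt(N) * (A*D - C*B) / sqrt((A+B) * (C+D) * (A+C) * (B+D))
    A: docs in target containing the term, B: docs outside target containing it,
    C: docs in target without the term,    D: docs outside target without it.
    """
    n = len(docs)
    doc_terms = [set(doc.split()) for doc in docs]   # whitespace tokenization
    in_class = [label == target for label in labels]
    n_pos = sum(in_class)
    # Document frequency: number of documents containing each term.
    df = Counter(term for terms in doc_terms for term in terms)

    selected = {}
    for term, freq in df.items():
        if freq < min_df:                            # DF threshold
            continue
        a = sum(1 for terms, pos in zip(doc_terms, in_class)
                if pos and term in terms)
        b = freq - a
        c = n_pos - a
        d = n - n_pos - b
        denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
        cc = math.sqrt(n) * (a * d - c * b) / denom if denom else 0.0
        if cc >= min_cc:                             # CC threshold
            selected[term] = cc
    return selected

# Hypothetical toy corpus with two categories.
docs = [
    "etch chamber pressure",
    "etch chamber wafer",
    "invoice payment due",
    "invoice payment chamber",
]
labels = ["process", "process", "finance", "finance"]

features = select_features(docs, labels, target="process")
print(sorted(features))
```

In a hierarchical setup, a selection of this kind would be run separately for the classifier at each node, so each classifier only sees terms that discriminate among its own child categories.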
Title page
Oral defense committee approval
Authorization
Chinese abstract
English abstract
Acknowledgements
Table of contents
List of tables
List of figures

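Chapter 1. Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Thesis organization
Chapter 2. Literature review
2.1 Multi-class classification methods
2.2 Overview of classifiers
2.3 Feature selection methods
Chapter 3. Hierarchical classification process
3.1 Hierarchical classification architecture
3.2 Classifier training and testing process
3.3 Classification performance measures
Chapter 4. Analysis of hierarchical classification results
4.1 Document parsing of the experimental datasets
4.2 SVM parameter settings
4.3 Feature selection experiments
4.4 Hierarchical classification experiments
4.4.1 Results on the enterprise documents
4.4.2 Results on the mainland China news dataset
4.5 Analysis of experimental results
Chapter 5. Conclusions and future work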