National Digital Library of Theses and Dissertations in Taiwan

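Graduate student: Chun-Hsien Tung (董純賢)
Thesis title: Adopting the Framework of Multi-level Class Priority with Multiple Classifiers to Improve the Accuracy of Text Classification
Thesis title (original): 應用多層次架構之類別優先度與多重分類器改善文件分類準確率
Advisor: Ding-An Chiang (蔣定安)
Degree: Master's
Institution: Tamkang University (淡江大學)
Department: Department of Computer Science and Information Engineering, In-service Master's Program
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar)
Language: Chinese
Number of pages: 72
Keywords: Associative Classification; Ranking; Rule Dependency; Multi-level Class Priority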
Associative classification (AC) [1][2] methods normally rank rules according to prescribed criteria. Because rules depend on one another (rule dependency), however, the order in which rules are applied can still change the classification result, even among rules with identical confidence, support, and length.

This thesis focuses on the rule-ranking problem. It adopts the Lazy [3] method as the general ranking principle and classifies documents with rules at the 100% confidence level. It then removes the documents that have already been classified, recalculates the confidence values, and re-ranks the rules. A multi-level class priority concept is added to this process, and its effect on classification performance is examined. The lowest per-class accuracy obtained from an initial classification with TFIDF [4] weights and a Naïve Bayes [5] classifier is used as a single static threshold. Documents that the associative classifier cannot classify are passed to the Naïve Bayes classifier, which addresses the loss of accuracy caused by the default class in associative classifiers.
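The loop described in the abstract (apply 100%-confidence rules, drop the documents they cover, re-score the remaining rules, and hand leftovers to a second classifier) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the rule and document representations, the tie-break by support and then class priority, the toy data, and the `fallback` callable standing in for the Naïve Bayes classifier are all hypothetical. The single static threshold is left out because the abstract does not say how it is applied.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Rule = Tuple[FrozenSet[str], str]  # (antecedent terms, predicted class)
Doc = Tuple[FrozenSet[str], str]   # (terms in the document, true class)


def confidence(rule: Rule, docs: Sequence[Doc]) -> Tuple[float, int]:
    """Return (confidence, number of correctly covered docs) for a rule."""
    terms, label = rule
    matched = [d for d in docs if terms <= d[0]]
    if not matched:
        return 0.0, 0
    correct = sum(1 for d in matched if d[1] == label)
    return correct / len(matched), correct


def select_rules(rules: Sequence[Rule], docs: Sequence[Doc],
                 class_rank: Dict[str, int]) -> List[Rule]:
    """Repeatedly keep 100%-confidence rules, remove covered docs, re-score."""
    remaining = list(docs)
    candidates = list(rules)
    ordered: List[Rule] = []
    while remaining and candidates:
        scored = [(r, *confidence(r, remaining)) for r in candidates]
        perfect = [(r, s) for r, c, s in scored if c == 1.0 and s > 0]
        if not perfect:
            break
        # Assumed tie-break: higher support first, then class priority.
        perfect.sort(key=lambda rs: (-rs[1], class_rank.get(rs[0][1], len(class_rank))))
        chosen = [r for r, _ in perfect]
        ordered.extend(chosen)
        candidates = [r for r in candidates if r not in chosen]
        remaining = [d for d in remaining
                     if not any(r[0] <= d[0] for r in chosen)]
    return ordered


def classify(terms: FrozenSet[str], ordered: Sequence[Rule],
             fallback: Callable[[FrozenSet[str]], str]) -> str:
    """Use the first matching rule; otherwise defer to the fallback classifier."""
    for antecedent, label in ordered:
        if antecedent <= terms:
            return label
    return fallback(terms)


# Toy training data and candidate rules (hypothetical).
train: List[Doc] = [
    (frozenset({"wheat", "farm"}), "grain"),
    (frozenset({"wheat", "export"}), "grain"),
    (frozenset({"oil", "export"}), "crude"),
    (frozenset({"oil", "price"}), "crude"),
    (frozenset({"export", "price"}), "trade"),
]
rules: List[Rule] = [
    (frozenset({"wheat"}), "grain"),
    (frozenset({"oil"}), "crude"),
    (frozenset({"export"}), "trade"),
]
class_rank = {"grain": 0, "crude": 1, "trade": 2}


def fallback(terms: FrozenSet[str]) -> str:
    """Stand-in for a trained Naïve Bayes classifier."""
    return "unclassified-by-AC"


ordered = select_rules(rules, train, class_rank)
```

In this toy data the rule `export -> trade` has a confidence of only 1/3 over the full training set, but reaches 100% once the documents covered by the `wheat` and `oil` rules are removed. That is the kind of rule dependency the abstract refers to.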


Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Foreword
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Literature and Research
2.1 Associative Classification
2.1.1 Pre-processing
2.1.2 Rule Generation
2.1.3 Rule Ranking
2.1.4 Rule Pruning
2.1.5 Association Rule Classifier
2.1.6 Multiple Classifiers
2.2 TFIDF (Term Frequency Inverse Document Frequency)
2.3 Naïve Bayes Classification
2.4 Evaluation Measures
Chapter 3 Research Method
3.1 Problem Discussion
3.2 Threshold Setting and Multiple Classifiers
3.3 Classification Procedure
Chapter 4 Experimental Results
4.1 Data Source
4.2 Experimental Results
4.3 Analysis of Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix 1 English Paper


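List of Figures
Figure 2-1 Classification flow of an associative classifier
Figure 2-2 CBA ranking method
Figure 2-3 Lazy ranking method
Figure 2-4 Database coverage algorithm
Figure 2-5 Lazy algorithm
Figure 3-1 Multi-level class priority flowchart
Figure 3-2 Test classification flowchart
Figure 4-1 Sample Reuters document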


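List of Tables
Table 2-1 Experimental results of a multiple classifier combining AC with KNN classification
Table 2-2 Distribution of document counts
Table 3-1 Initial classification results of the Naïve Bayes classifier
Table 4-3 Number of documents per class in Reuters-21578
Table 4-4 Number of training and test documents in Reuters-21578
Table 4-3 Classification results of Lazy on Reuters-21578
Table 4-4 Classification results of the Naïve Bayes classifier on Reuters-21578
Table 4-5 Classification results on Reuters-21578 with multiple classifiers and a single static threshold
Table 4-6 Best experimental results on Reuters-21578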



[1] F. Thabtah, “A review of associative classification mining,” Knowl. Eng. Rev., vol. 22, 2007, pp. 37-65.
[2] Hsin Yuan Chiou, “Improving the performance of Associative Classification by using the Multi-level Class Priority of Rule Ranking,” Master thesis of Tamkang University, Jun. 2010, pp. 1-52.
[3] Mao-Sheng Hung, “Improve Document Classify Accuracy by Rule - Static threshold and Dynamic threshold Research,” Master thesis of Tamkang University, Jun. 2009, pp. 1-49.
[4] T.M. Mitchell, Machine Learning, McGraw-Hill Science/Engineering/Math, 1997.
[5] Y.M. Chen, “Using Association Rule to Improve The Accuracy of Text Categorization - The Combination with other Classifiers,” Master thesis of Tamkang University, Jun. 2009, pp. 1-57.
[6] G. Salton and C. Buckley, Term Weighting Approaches in Automatic Text Retrieval, Cornell University, 1987.
[7] B. Liu, W. Hsu, and Y. Ma, “Integrating Classification and Association Rule Mining,” Knowledge Discovery and Data Mining, 1998, pp. 80-86.
[8] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, eds., Advances in Knowledge Discovery and Data Mining, American Association for Artificial Intelligence, 1996.
[9] K. Wang, S. Zhou, and Y. He, “Growing decision trees on support-less association rules,” Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, Massachusetts, United States: ACM, 2000, pp. 265-269.
[10] K. Wang, Y. He, and D.W. Cheung, “Mining confident rules without support requirement,” Proceedings of the tenth international conference on Information and knowledge management, Atlanta, Georgia, USA: ACM, 2001, pp. 89-96.
[11] E. Baralis and P. Garza, “A Lazy Approach to Pruning Classification Rules,” Dec. 2002.
[12] W. Li, J. Han, and J. Pei, “CMAR: accurate and efficient classification based on multiple class-association rules,” Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 369-376.
[13] Yongwook Yoon, Gary G. Lee, Tseng, “Text Categorization Based on Boosting Association Rules,” Semantic Computing 2008 IEEE International Conference on, 2008, pp. 136-143.
[14] M.F. Porter, “An algorithm for suffix stripping,” Readings in information retrieval, Morgan Kaufmann Publishers Inc., 1997, pp. 313-316.
[15] Jing Chen, Zhigang Zhang, Qing Li, and Xiaoming Li, “A Pattern-Based Voting Approach for Concept Discovery on the Web,” Web Technologies Research and Development - APWeb 2005, vol. 3399, 2005.
[16] http://rocling.iis.sinica.edu.tw/CKIP/
[17] D.A. Karras, “An Improved Text Categorization Methodology Based on Second and Third Order Probabilistic Feature Extraction and Neural Network Classifiers,” Lecture Notes in Computer Science, 2006, pp. 9-20.
[18] J.R. Quinlan and R.M. Cameron-Jones, “FOIL: A Midterm Report,” Proceedings of the European Conference on Machine Learning, vol. 667, 1993, pp. 3-20.
[19] E. Baralis, S. Chiusano, and P. Garza, “On support thresholds in associative classification,” Proceedings of the 2004 ACM symposium on Applied computing, Nicosia, Cyprus: ACM, 2004, pp. 553-558.
[20] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules,” Proc. 20th Int. Conf. Very Large Data Bases, VLDB, J.B. Bocca, M. Jarke, and C. Zaniolo, eds., Morgan Kaufmann, 1994, pp. 487-499.
[21] P. Soucy and G. Mineau, “A simple KNN algorithm for text categorization,” Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, 2001, pp. 647-648.
[22] Y. Yang and X. Liu, “A re-examination of text categorization methods,” Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, Berkeley, California, United States: ACM, 1999, pp. 42-49.
[23] T. Joachims, “A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization,” Proceedings of the Fourteenth International Conference on Machine Learning, Morgan Kaufmann Publishers Inc., 1997, pp. 143-151.
[24] P. Bickel and E. Levina, “Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations,” Bernoulli, vol. 10, 2004, pp. 989-1010.
[25] Yuen-Hsien Tseng, “Effectiveness Issues in Automatic Text Categorization,” Bulletin of the Library Association of China, vol. 68, Jun. 2002, pp. 62-83.
[26] Cho-Ming Lee, “Classifying Chinese Text Documents by Association Rule,” Master thesis of Tamkang University, Jun. 2006, pp. 1-66.

