National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Author: 洪茂盛
Author (English): Mao-Sheng Hung
Thesis title: 利用關聯式法則改善文件分類準確度-靜態與動態門檻值問題之探討
Thesis title (English): Improve Document Classify Accuracy by Association Rule- Static threshold and Dynamic threshold Research
Advisor: 蔣定安
Advisor (English): Ding-An Chiang
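Degree: Master's
Institution: Tamkang University (淡江大學)
Department: Department of Computer Science and Information Engineering, In-service Master's Program (資訊工程學系碩士在職專班)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Publication year: 2009
Graduation academic year: 97 (ROC calendar)
Language: Chinese
Pages: 49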
Keywords (Chinese): 文件分類; 關聯式法則; 靜態門檻值; 動態門檻值
Keywords (English): document classification; association rule; static threshold; dynamic threshold
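When association rules are used for classification, the confidence threshold in association rule classification is usually set, by rule of thumb, to a single fixed confidence value, so the setting is rather subjective. To raise classification accuracy, a relatively high confidence is typically chosen as the threshold. If the threshold is set too high, however, some documents cannot be assigned to a class because no rule applies, and they must be assigned to a default class by the default rule; if the threshold is lowered, documents may be misclassified, which lowers classification performance. This thesis therefore investigates the threshold problem.
The thesis examines the confidence threshold along two dimensions: static versus dynamic thresholds, and a single threshold versus multiple thresholds. With a dynamic threshold, accuracy is compared after each classification round to decide whether to revise the original threshold upward; with multiple thresholds, a different threshold can be set for each class.
The experiments set thresholds under the different combinations, with the aim of finding, from the results, an objective way to set the confidence threshold and improve classification performance.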
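The static case described above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Rule` type, the `classify` function, the `"*"` key for the single shared threshold, and the sample rules are all hypothetical names introduced here. The sketch assumes a document is a set of terms, that a rule fires when all of its terms occur in the document, and that the highest-confidence firing rule decides the class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    terms: frozenset   # antecedent: terms that must all occur in the document
    label: str         # consequent: the predicted class
    confidence: float  # confidence estimated from training data


def classify(doc_terms, rules, thresholds, default_label):
    """Pick the highest-confidence rule that fires and passes its threshold.

    `thresholds` maps a class label to its minimum confidence; the key "*"
    (an assumption of this sketch) holds the single shared threshold.
    If no rule qualifies, the default rule assigns `default_label`.
    """
    best = None
    for rule in rules:
        min_conf = thresholds.get(rule.label, thresholds.get("*", 0.0))
        if rule.confidence >= min_conf and rule.terms <= doc_terms:
            if best is None or rule.confidence > best.confidence:
                best = rule
    return best.label if best is not None else default_label


# Hypothetical rules and document, for illustration only.
rules = [
    Rule(frozenset({"network"}), "CS", 0.9),
    Rule(frozenset({"market"}), "Business", 0.6),
]
doc = {"market", "price"}

# Single threshold of 0.8: no rule qualifies, so the default rule applies.
single = classify(doc, rules, {"*": 0.8}, "Other")
# Multiple thresholds: a lower bar for "Business" lets its rule fire.
multiple = classify(doc, rules, {"*": 0.8, "Business": 0.5}, "Other")
print(single, multiple)
```

With the single high threshold the document falls through to the default class, which is the failure mode the abstract describes; the per-class threshold recovers it at the cost of admitting a lower-confidence rule.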
When association rules are used for classification, the confidence threshold is customarily a single fixed value chosen from experience, which makes the setting comparatively subjective. To increase classification accuracy, a higher confidence is usually chosen; but if the confidence threshold is set too high, some documents cannot be assigned a class for lack of applicable rules, and if it is set too low, document classification performance may decrease.
This thesis focuses on the threshold value and discusses it in two parts. The first is the static threshold: although its training process is quicker and simpler, during classification the accuracy already achieved can be degraded by subsequent lower-confidence rules, that is, rules whose accuracy is lower than the originally set threshold, and this may reduce document classification performance. The thesis therefore proposes a dynamic threshold, which checks after each classification round whether accuracy has improved and decides accordingly whether to revise the threshold upward. It also proposes an objective way to set the confidence threshold so as to improve classification performance. Experiments show that the dynamic threshold achieves better classification performance than the static threshold.
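The upward revision described above might be sketched as follows. The abstract does not give the exact procedure, so this is only one possible reading: the threshold is raised through a list of candidate values, and each raise is kept only while accuracy on labeled documents strictly improves. The function names, the tuple layout of a rule, the candidate list, and the sample data are all assumptions made for illustration.

```python
def accuracy(docs, rules, threshold, default_label):
    """Fraction of (terms, true_label) pairs classified correctly.

    Each rule is a (terms, label, confidence) tuple; the highest-confidence
    rule that fires and meets the threshold decides the class.
    """
    correct = 0
    for terms, true_label in docs:
        matches = [r for r in rules if r[2] >= threshold and r[0] <= terms]
        if matches:
            predicted = max(matches, key=lambda r: r[2])[1]
        else:
            predicted = default_label  # default rule
        correct += predicted == true_label
    return correct / len(docs)


def tune_threshold(docs, rules, candidates, default_label):
    """Raise the threshold through ascending `candidates` while accuracy improves."""
    best_t = candidates[0]
    best_acc = accuracy(docs, rules, best_t, default_label)
    for t in candidates[1:]:
        acc = accuracy(docs, rules, t, default_label)
        if acc <= best_acc:
            break  # no improvement: keep the current threshold
        best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical rules and labeled documents, for illustration only.
rules = [
    (frozenset({"network"}), "CS", 0.9),
    (frozenset({"market"}), "Business", 0.85),
    (frozenset({"model"}), "CS", 0.55),
]
validation = [
    ({"network", "protocol"}, "CS"),
    ({"market", "model"}, "Business"),
    ({"model", "finance"}, "Business"),
]
result = tune_threshold(validation, rules, [0.5, 0.6, 0.7, 0.8, 0.9], "Business")
print(result)
```

In this toy data the 0.55-confidence rule misclassifies the third document at the starting threshold of 0.5; raising the threshold to 0.6 excludes that rule, and a further raise brings no gain, so the search stops there.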
Chapter 1 Introduction
1.1 Preface
1.2 Research Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Associative Classification
2.1.1 Rule Generation
2.1.2 Ranking
2.1.3 Pruning
2.1.4 Association Rule Classifier
2.1.5 Multiple Classifiers
2.2 TFIDF (Term Frequency Inverse Document Frequency)
2.3 Naïve Bayes Classifier
2.4 Evaluation Measures
Chapter 3 Methodology
3.1 Threshold Setting
3.2 Static Threshold
3.3 Dynamic Threshold
3.4 Experimental Procedure
3.5 Execution Strategy
Chapter 4 Experimental Results
4.1 Data Source
4.2 Experimental Results
4.2.1 Precision-based Classification Results
4.2.2 F1-based Classification Results
4.3 Analysis of Experimental Results
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
English Paper


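List of Figures
Figure 2-1 Associative classification flowchart
Figure 2-2 CBA ranking method
Figure 2-3 Lazy ranking method
Figure 2-4 Database coverage algorithm
Figure 2-5 Lazy algorithm
Figure 3-1 Static threshold flowchart
Figure 3-2 Static threshold flowchart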

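List of Tables
Table 2-1 Differences between association rule discovery and associative classifiers
Table 2-2 Experimental results using a single AC classifier
Table 2-3 Experimental results of a multiple classifier combining AC with KNN
Table 2-4 Distribution of document counts
Table 4-1 Number of articles selected from each department
Table 4-2 Format of document descriptions
Table 4-3 Single threshold set by precision
Table 4-4 Multiple thresholds set by precision
Table 4-5 Comparison of classification accuracy for static and dynamic thresholds based on precision
Table 4-6 Comparison of the number of correctly classified documents for static and dynamic thresholds based on precision
Table 4-7 Single threshold set by F1
Table 4-8 Multiple thresholds set by F1
Table 4-9 Comparison of classification accuracy for static and dynamic thresholds based on F1
Table 4-10 Comparison of the number of correctly classified documents for static and dynamic thresholds based on F1
Table 4-11 Best experimental results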