National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 郭宗勳 (Kuo, Tsung-Hsun)
Title: 基於Semi-AdaBoost.MH加上Universum Example之文件分類法
Title (English): Document Classification based on Semi-AdaBoost.MH with Universum Example
Advisor: 李嘉晃 (Lee, Chia-Hoang)
Degree: Master's
Institution: National Chiao Tung University
Department: Institute of Computer Science and Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Document type: Academic thesis
Year published: 2012
Graduation academic year: 100 (2011–2012)
Language: Chinese
Pages: 45
Keywords (Chinese): machine learning; semi-supervised learning
Keywords (English): machine learning; semi-supervised learning; AdaBoost.MH; Universum
Usage statistics:
  • Cited by: 0
  • Views: 230
  • Downloads: 7
  • Bookmarked: 0
Abstract (Chinese):
Although semi-supervised learning has proven successful in the field of machine learning, its classification performance can still suffer when very few labeled examples are available. Universum is a novel concept denoting a collection of examples that belong to none of the target classes. This thesis proposes a semi-supervised learning method combined with Universum, using the prior knowledge that Universum provides to address the problems traditional semi-supervised methods encounter.
  Approaching the problem from a Boosting perspective, this thesis proposes using confidence to explain why Universum can aid classification, and the proposed notion of confidence coincides with the margin concept of U-SVM. This study further analyzes which kinds of data, when used as Universum, harm classification performance. In the experiments, we use three document corpora and verify that the fewer labeled examples are available, the more influence adding Universum exerts; moreover, as long as the chosen Universum is not biased toward any target class, the proposed method outperforms not only the original semi-supervised method but also other semi-supervised methods that likewise use Universum.
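The boosting-plus-Universum idea can be illustrated with a toy sketch. This is not the thesis's Semi-AdaBoost.MH algorithm (which is not reproduced here); it is a minimal, assumption-laden variant of the classic trick from Weston et al.'s "Inference with the Universum": each Universum example is added to the training pool twice, once with each label and with small weight, so every weak hypothesis contradicts one copy and boosting is pushed to keep its confidence |f(x)| low near the Universum.

```python
import numpy as np

def train_universum_boost(X, y, w0, T=10):
    """AdaBoost with decision stumps. Universum handling (an
    assumption, not the thesis's exact method): the caller adds each
    Universum point twice, once per label, with a small weight in w0;
    every stump then contradicts one copy, which drives boosting to
    keep confidence |f(x)| low around the Universum."""
    w = w0 / w0.sum()
    stumps = []
    for _ in range(T):
        best = None
        for j in range(X.shape[1]):            # scan every feature
            for thr in np.unique(X[:, j]):     # every observed threshold
                for sign in (1, -1):
                    pred = sign * np.where(X[:, j] >= thr, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, j, thr, sign)
        err, j, thr, sign = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # stump weight
        pred = sign * np.where(X[:, j] >= thr, 1, -1)
        w = w * np.exp(-alpha * y * pred)      # reweight examples
        w = w / w.sum()
        stumps.append((alpha, j, thr, sign))
    return stumps

def confidence(stumps, X):
    """Real-valued score f(x); |f(x)| is the ensemble's confidence."""
    f = np.zeros(len(X))
    for alpha, j, thr, sign in stumps:
        f += alpha * sign * np.where(X[:, j] >= thr, 1, -1)
    return f
```

On a 1-D toy set with labeled points at ±1 and ±2 and one Universum point at 0, the selected stumps alternate their thresholds around the Universum point, so its confidence |f(0)| ends up strictly below that of every labeled point while all labeled points remain correctly classified.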

Abstract (English):
  Although semi-supervised learning has achieved great success in the domain of machine learning, classification performance may suffer when only a small amount of labeled examples is available. Universum, a collection of non-examples that belong to neither class of interest, has become a new research topic in machine learning. This paper proposes a semi-supervised learning method with Universum to improve classification, in which the Universum is viewed as prior knowledge about the data set.
  This paper devises a semi-supervised learning with Universum algorithm based on the Boosting technique, and proposes using confidence to illustrate why Universum can improve classification performance. Moreover, this concept of confidence corresponds to the margin of U-SVM. We further analyze which data, when used as Universum, damage classification performance. In the experimental section, we use three corpora to conduct experiments. The results indicate that the fewer labeled examples we have, the more influential the Universum is. If the data distribution of the Universum is not biased toward either class of interest, the proposed method can outperform the original semi-supervised method and several other semi-supervised methods with Universum.
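The selection condition stated above — a Universum whose distribution does not lean toward either class of interest — suggests a simple diagnostic. The sketch below is a hypothetical heuristic, not the thesis's selection criterion; the function name, the tanh squashing, and the tolerance are all assumptions. It averages a classifier's squashed confidence scores over the Universum candidates and flags the set when the mean sits far from zero.

```python
import numpy as np

def universum_bias(scores, tol=0.2):
    """Hypothetical diagnostic (not the thesis's criterion): squash a
    classifier's real-valued scores on Universum candidates into
    [-1, 1] and average them; a mean far from zero suggests the
    candidate set leans toward one target class."""
    m = float(np.mean(np.tanh(np.asarray(scores, dtype=float))))
    return m, abs(m) > tol
```

For example, scores scattered symmetrically around zero pass the check, while uniformly positive scores (the candidates all look like one class) are flagged as biased.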

Table of Contents
Chinese Abstract ... i
English Abstract ... ii
Acknowledgements ... iii
Contents ... iv
List of Figures ... vi
List of Tables ... vii
Chapter 1: Introduction ... 1
1.1 Motivation ... 1
1.2 Objectives and Methods ... 2
1.3 Thesis Organization ... 3
Chapter 2: Related Work ... 4
2.1 Semi-AdaBoost.MH ... 4
2.1.1 Algorithm ... 4
2.1.2 Algorithm Analysis ... 6
2.1.3 Weak Hypothesis ... 6
2.2 Universum ... 8
2.2.1 SVM with Universum ... 8
2.2.2 Transductive SVM with Universum ... 11
2.2.3 Graph-based Semi-Supervised Method with Universum ... 11
2.2.4 Document Clustering with Universum ... 12
Chapter 3: System Design ... 13
3.1 Concept ... 13
3.2 System Architecture ... 13
3.3 System Algorithm ... 15
3.3.1 Notation and Algorithm ... 15
3.3.2 Algorithm Analysis ... 17
3.3.3 Weak Hypothesis with Universum ... 18
3.4 How to Select the Universum ... 22
Chapter 4: Experimental Results and Discussion ... 25
4.1 Data Sets ... 25
4.2 Experimental Design ... 26
4.2.1 Corpus Preprocessing ... 26
4.2.2 Evaluation of Classification Performance ... 26
4.2.3 Experimental Methods and Parameters ... 28
4.3 Results ... 29
4.4 Discussion ... 34
Chapter 5: Conclusion and Future Work ... 43
5.1 Summary ... 43
5.2 Future Work ... 43
References ... 44
[1] T. Joachims, “Text categorization with support vector machines: Learning with many relevant features”, ECML, Berlin: Springer, pp. 137–142, 1998.
[2] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting”, J. Comput. Syst. Sci., vol. 55, no. 1, pp. 119–139, 1997.
[3] R. E. Schapire and Y. Singer, “BoosTexter: A boosting-based system for text categorization”, Machine Learning, vol. 39, no. 2/3, pp. 135–168, 2000.
[4] R. E. Schapire and Y. Singer, “Improved boosting algorithms using confidence-rated predictions”, Machine Learning, vol. 37, pp. 297–336, December 1999.
[5] D. W. Hosmer and Stanley Lemeshow, Applied Logistic Regression, 2nd ed., Wiley, 2000.
[6] A. McCallum and K. Nigam, “A comparison of event models for naïve Bayes text classification”, in AAAI-98 Workshop on Learning for Text Categorization, AAAI Press, pp. 41–48, 1998.
[7] T. Joachims, “Transductive inference for text classification using support vector machine”, ICML, 1999.
[8] A. B. Goldberg and X. Zhu, “Seeing stars when there aren’t many stars: graph-based semi-supervised learning for sentiment categorization”, In: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing. TextGraphs-1. Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 45–52, 2006.
[9] A. Blum and S. Chawla, “Learning from labeled and unlabeled data using graph mincuts”, ICML, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, pp. 19–26, 2001.
[10] C. L. Liu and S. Y. Fang, “Text Classification with Labeled and Unlabeled Data using Semi-supervised AdaBoost.MH”, submitted to Information Processing & Management.
[11] J. Weston and R. Collobert and F. H. Sinz and L. Bottou and V. Vapnik, “Inference with the universum”, ICML, pp. 1009-1016, 2006.
[12] F. H. Sinz and O. Chapelle and A. Agarwal and B. Schölkopf, “An analysis of inference with the universum”, NIPS, 2007.
[13] C. Cortes and V. Vapnik, “Support-Vector Networks”, Machine Learning, 20, 1995.
[14] K. Huang and Z. Xu and I. King and M. R. Lyu, “Semi-supervised learning from general unlabeled data”, ICML, pp. 273–282, 2008.
[15] J. M. Bernardo and A. F. M. Smith, Bayesian theory, John Wiley and Sons, 1994.
[16] F. Sinz and M. Roffilli, UniverSVM, software available at http://mloss.org/software/view/19/
[17] D. Zhang and J. Wang and F. Wang and C. Zhang, “Semi-supervised classification with universum”, SDM, pp. 323-333, 2008.
[18] T. Joachims, “Transductive learning via spectral graph partitioning”, ICML, pp. 290-297, 2003.
[19] B. Schölkopf and J. Platt and T. Hoffman, “On transductive regression”, Advances in Neural Information Processing Systems 19, 2006.
[20] D. Zhang and J. Wang and L. Si, “Document Clustering with Universum”, ACM SIGIR, 2011.
[21] L. Xu and J. Neufeld and B. Larson and D. Schuurmans, “Maximum margin clustering”, NIPS, 2004.
[22] V. Vapnik, “The nature of statistical learning theory”, New York, NY, USA: Springer-Verlag New York, Inc., 1995.
