Researcher: 鄒侑霖
Researcher (English): Yu-Lin Tsou
Title: 樹狀抽樣式標註成本導向主動學習演算法
Title (English): Annotation Cost-sensitive Active Learning by Tree Sampling
Advisor: 林軒田
Advisor (English): Hsuan-Tien Lin
Committee Members: 王鈺強、陳縕儂
Committee Members (English): Yu-Chiang Wang, Yun-Nung Chen
Oral Defense Date: 2017-06-05
Degree: Master
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程學研究所)
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Document Type: Academic thesis
Publication Year: 2017
Graduation Academic Year: 105
Language: English
Number of Pages: 39
Keywords (Chinese): 機器學習、主動學習、標註成本導向
Keywords (English): Machine Learning, Active Learning, Annotation Cost-sensitive
Statistics:
  • Cited: 0
  • Views: 609
  • Downloads: 0
  • Bookmarked: 0
Abstract: Active learning is an important machine learning setup for reducing the labelling effort of humans. Most existing works rely on the simple assumption that every labelling query carries the same annotation cost, but this assumption may not be realistic: annotation costs can vary across data instances, and the cost of a query may be unknown before the query is made. Traditional active learning algorithms cannot handle such a realistic scenario. In this work, we study annotation-cost-sensitive active learning algorithms, which need to estimate the utility and the cost of each query simultaneously. We propose a novel algorithm, the cost-sensitive tree sampling (CSTS) algorithm, which conducts the two estimation tasks jointly and solves them with a tree-structured model motivated by hierarchical sampling, a well-known algorithm for traditional active learning. By combining multiple tree-structured models, an extension of CSTS, the cost-sensitive forest sampling (CSFS) algorithm, is also proposed and discussed. Extensive experimental results on data sets with simulated and true annotation costs validate that the proposed methods are generally superior to other annotation-cost-sensitive algorithms.
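
To make the high-level idea above concrete, the following is a minimal, hypothetical Python sketch of cost-sensitive, cluster-tree-based query selection. It is not the thesis' CSTS algorithm itself: the tree here is a plain recursive 2-means partition of the unlabelled pool, the "utility" of a leaf is its label impurity estimated from the queries answered so far, and the expected cost is the mean annotation cost observed inside that leaf; the function names (build_tree, select_query) and these estimators are illustrative assumptions only.

# Hypothetical sketch (not the exact CSTS algorithm): build a cluster tree over the
# unlabelled pool, score each leaf by estimated label impurity per expected annotation
# cost, and query a point from the best-scoring leaf.

import numpy as np
from sklearn.cluster import KMeans


def build_tree(X, idx, depth=0, max_depth=3, min_size=10):
    """Recursively split pool indices with 2-means; return the list of leaf index arrays."""
    if depth >= max_depth or len(idx) < min_size:
        return [idx]
    parts = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
    left, right = idx[parts == 0], idx[parts == 1]
    if len(left) == 0 or len(right) == 0:
        return [idx]
    return (build_tree(X, left, depth + 1, max_depth, min_size)
            + build_tree(X, right, depth + 1, max_depth, min_size))


def select_query(leaves, y_known, cost_known, rng):
    """Choose the leaf maximising impurity / expected cost and sample an unlabelled point."""
    best_leaf, best_score = None, -np.inf
    for leaf in leaves:
        labelled = [i for i in leaf if i in y_known]
        unlabelled = [i for i in leaf if i not in y_known]
        if not unlabelled:
            continue
        if labelled:
            ys = np.array([y_known[i] for i in labelled])
            impurity = 1.0 - np.bincount(ys).max() / len(ys)   # 1 - majority-class fraction
        else:
            impurity = 1.0                                      # unexplored leaf: assume high utility
        observed = [cost_known[i] for i in labelled if i in cost_known]
        expected_cost = np.mean(observed) if observed else np.mean(list(cost_known.values()) or [1.0])
        score = impurity / max(expected_cost, 1e-8)
        if score > best_score:
            best_score, best_leaf = score, leaf
    return rng.choice([i for i in best_leaf if i not in y_known])


# Illustrative usage: X is the unlabelled pool; y_known / cost_known hold the labels and
# annotation costs of the queries answered so far (all indices, labels and costs made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
leaves = build_tree(X, np.arange(len(X)))
y_known, cost_known = {0: 0, 3: 1, 17: 0}, {0: 2.0, 3: 0.5, 17: 1.2}
print("next query index:", select_query(leaves, y_known, cost_known, rng))

In the actual CSTS and CSFS algorithms the tree construction, the node-evaluation metric, and the cost estimation are more involved (Chapter 3 of the thesis), and CSFS averages over several such trees; the sketch above only illustrates the utility-per-cost selection principle.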
Oral Examination Committee Certification i
Acknowledgements ii
Chinese Abstract iii
Abstract iv
1 Introduction 1
2 Related work 4
3 The Proposed Approach 6
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 Revised decision tree algorithm . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.1 Decision tree algorithm . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.2 Metric for node evaluation . . . . . . . . . . . . . . . . . . . . . 11
3.2.3 Tree construction . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.3 Cost-sensitive tree sampling . . . . . . . . . . . . . . . . . . . . . . . . 16
3.4 Cost-sensitive forest sampling . . . . . . . . . . . . . . . . . . . . . . . 19
4 Experiments 22
4.1 CSAL with artificial costs . . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.1 Reverse distance cost . . . . . . . . . . . . . . . . . . . . . . . . 25
4.1.2 Distance cost . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.2 CSAL with real annotation costs . . . . . . . . . . . . . . . . . . . . . . 30
4.3 Parameter Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . . 32
5 Conclusion 35
Bibliography 36
[1] S. Arora, E. Nyberg, and C. P. Rose. Estimating annotation cost for active learning in a multi-annotator environment. In Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, pages 18–26. Association for Computational Linguistics, 2009.
[2] L. Breiman, J. Friedman, C. J. Stone, and R. A. Olshen. Classification and regression trees. CRC press, 1984.
[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27, 2011.
[4] O. Chapelle, J. Weston, and B. Scholkopf. Cluster kernels for semi-supervised learning. Advances in neural information processing systems, pages 601–608, 2003.
[5] D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine learning, 15(2):201–221, 1994.
[6] S. Dasgupta. Two faces of active learning. Theoretical computer science, 412(19):1767–1781, 2011.
[7] S. Dasgupta and D. Hsu. Hierarchical sampling for active learning. In Proceedings of the 25th international conference on Machine learning, pages 208–215. ACM, 2008.
[8] R. Haertel, K. D. Seppi, E. K. Ringger, and J. L. Carroll. Return on investment for active learning. In Proceedings of the NIPS Workshop on Cost-Sensitive Learning, 2008.
[9] J. A. Hartigan and M. A. Wong. Algorithm as 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 28(1):100–108, 1979.
[10] A. Holub, P. Perona, and M. C. Burl. Entropy-based active learning for object recognition. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 1–8. IEEE, 2008.
[11] K.-H. Huang and H.-T. Lin. A novel uncertainty sampling algorithm for cost-sensitive multiclass active learning. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2016.
[12] S.-J. Huang, R. Jin, and Z.-H. Zhou. Active learning by querying informative and representative examples. In Advances in neural information processing systems, pages 892–900, 2010.
[13] J. Kang, K. R. Ryu, and H.-C. Kwon. Using cluster-based sampling to select initial training set for active learning in text classification. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 384–388. Springer, 2004.
[14] R. D. King, K. E. Whelan, F. M. Jones, P. G. Reiser, C. H. Bryant, S. H. Muggleton, D. B. Kell, and S. G. Oliver. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427(6971):247–252, 2004.
[15] D. D. Lewis and W. A. Gale. A sequential algorithm for training text classifiers. In Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval, pages 3–12. Springer-Verlag New York, Inc., 1994.
[16] M. Lichman. UCI machine learning repository, 2013. http://archive.ics.uci.edu/ml.
[17] Y. Liu. Active learning with support vector machine applied to gene expression data for cancer classification. Journal of chemical information and computer sciences, 44(6):1936–1941, 2004.
[18] D. D. Margineantu. Active cost-sensitive learning. In Proceedings of International Joint Conference on Artificial Intelligence, pages 1622–1623, 2005.
[19] H. T. Nguyen and A. Smeulders. Active learning using pre-clustering. In Proceedings of the 21st international conference on Machine learning, page 79. ACM, 2004.
[20] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[21] J. R. Quinlan. Induction of decision trees. Machine learning, 1(1):81–106, 1986.
[22] J. R. Quinlan. C4.5: Programs for machine learning. Elsevier, 2014.
[23] E. Ringger, P. McClanahan, R. Haertel, G. Busby, M. Carmen, J. Carroll, K. Seppi, and D. Lonsdale. Active learning for part-of-speech tagging: Accelerating corpus annotation. In Proceedings of the Linguistic Annotation Workshop, pages 101–108. Association for Computational Linguistics, 2007.
[24] M. Seeger. Learning with labeled and unlabeled data. Technical report, 2000.
[25] B. Settles. Active learning literature survey. Technical report, University of Wisconsin–Madison, 2010.
[26] B. Settles, M. Craven, and L. Friedland. Active learning with real annotation costs. In Proceedings of the NIPS workshop on cost-sensitive learning, pages 1–10, 2008.
[27] K. Tomanek and U. Hahn. A comparison of models for cost-sensitive active learning. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1247–1255. Association for Computational Linguistics, 2010.
[28] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. Journal of machine learning research, 2(Nov):45–66, 2001.
[29] V. N. Vapnik and V. Vapnik. Statistical learning theory. Wiley New York, 1998.
[30] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang. Representative sampling for text classification using support vector machines. In European Conference on Information Retrieval, pages 393–407. Springer, 2003.