National Digital Library of Theses and Dissertations in Taiwan

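Graduate student: Wei-Shih Lin (林瑋詩)
Thesis title: A Transfer-Learning Approach to Exploit Noisy Information for Classification
Thesis title (Chinese): 基於轉移學習運用有雜訊的資訊處理分類問題
Advisor: Shou-De Lin (林守德)
Oral defense committee: Ming-Feng Tsai (蔡銘峰), Hsing-Kuo Pao (鮑興國), Jyh-Shing Jang (張智星), Hung-yi Lo (駱宏毅)
Oral defense date: 2013-07-17
Degree: Master's
Institution: National Taiwan University (國立臺灣大學)
Department: Graduate Institute of Computer Science and Information Engineering
Field of study: Engineering
Discipline: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2013
Academic year of graduation: 101 (ROC calendar)
Language: English
Number of pages: 36
Keywords (translated from Chinese): Transfer Learning; Sentiment Prediction
Keywords (English): Transfer Learning; Feature Transfer; Sentiment Prediction; Novel Topics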
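Obtaining a large amount of correctly labeled data has always been time-consuming and labor-intensive. Automated methods can label massive amounts of data, but the accuracy of the resulting labels is a concern. We therefore propose an algorithm that uses a small amount of accurate data together with a large amount of noisy data. Instances that have been labeled two or more times serve as a bridge, from which we derive a weight for each instance and a transformation formula for the feature values. We then evaluate the algorithm on three synthetic datasets and on one practical real-world problem: sentiment diffusion prediction on novel topics. Unlike traditional sentiment prediction, prediction on novel topics is harder because historical text data are lacking. The experiments show that the proposed algorithm outperforms the other methods on all four datasets.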

In general, both the quality of the data (its accuracy) and the quantity of the data (its amount) can significantly affect the quality of a supervised learning model. In real-world applications, however, it may not be feasible to assume that a large amount of high-quality data is always available. This research considers the situation in which only a small amount of accurate training data is available for learning, and aims to design a transfer-learning-based approach that uses a larger amount of noisy training data (noisy in both labels and features) to improve learning quality. The problem is non-trivial because the distribution of the noisy training data differs from that of the testing data. In this thesis, we propose a novel transfer learning algorithm, Noise-Label Transfer Learning (NLTL), to solve this problem. We exploit the label and feature information in both the accurate and the noisy data, transferring the features into the same domain and adjusting the instance weights for learning. The experimental results show that NLTL outperforms existing approaches.
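The abstract names two ingredients, a feature transfer and an instance weighting, both derived from instances that appear in the accurate and the noisy data. The record does not give the NLTL formulas, so the following is only a minimal illustrative sketch and not the thesis's algorithm. It assumes (hypothetically) a per-feature affine map fitted by least squares on the bridge instances, and a weight per noisy label equal to how often that label agrees with the accurate label on the bridge. All function names and the toy data are invented for illustration.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y ~ a*x + b for a single feature column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:  # constant column: fall back to predicting the mean
        return 0.0, my
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var
    return a, my - a * mx


def fit_feature_transfer(bridge_noisy, bridge_accurate):
    """Assumed transfer: one affine map per feature, learned on bridge instances."""
    cols_noisy = zip(*bridge_noisy)
    cols_accurate = zip(*bridge_accurate)
    return [fit_affine(list(n), list(c)) for n, c in zip(cols_noisy, cols_accurate)]


def transfer(maps, x):
    """Map one noisy feature vector into the accurate feature domain."""
    return [a * v + b for (a, b), v in zip(maps, x)]


def label_agreement(noisy_labels, accurate_labels):
    """Assumed weighting: per noisy label, the fraction of bridge instances
    whose accurate label is the same."""
    hits, totals = {}, {}
    for n, c in zip(noisy_labels, accurate_labels):
        totals[n] = totals.get(n, 0) + 1
        hits[n] = hits.get(n, 0) + (1 if n == c else 0)
    return {label: hits[label] / totals[label] for label in totals}


# Toy bridge: the same four instances seen in both datasets (made-up numbers).
bridge_noisy = [[0.0, 10.0], [1.0, 12.0], [2.0, 14.0], [3.0, 16.0]]
bridge_accurate = [[1.0, 5.0], [3.0, 6.0], [5.0, 7.0], [7.0, 8.0]]
maps = fit_feature_transfer(bridge_noisy, bridge_accurate)
weights = label_agreement([1, 1, 0, 1], [1, 0, 0, 1])

# Instances that exist only in the noisy data: transfer features, attach weights.
noisy_only = [([4.0, 18.0], 1), ([5.0, 20.0], 0)]
weighted = [(transfer(maps, x), y, weights.get(y, 0.0)) for x, y in noisy_only]
```

The resulting (features, label, weight) triples could then be passed, together with the accurate data, to any learner that accepts per-instance weights. Weighting by label rather than by individual instance is a simplification chosen to keep the sketch short.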

Oral Defense Committee Certification
Acknowledgements
Chinese Abstract
ABSTRACT
CONTENTS
LIST OF FIGURES
LIST OF TABLES
Chapter 1 Introduction
1.1 Background
1.2 Contributions
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Transfer Learning
2.2 Multi-Label Algorithm
2.3 Semi-Supervised Learning
Chapter 3 Methodology
3.1 Problem Definition
3.2 Noise-Label Transfer Learning (NLTL)
3.2.1 Overall Algorithm
3.2.2 Solving the Qualitative Issue: Feature Transfer
3.2.3 Solving the Quantitative Issue: Instance Weight Tuning
Chapter 4 Experiments
4.1 UCI Machine Learning Repository
4.2 Experiment Setting
4.3 Baseline
4.4 Evaluation Metric
4.5 Experiment Result
Chapter 5 Case Study: Sentiment Diffusion Prediction on Novel Topics
5.1 Literatures of Sentiment Diffusion Prediction on Novel Topics
5.1.1 Traditional Sentiment Prediction
5.1.2 Prediction on Novel Topics
5.2 Labeling Methods
5.2.1 Emoticon Labeling
5.2.2 Manual Labeling
5.2.3 Sentiment Dictionary Labeling
5.3 Feature Generation
5.3.1 Link Sentiment Information
5.3.2 User Sentiment Information
5.3.3 Topic Information
5.3.4 Global Information
5.4 Real-World Dataset: Plurk
5.5 Evaluation of Sentiment Diffusion Prediction on Novel Topics
5.5.1 Single Feature Comparison
5.5.2 NLTL Comparison
Chapter 6 Conclusion
Chapter 7 Reference



