
National Digital Library of Theses and Dissertations in Taiwan

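Graduate student: 張日威
Graduate student (English name): Jih-Wei Chang
Thesis title: 應用LDA進行Plurk主題分類及使用者情緒分析
Thesis title (English): Apply LDA with Topic Categories on Plurk and User Sentiment Analysis
Advisor: 黃純敏
Advisor (English name): Chuen-min Huang
Oral defense committee: 黃純敏, 郭淑美, 黃錦法
Oral defense committee (English names): Chuen-min Huang, Shu-mei Guo, Jin-fa Huang
Oral defense date: 2014-07-30
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Department of Information Management
Discipline: Computer Science
Subject category: General Computer Science
Thesis type: Academic thesis
Publication year: 2014
Graduation academic year: 102 (ROC calendar, 2013-2014)
Language: Chinese
Number of pages: 65
Keywords (Chinese): LDA, 情緒分析, 微網誌分類, Plurk
Keywords (English): LDA, Sentiment Analysis, Microblog Classification, Plurk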
Usage statistics:
  • Cited by: 19
  • Views: 2579
  • Downloads: 371
  • Saved to bibliography lists: 1
With the rapid expansion of the Internet, people have become accustomed to exchanging messages online, and these messages matter to many parties, including advertisers, enterprises, and consumers. How to digest such a large volume of messages in a short time is therefore an important issue. Earlier work relied on document summarization to extract the outline of an article, but with the rise of social networks in recent years, summarization is in some cases poorly suited to social network posts, especially on microblogging sites. Microblog posts are typically limited to 140 characters, so many studies have turned to finding the latent topics within this limited length and the sentiment that users express toward those topics.

This study uses Plurk, a microblog commonly used in Taiwan, as its dataset. It applies Latent Dirichlet Allocation (LDA) to find the topics in the posts and then uses sentiment analysis to assign a polarity to each topic, so that users can quickly see how favorably the public views it. The experimental results show that, in general, most Plurk data is junk information. LDA finds meaningful topics only when the dataset contains clearly discussed issues; otherwise the topics it finds are mostly too vague. In addition, LDA can classify short texts effectively, and the accuracy of sentiment classification is about 60%.
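As a rough illustration of the topic-discovery step described in the abstract, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling in plain Python. It is not the thesis's implementation: this record does not state which LDA software, inference method, or parameters were used. The names `lda_gibbs` and `top_words`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy English tokens (standing in for word-segmented Plurk posts) are all assumptions made for illustration.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Returns (theta, phi, vocab): per-document topic distributions,
    per-topic word distributions, and the sorted vocabulary.
    All defaults are illustrative assumptions, not values from the thesis.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]             # tokens of doc d in topic k
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # word v assigned to topic k
    topic_total = [0] * num_topics                           # total tokens in topic k
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed point estimates from the final counts.
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][v] + beta) / (topic_total[t] + vocab_size * beta)
         for v in range(vocab_size)]
        for t in range(num_topics)
    ]
    return theta, phi, vocab


def top_words(phi, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in phi
    ]


if __name__ == "__main__":
    # Toy posts, already tokenized; real input would be segmented Plurk text.
    toy_docs = [
        ["movie", "ticket", "actor", "movie"],
        ["election", "vote", "candidate"],
        ["actor", "movie", "review"],
        ["vote", "election", "policy", "candidate"],
    ]
    theta, phi, vocab = lda_gibbs(toy_docs, num_topics=2)
    for t, words in enumerate(top_words(phi, vocab)):
        print(f"topic {t}: {words}")
```

Because each post contributes only a handful of tokens, the per-document counts are sparse, which is one plausible reason short-text topics can come out vague unless the collection concentrates on a few clearly discussed issues. A sentiment step would then score the posts assigned to each topic; that part depends on the thesis's dictionary and rules, which this record does not include.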
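Chinese Abstract
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Research Background and Significance
1.2 Research Objectives
Chapter 2 Literature Review
2.1 Classification Models
2.1.1 Topic Models
2.1.2 Support Vector Machine (SVM)
2.1.3 Latent Dirichlet Allocation (LDA)
2.2 Sentiment Analysis
2.2.1 Building Sentiment Dictionaries
2.2.2 Sentiment Detection
Chapter 3 Research Method
3.1 Research Framework and Method
3.2 Data Download
3.3 Data Preprocessing
3.3.1 CKIP Word Segmentation and Part-of-Speech Tagging
3.3.2 Stop Word Removal
3.3.3 Spam Message Removal
3.4 LDA Algorithm
3.4.1 LDA Parameter Selection
3.4.2 LDA Output Data
3.4.3 Constructing LDA Topic Words
3.5 Identifying Sentiment Holders
3.6 Sentiment Word Detection
3.6.1 Sentiment Dictionary
3.6.2 Emoticon Detection
3.7 Sentiment Classification
3.7.1 Sentiment Classification Calculation
3.8 Key Document Detection
3.8.1 Key Sentiment Words
3.8.2 Opinion Leaders
Chapter 4 Experimental Results
4.1 Data Preprocessing
4.2 Setting and Analyzing LDA Topics
4.2.1 Time-Period Data
4.2.2 Specific Data
4.3 Sentiment Word Judgment
4.3.1 Sentiment Shift in Interrogative Sentences
4.3.2 Sentiment Polarity Shift
4.4 Sentiment Classification Results and Analysis
4.5 Analysis of Key Sentiment Words and Key Documents
4.5.1 Key Sentiment Words
4.5.2 Key Documents and Topic Polarity
Chapter 5 Conclusion
References
Appendix A
Appendix B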

