
National Digital Library of Theses and Dissertations in Taiwan


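Student: 張嘉真
Student (English name): Chia-Chen Chang
Thesis title: 自動建置領域中文情緒詞典之研究 (A Study on Automatically Constructing Domain-Specific Chinese Sentiment Lexicons)
Thesis title (English): The Research of Constructing Domain-Specific Chinese Sentiment Lexicon
Advisor: 黃三益
Advisor (English name): San-Yih Hwang
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Information Management (graduate program)
Discipline: Computing
Field: General Computing
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar, i.e., 2017-2018)
Language: English
Pages: 72
Keywords (Chinese): 中文情緒詞典, 情緒分析, 標籤傳播法, 詞向量, 文字探勘 (Chinese sentiment lexicon, sentiment analysis, label propagation, word embedding, text mining)
Keywords (English): text mining, sentiment analysis, Chinese sentiment lexicon, word embedding, label propagation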
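As social media has grown in popularity, users produce large amounts of text such as tweets, blog posts, and reviews, and this text is rich in latent sentiment. Sentiment analysis lets us uncover people's feelings and opinion orientations. In recent years, sentiment analysis has often relied on sentiment lexicons as its analytical tool. Domains are diverse and each carries its own prior knowledge, so domain-specific sentiment lexicons play a crucial role in sentiment analysis.
Chinese sentiment lexicon resources remain scarce, and most of them are domain-independent. We therefore build a domain-specific sentiment lexicon to support sentiment analysis and make its results more accurate. In this study, we analyze 1,294,141 hotel reviews from Booking.com. We use a vector space model to capture the semantic relations between words and predict each word's sentiment score. We then combine the two and apply label propagation to automatically construct a Chinese sentiment lexicon for the hotel domain. The proposed method achieves 83% precision.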
With the boom in social media, users generate large volumes of text, such as tweets, blog posts, and comments, and much of it carries latent sentiment. Sentiment analysis aims to extract people's feelings and opinions from such textual data. A common approach to sentiment analysis is to consult a sentiment lexicon. However, the sentiment a word conveys varies across domains and depends on domain-specific prior knowledge, so domain-specific sentiment lexicons play an important role in sentiment analysis.
Compared with their English counterparts, Chinese sentiment lexicon resources are still limited and mostly general-purpose. This research therefore proposes techniques for constructing a domain-specific sentiment lexicon to obtain more accurate sentiment analysis. In this thesis, we analyze 1,294,141 hotel reviews crawled from Booking.com. We use a vector space model to capture semantic relations between words and predict the sentiment scores of words. Finally, we combine this contextual and sentiment information through label propagation to automatically construct a sentiment lexicon for the hotel domain. The proposed method achieves 83% precision.
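The abstract names two ingredients, a word-vector space and label propagation, but gives no details. The snippet below is a rough, hypothetical sketch and not the thesis's actual implementation. It builds a k-nearest-neighbour graph from cosine similarities between word vectors. It then spreads polarity outward from a few seed words using iterative label propagation with clamped seeds, in the spirit of Zhu and Ghahramani (2002). Several details are illustrative assumptions: the toy 2-D vectors, the seed words, `k=2`, the 50-iteration count, and the function names `cosine`, `knn_graph`, and `label_propagation`. Real vectors would come from an embedding model trained on the review corpus. The sketch also omits the thesis's sentiment-score prediction step entirely.

```python
import math

# Illustrative sketch only (hypothetical names and data), not the thesis's code.


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_graph(vectors, k=2):
    """Symmetric k-nearest-neighbour graph: word -> {neighbour: cosine weight}.

    Only positive similarities become edges, and every edge is mirrored
    so that the resulting graph is undirected.
    """
    graph = {word: {} for word in vectors}
    for word in vectors:
        ranked = sorted(
            ((cosine(vectors[word], vectors[other]), other)
             for other in vectors if other != word),
            reverse=True,
        )
        for sim, other in ranked[:k]:
            if sim > 0:
                graph[word][other] = sim
                graph[other][word] = sim
    return graph


def label_propagation(graph, seeds, iterations=50):
    """Iterative label propagation with clamped seeds.

    seeds maps seed words to a polarity in [-1, 1]. Every other word starts
    at 0 and repeatedly takes the similarity-weighted average of its neighbours.
    """
    scores = {word: seeds.get(word, 0.0) for word in graph}
    for _ in range(iterations):
        updated = {}
        for word, neighbours in graph.items():
            if word in seeds:
                updated[word] = seeds[word]  # clamp seed labels every round
                continue
            total = sum(neighbours.values())
            updated[word] = (
                sum(w * scores[n] for n, w in neighbours.items()) / total
                if total else 0.0
            )
        scores = updated  # synchronous (Jacobi-style) update
    return scores


# Hypothetical toy 2-D "embeddings" for hotel-review words.
vectors = {
    "乾淨": (1.0, 0.1),   # clean
    "整潔": (0.9, 0.2),   # tidy
    "舒適": (0.8, 0.3),   # comfortable
    "髒": (-1.0, 0.1),    # dirty
    "破舊": (-0.9, 0.2),  # shabby
    "吵雜": (-0.8, 0.3),  # noisy
    "房間": (0.0, 1.0),   # room (equally close to both clusters)
}
seeds = {"乾淨": 1.0, "髒": -1.0}  # hypothetical seed words

scores = label_propagation(knn_graph(vectors, k=2), seeds)
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}\t{score:+.2f}")
```

With these toy vectors, 整潔 and 舒適 settle at roughly +0.90 and +0.81. 破舊 and 吵雜 mirror them at roughly -0.90 and -0.81. 房間 is equally similar to both clusters, so it stays at 0. Words more strongly connected to a seed inherit more of its polarity, which is the intuition behind expanding a small seed set into a full domain lexicon.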
Thesis Approval Form
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Problem
1.3 Research Motivation
1.4 Research Purpose
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Lexicon-based Sentiment Analysis
2.2 Adding Sentiment Information to Words
2.3 Expanding Sentiment Words Automatically
Chapter 3 Our Approach
3.1 Overall Process
3.2 Data Collection
3.3 Data Preprocessing
3.3.1 Data Cleaning
3.3.2 Segmentation, Tokenization, and Part-of-Speech Tagging
3.4 Generating Word Representations
3.5 Building the Sentiment Prediction Model
3.6 Label Propagation
3.6.1 Label Propagation Algorithm
3.6.2 Label Propagation in Batches
3.6.3 Seed Selection
Chapter 4 Evaluation
4.1 Dataset Construction
4.2 Parameter Selection in Our Approach
4.3 Comparing with Other Methods
4.3.1 Comparing Methods without Label Propagation
4.3.2 Comparing Methods with Label Propagation
4.4 Uniqueness of Our Domain-Specific Sentiment Lexicon
4.5 A Short Discussion of the Opposite Polarity Problem
Chapter 5 Conclusion
References
Appendix – Chinese Sentiment Lexicon Extracted from Booking.com
Positive words
Negative words