National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 鍾士慕
Author (English): Shih-Mu Jhong
Title: 深度學習技術在中文輿情分析之應用:以BERT演算法為例
Title (English): The Application of Deep Learning in Chinese Public Opinion Analysis – A Case Study of BERT
Advisor: 邱昭彰
Advisor (English): Chao-Chang Chiu
Committee members: 徐綺憶, 謝瑞建
Committee members (English): Chi-I Hsu, Jui-Chien Hsieh
Oral defense date: 2019-06-19
Degree: Master's
Institution: 元智大學 (Yuan Ze University)
Department: 資訊管理學系 (Information Management)
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Publication year: 2019
Graduating academic year: 107 (ROC calendar; 2018–2019)
Language: Chinese
Pages: 34
Keywords (Chinese): 文本分類, TF-IDF, Word2Vec, 機器學習, 深度學習, BERT
Keywords (English): Text Classification, TF-IDF, Word2Vec, Deep Learning, BERT
Usage statistics:
  • Cited by: 4
  • Views: 904
  • Rating: (none)
  • Downloads: 1
  • Bookmarked: 0
Language is the carrier of thought: it is the most natural tool humans have for exchanging ideas and expressing emotions, and a defining trait that distinguishes humans from other species. Natural language processing is therefore an important research direction in artificial intelligence, and text classification has long been one of its most widely studied tasks. From early TF-IDF features combined with machine learning, to the Word2Vec method Google published in 2013, to the Transformer models that represent the current trend, text classification algorithms have improved continuously, and the pace has become especially evident since the rise of deep learning.
This study compares algorithms commonly used for text classification: traditional TF-IDF features paired with machine learning methods such as SVM, logistic regression, random forest, and CART; the now-common combination of Word2Vec and LSTM; and BERT, a model built on the latest trend, the Transformer. The focus is a comparative analysis of these older and newer text classification algorithms and an evaluation of their performance on Traditional Chinese text, using public opinion data collected from Traditional Chinese online discussion forums. The results show that the traditional TF-IDF approach still performs well in text classification.
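To make the comparison concrete, here is a minimal sketch of the TF-IDF baseline the abstract describes, assuming scikit-learn for the classifiers and jieba for Chinese word segmentation. The corpus, labels, and parameter choices below are illustrative placeholders, not the thesis's actual data or settings (those are given in Sections 3.2 and 3.3).

    import jieba
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.svm import LinearSVC
    from sklearn.tree import DecisionTreeClassifier

    # Placeholder forum posts and topic labels (hypothetical, not the thesis data).
    train_posts = ["這家電信的網路速度很快", "資費方案實在太貴了",
                   "學校宿舍的環境需要改善", "教授的課程內容很充實"]
    train_labels = ["telecom", "telecom", "university", "university"]
    test_posts = ["門市人員的服務態度很好", "圖書館的座位總是不夠"]
    test_labels = ["telecom", "university"]

    # Segment each post with jieba, then weight the terms with TF-IDF.
    vectorizer = TfidfVectorizer(tokenizer=jieba.lcut, token_pattern=None)
    X_train = vectorizer.fit_transform(train_posts)
    X_test = vectorizer.transform(test_posts)

    # The four classical learners the abstract pairs with TF-IDF.
    models = {
        "SVM": LinearSVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
        "CART": DecisionTreeClassifier(random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, train_labels)
        print(name, accuracy_score(test_labels, model.predict(X_test)))

And a similarly hedged sketch of the BERT side of the comparison, assuming the Hugging Face transformers library and the publicly available bert-base-chinese checkpoint; the pre-trained model and fine-tuning hyperparameters the thesis actually used (Sections 3.3 and 5.2) may differ. It reuses the placeholder posts above and runs a single illustrative gradient step.

    import torch
    from transformers import BertForSequenceClassification, BertTokenizerFast

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
    model = BertForSequenceClassification.from_pretrained("bert-base-chinese",
                                                          num_labels=2)

    # Reuse the placeholder posts above; 0 = telecom, 1 = university.
    encodings = tokenizer(train_posts, padding=True, truncation=True,
                          max_length=128, return_tensors="pt")
    labels = torch.tensor([0, 0, 1, 1])

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    outputs = model(**encodings, labels=labels)  # forward pass computes the loss
    outputs.loss.backward()                      # one illustrative fine-tuning step
    optimizer.step()

The pairing illustrates the trade-off the abstract points to: the sparse TF-IDF pipeline is far cheaper to train, which is worth weighing against whatever accuracy the fine-tuned BERT model adds.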
Title Page
Thesis Oral Examination Committee Certification
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
1.1 Research Background
1.2 Research Objectives
Chapter 2: Literature Review
2.1 Traditional Approaches
2.1.1 Preprocessing: TF-IDF
2.1.2 Chi-Square Test
2.1.3 Logistic Regression
2.1.4 SVM
2.1.5 Decision Tree (CART)
2.1.6 Random Forest
2.2 Deep Learning Approaches
2.2.1 Preprocessing: Word2Vec
2.2.2 LSTM
2.3 Recent Approach: BERT
2.3.1 Transformer
2.3.2 ELMo
2.3.3 GPT
2.3.4 BERT
Chapter 3: Research Method
3.1 Research Framework
3.2 Data Collection
3.2.1 PTT and D-card
3.2.2 Word Segmentation
3.2.3 Topic Definition
3.3 Algorithm Parameter Settings
3.4 Prediction Evaluation
Chapter 4: Experimental Results
4.1 Data Description
4.2 Results by Prediction Method
4.2.1 Telecom Data Results
4.2.2 University Data Results
4.2.3 Overall Comparison
4.2.4 Detailed Classification Results
Chapter 5: Discussion
5.1 Semantic Issues
5.2 Traditional Chinese Pre-trained Language Models for BERT
5.3 Masked Bidirectional Language Model
5.4 MT-DNN
5.5 ERNIE
5.6 BERT
5.7 Summary
Chapter 6: Conclusion and Future Work
References
Electronic full text: access is restricted to the author's home institution's campus systems and IP range.