Graduate student (English name): Shih-Mu Jhong
Thesis title (English): The Application of Deep Learning in Chinese Public Opinion Analysis – A Case Study of BERT
Advisor (English name): Chao-Chang Chiu
Oral defense committee (English names): Chi-I Hsu, Jui-Chien Hsieh
Keywords (English): Text Classification, TF-IDF, Word2Vec, Deep Learning, BERT
Language is the carrier of thought, the most natural tool for human beings to exchange ideas and express emotions, and a trait that distinguishes humans from other species. Natural language processing is an important research direction in artificial intelligence, and text classification has long been one of its most widely applied tasks. From early TF-IDF with machine learning, to the Word2Vec method published by Google in 2013, to the recent trend of Transformer models, algorithms for text classification have improved continuously, and the trend has become more pronounced since the rise of deep learning.
This study analyzes and compares algorithms commonly used in text classification, from traditional TF-IDF combined with machine learning methods such as SVM, LG (logistic regression), RF, and CART, to the now-common Word2Vec with LSTM, to the latest Transformer-based model, BERT. It focuses on comparing older and newer text classification algorithms and evaluates their performance on Traditional Chinese text classification. The data consist of Chinese public opinion posts collected from Traditional Chinese discussion forums. In the experiments, traditional TF-IDF still shows good performance in text classification.
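As a rough illustration of the TF-IDF weighting named in the abstract, the following is a minimal sketch rather than the thesis's implementation. It assumes the posts have already been segmented into words, uses the plain formulas tf = count / document length and idf = log(N / df), and runs on hypothetical toy posts; the function name `tf_idf` and the sample tokens are illustrative only.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df). Real
    libraries often apply smoothing, so their values will differ.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, already-segmented forum posts (toy data, not from the thesis).
docs = [
    ["網路", "訊號", "很", "差"],
    ["學校", "宿舍", "網路", "很", "慢"],
    ["學校", "課程", "很", "多"],
]
weights = tf_idf(docs)
```

A term that occurs in every post (here "很") receives weight zero, while a term confined to one post (here "訊號") is weighted highest within that post. Vectors of this kind are what a classifier such as SVM or logistic regression would take as input.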
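Title Page
Thesis Oral Defense Committee Approval Form
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Research Background
1.2 Research Objectives
Chapter 2. Literature Review
2.1 Traditional Approaches
2.1.1 Preprocessing: TF-IDF
2.1.2 Chi-Square Test
2.1.3 Logistic Regression
2.1.4 SVM
2.1.5 Decision Tree (CART)
2.1.6 Random Forest
2.2 Deep Learning Approaches
2.2.1 Preprocessing: Word2Vec
2.2.2 LSTM
2.3 Recent Approaches: BERT
2.3.1 Transformer
2.3.2 ELMo
2.3.3 GPT
2.3.4 BERT
Chapter 3. Research Method
3.1 Research Framework
3.2 Data Collection
3.2.1 PTT and D-card
3.2.2 Word Segmentation
3.2.3 Topic Definition
3.3 Algorithm Parameter Settings
3.4 Prediction Evaluation Results
Chapter 4. Experimental Results
4.1 Data Description
4.2 Results by Prediction Method
4.2.1 Telecom Data Results
4.2.2 College and University Data Results
4.2.3 Overall Comparison
4.2.4 Detailed Classification Results
Chapter 5. Discussion
5.1 Semantic Issues
5.2 Traditional Chinese Pre-trained Language Model for BERT
5.3 Masked Bidirectional Language Model
5.4 MT-DNN
5.5 ERNIE
5.6 BERT
5.7 Summary
Chapter 6. Conclusion and Future Work
References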
