
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

Author: 葉駿杰
Author (English): YEH, JIUN-JIE
Thesis Title (Chinese): 在中文文本分類任務下分析比較大型語言模型與Transformer模型:效能、效率與資源消耗
Thesis Title (English): Comparative Analysis of Large-Language Models and Transformers for Chinese Text Classification: Performance, Efficiency, and Resource Consumption
Advisor: 費南多
Advisor (English): Fernando Calderon
Committee Members: 李曉祺, 盧淑萍
Oral Defense Date: 2024-07-29
Degree: Master's
Institution: Fu Jen Catholic University
Department: Master's Program, Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Document Type: Academic thesis
Year of Publication: 2024
Graduation Academic Year: 112 (2023-2024)
Language: Chinese
Number of Pages: 27
Keywords (Chinese): 大型語言模型, BERT, 中文文本分類, Transformers, Llama3, TAIDE
Keywords (English): Large Language Models, BERT, Chinese text classification, Transformers, Llama3, TAIDE
Usage statistics:
  • Cited: 0
  • Views: 32
  • Downloads: 8
  • Bookmarked: 0
In the field of Natural Language Processing (NLP), deep learning-based language models, particularly Large Language Models (LLMs) and Transformer-based architectures such as BERT, have demonstrated strong performance on text classification tasks. However, selecting the most suitable model for a given task remains a challenge. This study focuses on Chinese text classification and compares several models with Chinese pre-training drawn from both the LLM and the Transformer families, including the Llama3-based TAIDE. Through sentiment classification and category classification experiments on the same datasets, we evaluate the models' prediction accuracy on each task, as well as their behavior and resource consumption under different amounts of training data.
The experimental results show that although the LLMs match the Transformer-based models in accuracy, they require considerably more computational resources and time. Transformer-based models remain the better choice in scenarios that demand high efficiency and low resource consumption; in particular, with only a small amount of training data they can still extract features effectively and maintain high accuracy. This study provides practical guidance for selecting a language model for Chinese text classification and is relevant to advancing the application of natural language processing techniques in a global context.
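The abstract compares fine-tuned Transformer encoders with prompted LLMs on the same Chinese classification data. As a rough illustration of what such a comparison involves in practice, the sketch below contrasts the two routes using the Hugging Face transformers library. It is a minimal sketch, not the thesis's code: the model identifiers (bert-base-chinese and an assumed TAIDE checkpoint id), the binary sentiment labels, the prompt wording, and the train_ds/eval_ds dataset variables are illustrative assumptions.

```python
# Minimal sketch (not the thesis's code) of the two routes compared in the abstract:
# (1) fine-tuning a Chinese pre-trained BERT encoder, (2) zero-shot prompting an
# instruction-tuned LLM. Model ids, labels, and prompt wording are assumptions.
import torch
from transformers import (AutoModelForCausalLM, AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)

LABELS = ["正面", "負面"]  # illustrative binary sentiment labels

# --- Route 1: fine-tune a Chinese pre-trained BERT classifier ---
bert_tok = AutoTokenizer.from_pretrained("bert-base-chinese")
bert = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(LABELS))

def encode(batch):
    # Tokenize raw text into input ids for the encoder.
    return bert_tok(batch["text"], truncation=True, max_length=128)

# `train_ds` / `eval_ds` are assumed datasets with "text" and "label" columns.
# trainer = Trainer(
#     model=bert,
#     args=TrainingArguments(output_dir="bert-cls", num_train_epochs=3,
#                            per_device_train_batch_size=16),
#     train_dataset=train_ds.map(encode, batched=True),
#     eval_dataset=eval_ds.map(encode, batched=True),
# )
# trainer.train()

# --- Route 2: zero-shot classification with an instruction-tuned LLM ---
llm_name = "taide/Llama3-TAIDE-LX-8B-Chat-Alpha1"  # assumed checkpoint id
llm_tok = AutoTokenizer.from_pretrained(llm_name)
llm = AutoModelForCausalLM.from_pretrained(
    llm_name, torch_dtype=torch.float16, device_map="auto")

def zero_shot_classify(text: str) -> str:
    # Ask the model to answer with one label, then parse the first label it emits.
    prompt = f"請判斷以下評論的情感,只回答「正面」或「負面」。\n評論:{text}\n答案:"
    inputs = llm_tok(prompt, return_tensors="pt").to(llm.device)
    out = llm.generate(**inputs, max_new_tokens=5, do_sample=False)
    reply = llm_tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return next((label for label in LABELS if label in reply), LABELS[0])

print(zero_shot_classify("餐點很美味,服務也很親切。"))
```

To mirror the efficiency side of the comparison, each route's wall-clock time and peak GPU memory could additionally be recorded (for example with time.perf_counter() and torch.cuda.max_memory_allocated()) alongside accuracy.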

List of Figures v
Chapter 1  Introduction 1
1.1 Research Motivation 1
1.2 Research Direction and Main Contributions 2
Chapter 2  Literature Review 4
2.1 Background of Text Classification 4
2.1.1 Challenges 4
2.2 Traditional Text Classification Methods 4
2.2.1 Feature Extraction Methods 4
2.2.2 Classification Algorithms 5
2.3 The Transformer Revolution 6
2.3.1 Introduction of the Transformer Architecture 6
2.3.2 BERT (Bidirectional Encoder Representations from Transformers) 6
2.3.3 RoBERTa and ALBERT 6
2.4 The Rise of Large Language Models 7
2.4.1 The GPT (Generative Pre-trained Transformer) Series 7
2.4.2 Applications and Challenges of Large Language Models 7
2.4.3 Common Classification Methods for Large Language Models 7
2.5 Multi-Model Performance Comparison 8
2.5.1 Performance of Transformer-Architecture Models 8
2.5.2 Advantages of Large Language Models 8
2.5.3 Resource Consumption and Efficiency 8
2.6 Current Research Challenges and Opportunities 9
2.6.1 Model Training and Improvement 9
2.6.2 Multi-Task Learning and Transfer Learning 9
Chapter 3  Experimental Design 10
3.1 Environment Setup 10
3.2 Datasets 10
3.2.1 SemEval-2016 Task 5 [29] 10
3.2.2 Tnews 11
3.2.3 Cnews 11
3.2.4 iflytek 11
3.3 Models Used in the Experiments 12
3.3.1 Large Language Models 12
3.3.2 Transformer Models 13
3.4 Choice of Classification Methods 13
3.4.1 Zero-Shot Classification 13
3.4.2 Fine-Tuning 14
3.4.3 Chain-of-Thought 14
3.5 Sentiment Classification Experiment 14
3.6 Category Classification Experiment 14
Chapter 4  Results and Analysis 15
4.1 Binary Sentiment Classification Results 15
4.2 Category Classification Results 16
4.2.1 Short-Text Category Classification 16
4.2.2 Long-Text Category Classification 17
4.2.3 Long-Text Multi-Class Classification 18
4.2.4 Chain-of-Thought Experiments 19
Chapter 5  Discussion 21
5.1 Comparison of Model Performance 21
5.2 Resource Consumption and Efficiency 21
5.3 Training Data Size and Model Performance 21
5.4 Challenges and Limitations 21
5.4.1 Hardware Requirement Constraints 21
5.4.2 Dataset Size Constraints 22
5.4.3 Hallucination in Chain-of-Thought 22
5.5 Summary 24
Chapter 6  Conclusion and Future Work 25
References 26

[1]R. Collobert and J. Weston, "A unified architecture for natural language processing: Deep neural networks with multitask learning," in Proceedings of the 25th international conference on Machine learning, 2008, pp. 160-167.
[2]A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, 2017.
[3]A. Conneau et al., "Unsupervised cross-lingual representation learning at scale," arXiv preprint arXiv:1911.02116, 2019.
[4]Y. Cui, Z. Yang, and X. Yao, "Efficient and effective text encoding for chinese llama and alpaca," arXiv preprint arXiv:2304.08177, 2023.
[5]P. Ennen et al., "Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results," arXiv preprint arXiv:2303.04715, 2023.
[6]J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "Bert: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[7]Y.-J. Lee et al., "Trustworthy AI Dialogue Engine (推動可信任生成式AI發展先期計畫)." [Online]. Available: https://taide.tw/index/about/project-overview
[8]G. Salton, A. Wong, and C.-S. Yang, "A vector space model for automatic indexing," Communications of the ACM, vol. 18, no. 11, pp. 613-620, 1975.
[9]Z. S. Harris, "Distributional structure," Word, vol. 10, no. 2-3, pp. 146-162, 1954.
[10]T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," arXiv preprint arXiv:1301.3781, 2013.
[11]J. Pennington, R. Socher, and C. D. Manning, "Glove: Global vectors for word representation," in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), 2014, pp. 1532-1543.
[12]C. Cortes and V. Vapnik, "Support-vector networks," Machine learning, vol. 20, pp. 273-297, 1995.
[13]L. Breiman, "Random forests," Machine learning, vol. 45, pp. 5-32, 2001.
[14]R. O. Duda and P. E. Hart, Pattern classification and scene analysis. Wiley New York, 1973.
[15]T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE transactions on information theory, vol. 13, no. 1, pp. 21-27, 1967.
[16]J. R. Quinlan, "Induction of Decision Trees," Machine Learning, 1986, doi: 10.1023/a:1022643204877.
[17]D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
[18]S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[19]Z. Lan, M. Chen, S. Goodman, K. Gimpel, P. Sharma, and R. Soricut, "Albert: A lite bert for self-supervised learning of language representations," arXiv preprint arXiv:1909.11942, 2019.
[20]J. Achiam et al., "Gpt-4 technical report," arXiv preprint arXiv:2303.08774, 2023.
[21]M. Palatucci, D. Pomerleau, G. E. Hinton, and T. M. Mitchell, "Zero-shot learning with semantic output codes," Advances in neural information processing systems, vol. 22, 2009.
[22]O. Vinyals, C. Blundell, T. Lillicrap, and D. Wierstra, "Matching networks for one shot learning," Advances in neural information processing systems, vol. 29, 2016.
[23]J. Howard and S. Ruder, "Universal language model fine-tuning for text classification," arXiv preprint arXiv:1801.06146, 2018.
[24]G. Hinton, O. Vinyals, and J. Dean, "Distilling the knowledge in a neural network," arXiv preprint arXiv:1503.02531, 2015.
[25]Y. LeCun, J. Denker, and S. Solla, "Optimal brain damage," Advances in neural information processing systems, vol. 2, 1989.
[26]J. Wei et al., "Chain-of-thought prompting elicits reasoning in large language models," Advances in neural information processing systems, vol. 35, pp. 24824-24837, 2022.
[27]S. Yao et al., "Tree of thoughts: Deliberate problem solving with large language models," Advances in Neural Information Processing Systems, vol. 36, 2024.
[28]T. Wolf et al., "Transformers: State-of-the-art natural language processing," in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, 2020, pp. 38-45.
[29]M. Pontiki et al., "SemEval-2016 task 5: Aspect based sentiment analysis," in Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), Association for Computational Linguistics, 2016, pp. 19-30.
[30]L. Xu et al., "CLUE: A Chinese language understanding evaluation benchmark," arXiv preprint arXiv:2004.05986, 2020.

