Thesis Title (English): Knowledge Graph-Enhanced Retrieval-Augmented Generation for QA in Large Language Models
Advisors (English): Chih-Yung Chang, Chin-Hwa Kuo
Oral Defense Committee (English): Yuh-Shyan Chen, Tzung-Shi Chen
Keywords (English): RAG, Knowledge Graph, SBERT
In today's digitally connected world, customer service centers play a crucial role by providing a direct communication channel between businesses and customers. With the continuous innovation of modern AI technologies, AI-assisted intelligent customer service centers have become an industry trend. For small and medium-sized enterprises (SMEs), the mainstream approach uses large language models (LLMs) as the core of AI customer service, combined with a Retrieval-Augmented Generation (RAG) framework. By integrating proprietary data, this approach allows LLMs to generate more precise answers and mitigates the inaccurate responses and hallucinations that arise when commercial LLMs lack specialized knowledge or proprietary information. However, apart from RAG frameworks supported by other technology companies, SMEs that attempt to build one independently still face several challenges: optimizing retrieval capability, using data efficiently, high system operating costs, and low time efficiency.

To address these issues, this paper proposes a RAG retriever training framework based on knowledge graphs and variant triple similarity SBERT. The primary goal is to establish more refined QA training data at a lower cost and in less time, thereby training a more effective RAG retriever model. In the first phase of this study, CKIP segmentation technology is used to extract keywords from text, followed by the BM25 algorithm to calculate text similarity and construct a text similarity knowledge graph. This phase enriches the structure of the knowledge graph through precise similarity analysis, enhancing the visualization of text correlations and retrieval efficiency. In the second phase, this paper uses the knowledge graph to determine the soft labels of similarity between the query and various texts. This method not only strengthens the relevance between the query and the texts but also refines the similarity relationships among multiple related texts. Such detailed associations provide the model with richer training signals, aiding in improving the precision and generalization capability of the RAG retriever. In the final phase, this paper introduces an innovative triple similarity SBERT architecture, expanding the training model from single-text matching to multi-dimensional matching between the query and multiple texts. This not only significantly increases the amount of training data but also improves data utilization. This architecture makes the pairing between queries and texts more accurate, effectively enhancing the model's adaptability and precision when handling complex queries.
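
The first phase described above (keyword extraction followed by BM25 similarity and graph construction) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes the texts have already been segmented into tokens (for example by CKIP), and the function names `bm25_scores` and `build_similarity_graph`, the parameters `k1` and `b`, the averaging of the two directional scores, and the edge `threshold` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score every document in docs_tokens against query_tokens with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in set(query_tokens):
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_similarity_graph(docs_tokens, threshold):
    """Return {(i, j): weight} for text pairs whose averaged BM25 score >= threshold."""
    # Treat each text as a query against the whole collection.
    all_scores = [bm25_scores(d, docs_tokens) for d in docs_tokens]
    edges = {}
    for i in range(len(docs_tokens)):
        for j in range(i + 1, len(docs_tokens)):
            # BM25 is not symmetric, so average both directions (an assumption).
            weight = (all_scores[i][j] + all_scores[j][i]) / 2
            if weight >= threshold:
                edges[(i, j)] = weight
    return edges


# Toy corpus of pre-segmented keyword lists (hypothetical data).
docs = [
    ["return", "process", "request", "support"],
    ["return", "shipping", "request"],
    ["member", "points", "redeem"],
]
graph = build_similarity_graph(docs, threshold=0.1)
```

In this toy corpus only the first two texts share keywords, so they are the only pair joined by an edge. Edge weights of this kind are one plausible input for deriving soft similarity labels between a query and neighboring texts, although the abstract does not specify how the labels are computed.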

The primary contribution of this paper is an efficient, low-power training framework that improves both the utilization of proprietary data and the accuracy of text retrieval, and that significantly improves the model's adaptability and precision in handling complex queries. Experimental results show that the reference texts retrieved with this approach lead to better generated answers when used with advanced models such as Llama 2 and GPT-4.

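Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Related Work
2.1 Traditional Text Retrieval Techniques
2.1.1 Traditional Statistical Methods
2.1.2 Shallow Word Embedding-Based Methods
2.1.3 Deep Learning-Based Methods
2.2 RAG Retriever Enhancement Techniques
2.2.1 Recursive Retrieval
2.2.2 Hybrid Retrieval
2.2.3 Retriever Fine-Tuning
2.3 Overview
Chapter 3 Background
3.1 Transformer
3.2 BERT
3.3 SBERT
3.4 Knowledge Graph
Chapter 4 System Design
4.1 Overall Architecture
4.2 Data Preprocessing and Question Set Generation
4.2.1 Data Collection
4.2.2 Data Preprocessing (Data Cleaning)
4.2.3 Data Preprocessing (Question Generation)
4.3 Knowledge Graph Group Construction
4.3.1 Text Similarity Comparison
4.3.2 Knowledge Graph Construction
4.4 Question Soft Label Generation
4.5 Vector Retrieval Model Training
4.5.1 Triple Similarity SBERT
4.5.2 Triple Similarity SBERT Training Data Construction and Model Training
Chapter 5 Experimental Analysis
5.1 Dataset
5.2 Environment and System Parameter Settings
5.3 Experimental Results
5.3.1 Model Parameter Optimization Experiment
5.3.2 SBERT Ablation Experiment
5.3.3 Top-N Retrieval Rate
Chapter 6 Conclusion
References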

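Figure 1: System architecture diagram
Figure 2: Transformer architecture
Figure 3: BERT architecture
Figure 4: SBERT architecture
Figure 5: Simple knowledge graph
Figure 6: Overall system design architecture
Figure 7: Text similarity comparison
Figure 8: Knowledge graph construction
Figure 9: Question soft label generation
Figure 10: PBMSR approach
Figure 11: Triple similarity SBERT
Figure 12: Parameter optimization experiment, HR metric
Figure 13: Parameter optimization experiment, MRR and ARP metrics
Figure 14: SBERT ablation experiment, HR and MRR metrics
Figure 15: Soft label ablation experiment, HR and MRR metrics
Figure 16: Model comparison experiment, HR, MRR, and ARP metrics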

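Table 1: Comparison of related work
Table 2: Dataset of questions with exact text correspondence
Table 3: Experimental environment of the proposed system
Table 4: Parameter optimization experiment, HR metric
Table 5: Parameter optimization experiment, MRR and ARP metrics
Table 6: SBERT ablation experiment
Table 7: Soft label ablation experiment
Table 8: Comparison experiment, HR metric
Table 9: Comparison experiment, MRR metric
Table 10: Comparison experiment, ARP metric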
