



Thesis title (English): Evaluate the quality of Retrieval-Augmented Generation-an example of LLM-based auto parts quotation chatbot
Advisor (English): CHU, HSUEH-TING
Keywords (English): LLMs; Retrieval-Augmented Generation; Quote; Prompt Engineering; Evaluation

With the rise of generative artificial intelligence and large language models (LLMs), related technologies show considerable potential across industries, from basic research to practical applications. This study applies a Retrieval-Augmented Generation (RAG) framework to the automotive parts trading sector, with the aim of making inquiries and quotations more efficient. The study first builds a dataset by collecting product information and filtering it to protect trade secrets. With LangChain as the development framework, prompts are then designed so that the language model can accurately identify key information such as product types and prices.
The study also improves how the model handles inquiry and quotation dialogues: the model does not answer when the information supplied by the user is incomplete, which safeguards the accuracy and reliability of the product information it returns. Finally, the performance of the whole system is evaluated using the all-MiniLM-L6-v2 and ChatGPT-4 models. After extensive prompt engineering, vector similarity increased from 44% to 58% and the dialogue logic score improved from 46.94/100 to 71.82/100, demonstrating the effectiveness of prompt engineering in practice.
The system not only significantly reduces operational costs for enterprises but also improves the overall efficiency of handling inquiries and quotations, showing the practical value of large language models in commercial applications. The results also provide an empirical basis for applying large language models in other industries, particularly in fields with highly customized demands. The example highlights the importance of evaluating and improving RAG quality in practical applications, and further research and improvement can build on this foundation.
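
The vector similarity evaluation mentioned in the abstract compares each chatbot answer with a ground-truth answer in embedding space. The sketch below illustrates only the scoring step and is not the thesis code: `embed` is a hypothetical bag-of-words stand-in for a sentence encoder such as all-MiniLM-L6-v2, and the function names and sample sentences are assumptions made for illustration.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence encoder: a bag-of-words count vector.

    Assumption: a real evaluation would use dense sentence embeddings instead.
    """
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_similarity(pairs) -> float:
    """Average similarity over (ground_truth, bot_answer) pairs."""
    scores = [cosine_similarity(embed(gt), embed(ans)) for gt, ans in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical question-answer pairs, invented for illustration only.
sample_pairs = [
    ("please provide the car model and year", "please provide the car model"),
    ("this part is not in the catalog", "we could not find this part"),
]
print(f"mean similarity: {mean_similarity(sample_pairs):.2f}")
```

Averaging per-pair scores gives a single percentage-style figure of the kind reported in the abstract, although the exact aggregation used in the thesis is not stated in this record.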

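Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
Section 1 Research Background and Motivation
Section 2 Research Objectives
Section 3 Thesis Structure
Chapter 2 Literature Review
Section 1 Large Language Models and Their Limitations
Section 2 Prompt Engineering and Related Research
Section 3 Retrieval-Augmented Generation (RAG)
Section 4 Evaluation of Retrieval-Augmented Generation and Related Research
Chapter 3 Research Methods and Design
Section 1 Research Framework and Workflow
Section 2 The LangChain Framework
Section 3 Data Sources and Processing
Section 4 Code Walkthrough of the RAG Auto Parts Quotation Chatbot
Chapter 4 Evaluation of the Quotation Chatbot and Experimental Results
Section 1 Ground Truth
Section 2 Evaluation Methods for the Auto Parts Quotation Chatbot
Section 3 Evaluation Workflow and Code Walkthrough
Section 4 Experimental Results
Chapter 5 Conclusion and Outlook
Section 1 Research Results
Section 2 Research Limitations
Section 3 Future Outlook
References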
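Figure 2.1 Transformer architecture [2]
Figure 2.2 Illustration of the three prompt engineering approaches [5]
Figure 2.3 Test results on the LAMBADA dataset [5]
Figure 2.4 Difference between standard prompting and CoT prompting [11]
Figure 2.5 CoT experimental results [11]
Figure 2.6 Architecture of the retrieval-augmented generation model [12]
Figure 2.7 RAG evaluation criterion: noise robustness
Figure 2.8 RAG evaluation criterion: negative rejection
Figure 2.9 RAG evaluation criterion: information integration
Figure 2.10 RAG evaluation criterion: counterfactual robustness
Figure 2.11 RAGR framework [17]
Figure 3.1 RAG architecture of the quotation chatbot
Figure 3.2 Architecture of the evaluation workflow
Figure 3.3 Illustration of the LangChain platform
Figure 3.4 Sample of the raw data
Figure 3.5 Runtime environment and data import code
Figure 3.6 Code integrating the dialogue system with the question-answering mechanism
Figure 3.7 Initializing the dialogue system
Figure 3.8 Memory function
Figure 3.9 Entity filter (code excerpt)
Figure 3.10 Generating the dialogue prompt (code excerpt)
Figure 3.11 Generating the dialogue prompt (code excerpt)
Figure 4.1 Sample of the question-answer set
Figure 4.2 SBERT architecture for vector inference [21]
Figure 4.3 Illustration of gpt-4's ability to judge logic
Figure 4.4 Setup code required before comparison
Figure 4.5 Text preprocessing (code excerpt)
Figure 4.6 Vector similarity comparison code
Figure 4.7 Configuring the model and prompting it
Figure 4.8 Analysis of results from the dialogue logic scoring bot
Figure 4.9 RAG (SP) vector similarity evaluation process and results
Figure 4.10 RAG (WP) vector similarity evaluation process and results
Figure 4.11 RAG (SP) dialogue logic evaluation process and results
Figure 4.12 RAG (WP) dialogue logic evaluation process and results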
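Table 2.1 Examples of Hallucinations from Misinformation and Biases [6]
Table 2.2 Example of Knowledge Boundary [6]
Table 2.3 Human evaluation of the Jeopardy question generation task [11]
Table 2.4 Experimental results of the RAGAS framework
Table 4.1 Vector similarity evaluation for part of the question-answer set
Table 4.2 Dialogue logic evaluation for part of the question-answer set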

1. Kaddour, J., et al. Challenges and Applications of Large Language Models. 2023. arXiv:2307.10169 DOI: 10.48550/arXiv.2307.10169.
2. Vaswani, A., et al. Attention Is All You Need. 2017. arXiv:1706.03762 DOI: 10.48550/arXiv.1706.03762.
3. Devlin, J., et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. arXiv:1810.04805 DOI: 10.48550/arXiv.1810.04805.
4. Yang, Z., et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding. 2019. arXiv:1906.08237 DOI: 10.48550/arXiv.1906.08237.
5. Brown, T.B., et al. Language Models are Few-Shot Learners. 2020. arXiv:2005.14165 DOI: 10.48550/arXiv.2005.14165.
6. Minaee, S., et al. Large Language Models: A Survey. 2024. arXiv:2402.06196 DOI: 10.48550/arXiv.2402.06196.
7. Huang, L., et al. A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions. 2023. arXiv:2311.05232 DOI: 10.48550/arXiv.2311.05232.
8. Tian, S., et al. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Briefings in Bioinformatics, 2024. 25(1).
9. White, J., et al. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. 2023. arXiv:2302.11382 DOI: 10.48550/arXiv.2302.11382.
10. Amatriain, X. Prompt Design and Engineering: Introduction and Advanced Methods. 2024. arXiv:2401.14423 DOI: 10.48550/arXiv.2401.14423.
11. Wei, J., et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. 2022. arXiv:2201.11903 DOI: 10.48550/arXiv.2201.11903.
12. Lewis, P., et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. 2020. arXiv:2005.11401 DOI: 10.48550/arXiv.2005.11401.
13. Xu, L., et al. Nanjing Yunjin intelligent question-answering system based on knowledge graphs and retrieval augmented generation technology. Heritage Science, 2024. 12(1): p. 118.
14. Salemi, A. and H. Zamani. Evaluating Retrieval Quality in Retrieval-Augmented Generation. 2024. arXiv:2404.13781 DOI: 10.48550/arXiv.2404.13781.
15. Chen, J., et al. Benchmarking Large Language Models in Retrieval-Augmented Generation. 2023. arXiv:2309.01431 DOI: 10.48550/arXiv.2309.01431.
16. Es, S., et al. RAGAS: Automated Evaluation of Retrieval Augmented Generation. 2023. arXiv:2309.15217 DOI: 10.48550/arXiv.2309.15217.
17. Yu, H., et al. Evaluation of Retrieval-Augmented Generation: A Survey. 2024. arXiv:2405.07437 DOI: 10.48550/arXiv.2405.07437.
18. Saad-Falcon, J., et al. ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems. 2023. arXiv:2311.09476 DOI: 10.48550/arXiv.2311.09476.
19. Hoshi, Y., et al. RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models. 2023. arXiv:2308.10633 DOI: 10.48550/arXiv.2308.10633.
20. Pandya, K. and M. Holia. Automating Customer Service using LangChain: Building custom open-source GPT Chatbot for organizations. 2023. arXiv:2310.05421 DOI: 10.48550/arXiv.2310.05421.
21. Reimers, N. and I. Gurevych. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. 2019. arXiv:1908.10084 DOI: 10.48550/arXiv.1908.10084.
