National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: Shih, Chia-Yung (石佳永)
Title: Inner Dialogue: Enable Critical Thinking in Large Language Models Through Self-Question-Answering and Human-AI Collaboration (內心對話:基於自問自答與人機協作的反思方法以擴增大型語言模型思考能力)
Advisors: Young, Chung-Ping (楊中平); Lu, Wen-Hsiang (盧文祥)
Committee Members: Liang, Sheng-Fu (梁勝富); Young, Chung-Ping (楊中平); Lu, Wen-Hsiang (盧文祥)
Oral Defense Date: 2023-07-31
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Academic Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Publication Year: 2023
Graduation Academic Year: 111
Language: English
Number of Pages: 30
Keywords (Chinese): 擴增大型語言模型、人機協作、開放式長問答、自然語言生成語意評估
Keywords (English): Augmented large language model; human-AI collaboration; long-form open-ended question answering; semantic natural language evaluation
Abstract (Chinese):
Recent advances in large language models (LLMs) have enabled a new way of building autonomous task-executing agents, which are used to carry out many complex everyday tasks, such as gathering information on a topic or writing a stock-investment program. Even so, most existing approaches give the LLM "prompts" intended to make it "think" or "self-reflect". These prompts consist of coarse phrases such as "reflect" or "think critically", or vague instructions such as "constructively self-criticize". Such coarse prompts rely on the LLM's autoregressive mechanism, which generates unstable, unpredictable, and error-prone reasoning paths.
Humans often organize panel discussions to jointly brainstorm complex open-ended questions. Inspired by this, this thesis builds a system composed of three modules: an Asker that formulates key critical questions, an Answerer that responds to those questions, and a Supervisor that consolidates the discussion and proposes the topic for the next round. Thanks to the dialogue and instruction-following abilities of recent LLMs, the three modules communicate with one another through dialogue-based role-playing to carry out the panel discussion. With this system, an LLM gains transparent critical-thinking ability, and humans can observe and intervene in the conversational thinking process, creating opportunities for human-AI collaboration on more complex open-ended problems.
This thesis collects a long-form open-ended question answering dataset, T2CB, and develops a novel semantics-oriented evaluation method for natural language responses, Concept F1. We build an open-ended question answering task from T2CB to evaluate the system and score the responses semantically with Concept F1. The experimental results show that, given an open-ended question, the system produces responses that cover more aspects and go into greater depth than those of the original LLM.
Abstract (English):
Recent advancements in large language models (LLMs) have fostered a new paradigm for building autonomous agents that solve complex tasks. However, many of the mechanisms that try to instruct an LLM to "think" or "self-correct" while completing a complex task rely on coarse-grained prompts such as "reflection" or "criticism", or on instructions such as "constructively self-criticize your big-picture behavior constantly". Because of the model's simple autoregressive nature, such prompts lead to unpredictable and error-prone reasoning paths, and the resulting reasoning is implicit, uninterpretable, and uncontrollable.
Inspired by how humans hold panel discussions to deal with complex open-ended questions, we construct a system with three modules: an Asker that raises critical questions, an Answerer that focuses on finding solutions, and a Supervisor that is responsible for drawing conclusions and deciding the next discussion topic. Thanks to the dialogue and instruction-following capabilities of LLMs, these three modules form a panel discussion through a dialogue-based role-playing mechanism, resulting in a system capable of critical thinking. Furthermore, by decomposing critical thinking into asking, answering, and concluding, the LLM's thinking process becomes transparent and controllable. This yields a human-AI collaboration interface with the potential to solve more complex problems.
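To make the role-playing mechanism concrete, the sketch below shows how an Asker, an Answerer, and a Supervisor could exchange messages over several rounds. It is not the thesis's implementation: the chat function, the role prompts, and the fixed round count are all assumed placeholders wrapped around a generic instruction-tuned chat LLM.

```python
# Minimal sketch of the Asker / Answerer / Supervisor "panel discussion".
# `chat` is an assumed wrapper around any instruction-tuned chat LLM; the
# role prompts and the fixed round count are illustrative placeholders.
from typing import Callable, Dict, List

Message = Dict[str, str]
ChatFn = Callable[[List[Message]], str]


def run_panel_discussion(question: str, chat: ChatFn, rounds: int = 3) -> str:
    topic = question
    transcript: List[str] = []
    for _ in range(rounds):
        # Asker: raise one critical question about the current topic.
        critical_q = chat([
            {"role": "system", "content": "You are the Asker. Raise one critical question about the topic."},
            {"role": "user", "content": f"Topic: {topic}\nDiscussion so far:\n" + "\n".join(transcript)},
        ])
        # Answerer: focus on answering that critical question.
        answer = chat([
            {"role": "system", "content": "You are the Answerer. Answer the critical question concretely."},
            {"role": "user", "content": f"Question: {critical_q}"},
        ])
        transcript += [f"Asker: {critical_q}", f"Answerer: {answer}"]
        # Supervisor: conclude the round and propose the next discussion topic.
        topic = chat([
            {"role": "system", "content": "You are the Supervisor. Summarize this round and propose the next topic."},
            {"role": "user", "content": "\n".join(transcript)},
        ])
    # Supervisor produces the final answer from the whole transcript.
    return chat([
        {"role": "system", "content": "You are the Supervisor. Write a final, well-organized answer."},
        {"role": "user", "content": f"Original question: {question}\nDiscussion:\n" + "\n".join(transcript)},
    ])
```

In the actual system, each role would presumably carry a richer prompt and the Supervisor would decide when the discussion terminates, but the loop conveys the asking/answering/concluding decomposition that makes the reasoning observable and open to human intervention.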
We collect a long-form open-ended question answering dataset, T2CB, as the evaluation task and propose a novel natural language generation metric, Concept F1, for evaluating the semantic concepts in a sentence. Evaluating our method on T2CB with Concept F1, we find that, when augmented with our thinking framework, an LLM can answer complex open-ended questions with more diverse perspectives and deeper reasoning.
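This record does not spell out how Concept F1 is computed. As a hedged illustration only, a concept-level F1 can be framed as precision and recall over sets of concepts extracted from a generated answer and a reference answer; the extract_concepts helper below is a hypothetical stand-in (plain word overlap) for whatever semantic concept extraction the thesis actually uses.

```python
# Hedged sketch of a concept-level F1 score. The thesis's actual Concept F1
# may extract and match concepts very differently; `extract_concepts` is a
# hypothetical placeholder based on plain word overlap.
from typing import Set

STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "are", "for", "on", "with"}


def extract_concepts(text: str) -> Set[str]:
    # Placeholder: lowercased content words stand in for extracted concepts.
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def concept_f1(prediction: str, reference: str) -> float:
    pred, ref = extract_concepts(prediction), extract_concepts(reference)
    if not pred or not ref:
        return 0.0
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A semantic variant could replace exact word matching with embedding-based concept matching; the record does not say which approach the thesis takes.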
Table of Contents:
Abstract
摘要 (Abstract in Chinese)
誌謝 (Acknowledgements)
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Background
1.1.1 Large Language Model (LLM)
1.1.2 Instruction Tuning and Dialogue Tuning
1.1.3 Natural Language Interface
1.1.4 Autonomous Agent Built on Top of LLM
1.2 Motivation
1.3 Method
1.4 Contribution
Chapter 2. Related Work
2.1 Question Decomposition
2.2 Reasoning
2.3 Cooperative Agents
Chapter 3. Method
3.1 Architecture Overview
3.2 Supervisor
3.3 Asker
3.4 Answerer
3.5 Interactive Mode and Trace-of-mind
Chapter 4. Experiment
4.1 T2CB (Things to consider before) Dataset Creation
4.1.1 Data Collection
4.1.2 Dataset Analysis
4.2 Argument Tree and Argument Tree Evaluation
4.2.1 The Definition of Argument Tree
4.2.2 The Challenge of Measuring the Similarity of Two Argument Trees
4.2.3 Argument Path Coverage
4.3 Experiment on Argument Path Coverage
4.3.1 Task Setup
4.3.2 Result
4.3.3 Error Analysis
Chapter 5. Conclusion
5.1 Future Work
References
Electronic full text: publicly available online from 2028-07-31.