National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

Author: Wu, Yu-Hsuan (吳侑玹)
Title: Lay Summarization of Biomedical Documents with Discourse Structure-based Prompt Tuning (篇章結構導向提示學習於生醫文本簡明摘要研究)
Advisor: Kao, Hung-Yu (高宏宇)
Committee Members: Hsieh, Sun-Yuan (謝孫源); Wang, Hei-Chia (王惠嘉); Hsu, Yi-Yu (徐禕佑); Kao, Hung-Yu (高宏宇)
Oral Defense Date: 2023-07-21
Degree: Master
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis Type: Academic thesis
Publication Year: 2023
Academic Year of Graduation: 111 (2022–2023)
Language: English
Pages: 51
Keywords (Chinese): 自然語言處理; 簡明摘要生成; 提示策略
Keywords (English): Natural Language Processing; Lay Summary Generation; Prompt Strategies
Chinese Abstract (translated): Translating complex biomedical articles into concise, easy-to-understand lay summaries is important for bridging the gap between scientific research and public understanding. However, existing approaches to biomedical lay summary generation face many challenges, such as handling the diverse transformations of specialized biomedical information, and the fact that, even within the same lay-summarization task, the way summaries are written differs across datasets. This thesis proposes a comprehensive approach to these problems. First, we analyze in detail the structural differences between the original articles and their lay summaries across different biomedical datasets. Second, by integrating multiple biomedical datasets, we enable the model to learn from a broader range of biomedical knowledge and topics. Training on the integrated data, however, still requires overcoming differences in domain and summarization style between datasets. We therefore introduce prompt strategies that let the model recognize and adapt to the distinctive characteristics of each dataset, making lay summary generation more accurate across datasets. Our approach was validated through participation in the BioLaySumm biomedical lay summarization shared task, where it achieved the highest score and the best stability compared with other methods. Finally, our results highlight that our approach of dataset structure analysis and integration, combined with well-designed prompt strategies, improves the accuracy and effectiveness of biomedical lay summary generation.
English Abstract: The translation of complex biomedical articles into concise and comprehensible lay summaries is of great significance in bridging the gap between scientific research and public understanding. However, existing methods for lay summary generation face challenges in effectively capturing the diverse information and styles present in biomedical datasets. This thesis presents a comprehensive approach to address these challenges. First, we analyze the structural differences and the conversion between lay summaries and original texts across different biomedical datasets. We then propose a novel method that integrates multiple diverse datasets, allowing the model to learn from a wide range of biomedical knowledge and topics. To overcome the varying domains and stylistic disparities, we introduce prompt strategies that enable the model to identify and adapt to the distinct characteristics of each dataset during summary generation. The effectiveness of our approach is demonstrated through participation in the BioLaySumm biomedical lay summarization shared task, where our method achieves superior scores compared to other approaches. Our findings underscore the importance of dataset integration and prompt strategies in enhancing the accuracy and relevance of biomedical lay summary generation.
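The abstract describes conditioning a single summarization model on dataset-specific prompts so that it adapts its output style per dataset. A minimal sketch of one such strategy, a discrete dataset-identifier tag plus an optional keyword-list prompt prepended to each source article; the tag format and the helper below are illustrative assumptions, not the thesis's exact implementation:

```python
def build_prompted_input(article, dataset, keywords=None):
    """Build a prompted model input from a source article.

    Prepends a discrete dataset-identifier tag (and, optionally, a
    keyword-list prompt) so that one summarization model can learn and
    reproduce each dataset's distinct summary style.  The tag format and
    this helper are illustrative, not the thesis's exact implementation.
    """
    parts = [f"[{dataset.upper()}]"]              # dataset discrete prompt
    if keywords:                                  # optional keyword-list prompt
        parts.append("[KEYWORDS] " + ", ".join(keywords))
    parts.append(article)
    return " ".join(parts)

# During fine-tuning, every training example carries its dataset's tag, so
# the model associates the tag with that dataset's target summary style.
example = build_prompted_input(
    "Mitochondria are organelles that produce cellular energy...",
    dataset="elife",
    keywords=["mitochondria", "ATP"],
)
```

At inference time the same tag is prepended, steering generation toward the style of the requested dataset (e.g. PLOS vs. eLife in BioLaySumm 2023).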
Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Background
1.2 BioLaySumm 2023 Dataset
1.2.1. Other Dataset Fields
1.2.2. An Example of the Simplification Task
1.3 Motivation
1.4 Our Work
1.5 Our Contributions
Chapter 2. Related Work
2.1 Transformer
2.2 Longformer Encoder-Decoder (LED)
2.3 Bidirectional and Auto-Regressive Transformers (BART)
2.4 Controllable Generation
2.4.1. A Conditional Transformer Language Model for Controllable Generation
2.4.2. Controllable Sentence Simplification
2.4.3. Latent Prompt Tuning for Text Summarization
2.4.4. Top 3 Systems of the BioLaySumm Competition
Chapter 3. Methodology
3.1 Architecture
3.2 First Component: Section Processing and Analysis
3.2.1. Preliminary Filtering and Analysis of Both Datasets
3.2.2. Compressed Section-Combined Text
3.3 Second Component: Integrated Training with Prompt Strategies
3.3.1. Training with Integrated Datasets
3.3.2. Prompt Strategies
Chapter 4. Experiment
4.1 Dataset Description
4.2 Evaluation Metrics
4.2.1. Readability
4.3 Configuration Settings
4.4 Implementation Details
4.4.1. Selecting Relevant Sections
4.4.2. Prompt Strategies
4.5 Comparative Analysis Against Established Approaches
4.5.1. Baselines
4.5.2. Evaluation Metric Results
Chapter 5. Analysis
5.1 Integrated Training or Not
5.1.1. Integrated Training with Knowledge-Sharing Analysis
5.2 Section-Combined Text
5.2.1. Cosine Similarity
5.2.2. Fine-tuned LED on Section-Combined Text
5.3 Prompt Study
5.3.1. Prompt Strategies
5.3.2. Dataset Discrete Prompt Analysis
5.3.3. Keyword-List Prompt Analysis
5.3.4. Prompt Integration
5.4 Analyzing Dataset Discrete Prompts for Dataset Differentiation
5.4.1. Reversed Dataset Discrete Prompt
5.4.2. Text-Style Classifier
5.5 Improving Shortcomings of the Keyword-List Prompt
Chapter 6. Discussion and Conclusion
References
[1] Iz Beltagy, Matthew E Peters, and Arman Cohan. Longformer: The long-document transformer. arXiv preprint arXiv:2004.05150, 2020.
[2] Jeanne Sternlicht Chall and Edgar Dale. Readability revisited: The new Dale-Chall readability formula. Brookline Books, 1995.
[3] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pretraining of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[4] Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, and Chenghua Lin. BioLaySumm 2023 shared task: Lay summarisation of biomedical research articles. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 468–477, Toronto, Canada, July 2023. Association for Computational Linguistics.
[5] Tomas Goldsack, Zheheng Luo, Qianqian Xie, Carolina Scarton, Matthew Shardlow, Sophia Ananiadou, and Chenghua Lin. Overview of the BioLaySumm 2023 shared task on lay summarization of biomedical research articles. In Proceedings of the 22nd Workshop on Biomedical Language Processing, Toronto, Canada, 2023. Association for Computational Linguistics.
[6] Tomas Goldsack, Zhihao Zhang, Chenghua Lin, and Carolina Scarton. Making science simple: Corpora for the lay summarisation of scientific literature. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10589–10604, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
[7] Nitish Shirish Keskar, Bryan McCann, Lav Varshney, Caiming Xiong, and Richard Socher. CTRL - A Conditional Transformer Language Model for Controllable Generation. arXiv preprint arXiv:1909.05858, 2019.
[8] J Peter Kincaid, Robert P Fishburne Jr, Richard L Rogers, and Brad S Chissom. Derivation of new readability formulas (automated readability index, fog count and flesch reading ease formula) for navy enlisted personnel. 1975.
[9] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online, July 2020. Association for Computational Linguistics.
[10] Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain, July 2004. Association for Computational Linguistics.
[11] Zheheng Luo, Qianqian Xie, and Sophia Ananiadou. Readability controllable biomedical document summarization. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 4667–4680, Abu Dhabi, United Arab Emirates, December 2022. Association for Computational Linguistics.
[12] Louis Martin, Éric de la Clergerie, Benoît Sagot, and Antoine Bordes. Controllable sentence simplification. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4689–4698, Marseille, France, May 2020. European Language Resources Association.
[13] Domenic Rosati. GRASUM at BioLaySumm task 1: Background knowledge grounding for readable, relevant, and factual biomedical lay summaries. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 483–490, Toronto, Canada, July 2023. Association for Computational Linguistics.
[14] Oisín Turbitt, Robert Bevan, and Mouhamad Aboshokor. MDC at BioLaySumm task 1: Evaluating GPT models for biomedical lay summarization. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 611–619, Toronto, Canada, July 2023. Association for Computational Linguistics.
[15] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. Von Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017.
[16] Yu-Hsuan Wu, Ying-Jia Lin, and Hung-Yu Kao. IKM_Lab at BioLaySumm task 1: Longformer-based prompt tuning for biomedical lay summary generation. In The 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, pages 602–610, Toronto, Canada, July 2023. Association for Computational Linguistics.
[17] Hongyi Yuan, Zheng Yuan, Ruyi Gan, Jiaxing Zhang, Yutao Xie, and Sheng Yu. BioBART: Pretraining and evaluation of a biomedical generative language model. In Proceedings of the 21st Workshop on Biomedical Language Processing, pages 97–109, Dublin, Ireland, May 2022. Association for Computational Linguistics.
[18] Yubo Zhang, Xingxing Zhang, Xun Wang, Si-qing Chen, and Furu Wei. Latent prompt tuning for text summarization. arXiv preprint arXiv:2211.01837, 2022.