臺灣博碩士論文加值系統 (National Digital Library of Theses and Dissertations in Taiwan)

Detailed Record

Graduate Student: 楊仁豪
Graduate Student (English): Jen-Hao Yang
Thesis Title: 基於階層式聚類注意力之編碼解碼器於醫療問題多答案摘要
Thesis Title (English): Hierarchical Clustering with Attentions Based Encoder-Decoder for Multi-Answer Summarization of Medical Questions
Advisors: 徐國鎧, 李龍豪
Degree: Master's
Institution: National Central University
Department: Department of Electrical Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis Type: Academic thesis
Year of Publication: 2024
Graduation Academic Year: 112 (ROC calendar)
Language: Chinese
Number of Pages: 91
Keywords (Chinese): 多文檔摘要、抽象式摘要、階層式聚類、階層式注意力、編碼器解碼器架構
Keywords (English): multi-document summarization, abstractive summarization, hierarchical clustering, hierarchical attention, encoder-decoder architecture
Record statistics:
  • Cited by: 0
  • Views: 12
  • Downloads: 0
  • Bookmarked: 0
Abstract (translated from the Chinese):
As the challenge of information explosion becomes increasingly severe, the demand for quickly obtaining condensed information from multiple documents keeps growing, with wide applications in areas such as news summarization, academic literature, and movie reviews. This study therefore focuses on multi-answer summarization of medical questions, aiming to help the public distill actionable and reliable replies from a large number of related medical answers so that they can be understood more easily. We propose a multi-document summarization model based on a Hierarchical Clustering with Attentions (HCA) mechanism. The model uses Rhetorical Structure Theory to extract the elementary discourse units (EDUs) of each document and applies density-based hierarchical clustering to divide these EDUs into multiple clusters. A large language model is then used to identify the topic of each cluster, strengthening the model's understanding of the EDUs across clusters. By re-ranking the clusters, the model can better capture the differences among the clusters after clustering. Finally, the topics and EDUs of the different clusters are integrated into input embeddings and fed into a hierarchical encoder-decoder architecture, which includes hierarchical self-attention and hierarchical cross-attention mechanisms, to generate the final multi-answer summary of the medical question.
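The clustering and topic-identification stages described above can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes EDUs have already been extracted by an RST-style discourse parser, uses the sentence-transformers and hdbscan packages as stand-ins for the embedding and density-based hierarchical clustering steps, and the model name and prompt wording are placeholders.

```python
# Minimal sketch of the clustering and topic-identification stages described
# above; NOT the thesis implementation. Assumes elementary discourse units
# (EDUs) were already extracted by an RST-style discourse parser; library
# choices, model name, and prompt wording are illustrative assumptions.
from sentence_transformers import SentenceTransformer
import hdbscan

def cluster_edus(edus, min_cluster_size=2):
    """Embed EDUs and group them with density-based hierarchical clustering."""
    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    embeddings = encoder.encode(edus)
    labels = hdbscan.HDBSCAN(min_cluster_size=min_cluster_size).fit_predict(embeddings)

    clusters = {}
    for edu, label in zip(edus, labels):
        clusters.setdefault(int(label), []).append(edu)  # label -1 collects noise points
    return clusters

def topic_prompt(edus_in_cluster):
    """Build a prompt asking a large language model to name the cluster topic
    (the actual prompt used in the thesis is not shown on this page)."""
    joined = "\n".join(f"- {edu}" for edu in edus_in_cluster)
    return ("The following sentences are taken from answers to the same medical question:\n"
            f"{joined}\n"
            "Give a short topic label that summarizes this group.")
```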
Given the lack of publicly available Chinese medical benchmark datasets for multi-document summarization, this study constructed Mednet-MAS, a multi-answer summarization dataset of 2,077 Chinese medical questions, in which each instance contains multiple related answers and a manually annotated multi-answer summary. The English experimental data come from the MAS dataset of the MEDIQA 2021 international shared task, containing 286 English medical questions with multi-answer summaries. Experimental results and model analyses show that the proposed HCA model achieves the best ROUGE and BERTScore scores on the medical multi-answer summarization task, outperforming related models (BART, PEGASUS, CPT, T5, LongT5, and HED); human evaluation further confirms that the proposed HCA model performs well on multi-document summarization.
Abstract (English):
This study focuses on multi-answer summarization of medical questions, which benefits the general public by helping them obtain reliable medical counseling suggestions from relevant answers returned by question-answering systems. We propose a Hierarchical Clustering with Attentions (HCA) model for medical multi-answer summarization. First, we use Rhetorical Structure Theory to extract elementary discourse units and perform hierarchical clustering to group them. A large language model is then used to identify a topic for each cluster. Finally, we integrate the discourse units with the topic information of each cluster and feed them into our proposed hierarchical encoder-decoder architecture, which includes hierarchical self-attention and hierarchical cross-attention mechanisms, to enhance summary generation performance.
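A compact sketch of how two-level (hierarchical) self-attention over clustered EDUs might be organized: attention within each cluster, followed by attention across pooled cluster representations. This is an assumption-laden PyTorch illustration, not the thesis architecture; the dimensions, pooling choice, and layer layout are placeholders.

```python
# Illustrative PyTorch sketch of hierarchical self-attention: EDU-level
# attention within each cluster, then attention across pooled cluster
# representations. Dimensions and pooling are placeholder assumptions,
# not the configuration used in the thesis.
import torch
import torch.nn as nn

class HierarchicalSelfAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.local_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.global_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, clusters):
        # clusters: list of tensors, each of shape (num_edus_in_cluster, d_model)
        local_outputs, cluster_reprs = [], []
        for c in clusters:
            c = c.unsqueeze(0)                        # (1, n_i, d_model)
            out, _ = self.local_attn(c, c, c)         # attention within one cluster
            local_outputs.append(out.squeeze(0))
            cluster_reprs.append(out.mean(dim=1))     # mean-pool: one vector per cluster
        reprs = torch.cat(cluster_reprs, dim=0).unsqueeze(0)    # (1, n_clusters, d_model)
        global_out, _ = self.global_attn(reprs, reprs, reprs)   # attention across clusters
        return local_outputs, global_out.squeeze(0)

# Example usage with random EDU embeddings for three clusters:
# model = HierarchicalSelfAttention()
# clusters = [torch.randn(4, 768), torch.randn(2, 768), torch.randn(5, 768)]
# edu_level, cluster_level = model(clusters)
```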
Due to the lack of publicly available datasets for Chinese medical multi-document summarization, we manually created the Mednet-MAS dataset, comprising 2,077 Chinese medical questions with multi-answer summaries, for model performance evaluation. We conducted experiments on the public English MEDIQA-MAS dataset and our constructed Chinese Mednet-MAS dataset. Experimental results indicate that our HCA model outperformed models such as BART, PEGASUS, CPT, T5, LongT5, and HED in terms of ROUGE and BERTScore metrics. In addition, human evaluation of randomly selected samples confirmed that our HCA model performs better than the state-of-the-art HED model for medical multi-answer summarization.
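As a small illustration of the ROUGE and BERTScore evaluation mentioned above, the following sketch uses the rouge-score and bert-score packages; it is not the evaluation script of the thesis, and Chinese text would normally need to be pre-segmented (e.g., space-separated characters or words) before ROUGE scoring.

```python
# Sketch of summary evaluation with ROUGE and BERTScore using the
# rouge-score and bert-score packages; not the thesis' exact script.
# For Chinese, summaries are usually pre-segmented before ROUGE scoring,
# since the default tokenizer is designed for English.
from rouge_score import rouge_scorer
from bert_score import score as bert_score

def evaluate_summary(generated, reference, lang="en"):
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
    rouge = scorer.score(reference, generated)   # argument order: (target, prediction)
    P, R, F1 = bert_score([generated], [reference], lang=lang)
    return {
        "rouge1_f": rouge["rouge1"].fmeasure,
        "rouge2_f": rouge["rouge2"].fmeasure,
        "rougeL_f": rouge["rougeL"].fmeasure,
        "bertscore_f1": F1.mean().item(),
    }
```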
Table of Contents
Abstract (Chinese) i
Abstract (English) ii
Acknowledgements iii
Table of Contents iv
List of Figures vii
List of Tables viii
Chapter 1: Introduction 1
1-1 Research Background 1
1-2 Research Motivation 3
1-3 Research Objectives 4
1-4 Chapter Overview 6
Chapter 2: Related Work 7
2-1 Summarization Models 7
2-1-1 Neural Network-Based Summarization Models 7
2-1-2 Clustering-Based Summarization Models 15
2-1-3 Summary of Summarization Models 17
2-2 Clustering Algorithms 19
2-3 Multi-Document Summarization Datasets 22
Chapter 3: Research Methods 25
3-1 Model Architecture 25
3-2 Hierarchical Clustering 26
3-2-1 Elementary Discourse Unit Extraction 27
3-2-2 Clustering Algorithm 28
3-2-3 Cluster Topic Identification 32
3-2-4 Input Embedding Construction 34
3-3 Hierarchical Encoder-Decoder Architecture 36
3-3-1 Encoder-Decoder 36
3-3-2 Hierarchical Self-Attention Mechanism 38
3-3-3 Hierarchical Cross-Attention Mechanism 39
3-3-4 Loss Function 41
Chapter 4: Experiments and Performance Evaluation 42
4-1 Dataset Construction 42
4-2 Evaluation Metrics 46
4-3 Experimental Settings 51
4-4 Model Performance Comparison 53
4-5 Ablation Study 57
4-6 Human Evaluation 59
Chapter 5: Conclusions 65
5-1 Conclusions 65
5-2 Research Limitations 66
5-3 Future Work 67
References 68
Appendices 76
Appendix 1: LSARS Data Statistics 76
Appendix 2: LSARS Multi-Document Summarization Model Performance Comparison 77
[1] Z. Cao, F. Wei, L. Dong, S. Li, and M. Zhou, ‘Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization’, Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29, no. 1, Feb. 2015.
[2] X. Zheng, A. Sun, J. Li, and K. Muthuswamy, ‘Subtopic-driven Multi-Document Summarization’, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, pp. 3153–3162.
[3] S. Hochreiter and J. Schmidhuber, ‘Long Short-Term Memory’, Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[4] J. Chung, Ç. Gülçehre, K. Cho, and Y. Bengio, ‘Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling’, CoRR, vol. abs/1412.3555, 2014.
[5] Z. Huang, W. Xu, and K. Yu, ‘Bidirectional LSTM-CRF Models for Sequence Tagging’, CoRR, vol. abs/1508.01991, 2015.
[6] A. Brazinskas, M. Lapata, and I. Titov, ‘Unsupervised Multi-Document Opinion Summarization as Copycat-Review Generation’, CoRR, vol. abs/1911.02247, 2019.
[7] E. Chu and P. J. Liu, ‘MeanSum: A Neural Model for Unsupervised Multi-document Abstractive Summarization’, 2019, arXiv:1810.05739.
[8] M. Coavoux, H. Elsahar, and M. Gallé, ‘Unsupervised Aspect-Based Multi-Document Abstractive Summarization’, in Proceedings of the 2nd Workshop on New Frontiers in Summarization, 2019, pp. 42–47.
[9] I. Sutskever, O. Vinyals, and Q. V. Le, ‘Sequence to Sequence Learning with Neural Networks’, CoRR, vol. abs/1409.3215, 2014.
[10] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gu̇lçehre, and B. Xiang, ‘Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond’, in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 280–290.
[11] O. Vinyals, M. Fortunato, and N. Jaitly, ‘Pointer Networks’, 2015, arXiv:1506.03134.
[12] J. Gu, Z. Lu, H. Li, and V. O. K. Li, ‘Incorporating Copying Mechanism in Sequence-to-Sequence Learning’, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 1631–1640.
[13] Y. Miao and P. Blunsom, ‘Language as a Latent Variable: Discrete Generative Models for Sentence Compression’, in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016, pp. 319–328.
[14] Z. Tu, Z. Lu, Y. Liu, X. Liu, and H. Li, ‘Modeling Coverage for Neural Machine Translation’, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2016, pp. 76–85.
[15] A. See, P. J. Liu, and C. D. Manning, ‘Get To The Point: Summarization with Pointer-Generator Networks’, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1073–1083.
[16] R. Nallapati, B. Zhou, C. dos Santos, Ç. Gu̇lçehre, and B. Xiang, ‘Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond’, in Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, 2016, pp. 280–290.
[17] J. Carbonell and J. Goldstein, ‘The use of MMR, diversity-based reranking for reordering documents and producing summaries’, in Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Melbourne, Australia, 1998, pp. 335–336.
[18] A. Fabbri, I. Li, T. She, S. Li, and D. Radev, ‘Multi-News: A Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model’, in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, pp. 1074–1084.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, ‘Attention Is All You Need’, CoRR, vol. abs/1706.03762, 2017.
[20] I. Beltagy, M. E. Peters, and A. Cohan, ‘Longformer: The Long-Document Transformer’, 2020, arXiv:2004.05150.
[21] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, ‘BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 7871–7880.
[22] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu, ‘Exploring the limits of transfer learning with a unified text-to-text transformer’, J. Mach. Learn. Res., vol. 21, no. 1, Jan. 2020.
[23] J. Zhang, Y. Zhao, M. Saleh, and P. J. Liu, ‘PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization’, 2020, arXiv:1912.08777.
[24] W. Xiao, I. Beltagy, G. Carenini, and A. Cohan, ‘PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization’, 2022, arXiv:2110.08499.
[25] R. Wolhandler, A. Cattan, O. Ernst, and I. Dagan, ‘How "Multi" is Multi-Document Summarization?’, in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022, pp. 5761–5769.
[26] C. Shen, L. Cheng, X.-P. Nguyen, Y. You, and L. Bing, ‘A Hierarchical Encoding-Decoding Scheme for Abstractive Multi-document Summarization’, 2023, arXiv:2305.08503.
[27] X. Wan and J. Yang, ‘Multi-document summarization using cluster-based link analysis’, in Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, Singapore, 2008, pp. 299–306.
[28] Y. Zhang, Y. Xia, Y. Liu, and W. Wang, ‘Clustering Sentences with Density Peaks for Multi-document Summarization’, 2015, pp. 1262–1267.
[29] M. T. Nayeem, T. A. Fuad, and Y. Chali, ‘Abstractive Unsupervised Multi-Document Summarization using Paraphrastic Sentence Fusion’, in Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 1191–1204.
[30] Z. Liu and N. Chen, ‘Exploiting Discourse-Level Segmentation for Extractive Summarization’, in Proceedings of the 2nd Workshop on New Frontiers in Summarization, 2019, pp. 116–121.
[31] J. Xu, Z. Gan, Y. Cheng, and J. Liu, ‘Discourse-Aware Neural Extractive Text Summarization’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 5021–5031.
[32] W. C. Mann, and S. A. Thompson, ‘Rhetorical Structure Theory: Toward a functional theory of text organization’, Text & Talk, vol. 8, pp. 243–281, 1988.
[33] O. Ernst, A. Caciularu, O. Shapira, R. Pasunuru, M. Bansal, J. Goldberger, and I. Dagan, ‘Proposition-Level Clustering for Multi-Document Summarization’, in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022, pp. 1765–1779.
[34] G. Stanovsky, J. Michael, L. Zettlemoyer, and I. Dagan, ‘Supervised Open Information Extraction’, in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, pp. 885–895.
[35] P. J. Liu, M. Saleh, E. Pot, B. Goodrich, R. Sepassi, L. Kaiser, and N. Shazeer, ‘Generating Wikipedia by Summarizing Long Sequences’, 2018, arXiv:1801.10198.
[36] D. Gholipour Ghalandari, C. Hokamp, N. T. Pham, J. Glover, and G. Ifrim, ‘A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 1302–1308.
[37] Y. Lu, Y. Dong, and L. Charlin, ‘Multi-XScience: A Large-scale Dataset for Extreme Multi-document Summarization of Scientific Articles’, in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020, pp. 8068–8074.
[38] M. Yasunaga, J. Kasai, R. Zhang, A. R. Fabbri, I. Li, D. Friedman, and D. R. Radev, ‘ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks’, 2019, arXiv:1909.01716.
[39] J. DeYoung, I. Beltagy, M. van Zuylen, B. Kuehl, and L. L. Wang, ‘MS^2: Multi-Document Summarization of Medical Studies’, in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 7494–7513.
[40] L. Wang and W. Ling, ‘Neural Network-Based Abstract Generation for Opinions and Arguments’, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 47–57.
[41] S. Angelidis and M. Lapata, ‘Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised’, 2018, arXiv:1808.08858.
[42] H. Pan, R. Yang, X. Zhou, R. Wang, D. Cai, and X. Liu, ‘Large Scale Abstractive Multi-Review Summarization (LSARS) via Aspect Alignment’, in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, 2020, pp. 2337–2346.
[43] R. J. G. B. Campello, D. Moulavi, and J. Sander, ‘Density-based clustering based on hierarchical density estimates’, in Advances in Knowledge Discovery and Data Mining - 17th Pacific-Asia Conference, PAKDD 2013, Proceedings, 2013, vol. PART 2, pp. 160–172.
[44] S.-S. Hung, H.-H. Huang, and H.-H. Chen, ‘A Complete Shift-Reduce Chinese Discourse Parser with Robust Dynamic Oracle’, in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 133–138.
[45] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’, 2019, arXiv:1810.04805.
[46] J. D. Lafferty, A. McCallum, and F. C. N. Pereira, ‘Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data’, in Proceedings of the Eighteenth International Conference on Machine Learning, 2001, pp. 282–289.
[47] X. Jin and J. Han, ‘K-Means Clustering’, in Encyclopedia of Machine Learning, C. Sammut and G. I. Webb, Eds. Boston, MA: Springer US, 2010, pp. 563–564.
[48] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, ‘A density-based algorithm for discovering clusters in large spatial databases with noise’, in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231.
[49] M. Ankerst, M. M. Breunig, H.-P. Kriegel, and J. Sander, ‘OPTICS: ordering points to identify the clustering structure’, SIGMOD Rec., vol. 28, no. 2, pp. 49–60, Jun. 1999.
[50] R. C. Prim, ‘Shortest connection networks and some generalizations’, The Bell System Technical Journal, vol. 36, no. 6, pp. 1389–1401, 1957.
[51] Y. Shao, Z. Geng, Y. Liu, J. Dai, H. Yan, F. Yang, L. Zhe, H. Bao, and X. Qiu, ‘CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation’, Sci. China Inf. Sci., vol. 67, 2021.
[52] C.-Y. Lin, ‘ROUGE: A Package for Automatic Evaluation of Summaries’, in Text Summarization Branches Out, 2004, pp. 74–81.
[53] Z. Li, X. Zhang, Y. Zhang, D. Long, P. Xie, and M. Zhang, ‘Towards General Text Embeddings with Multi-stage Contrastive Learning’, 2023, arXiv:2308.03281.
[54] R. Y. Pang, A. Lelkes, V. Tran, and C. Yu, ‘AgreeSum: Agreement-Oriented Multi-Document Summarization’, in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, 2021, pp. 3377–3391.
[55] R. S. Puduppully, P. Jain, N. Chen, and M. Steedman, ‘Multi-Document Summarization with Centroid-Based Pretraining’, in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023, pp. 128–138.
[56] J. Giorgi, L. Soldaini, B. Wang, G. Bader, K. Lo, L. L. Wang, and A. Cohan, ‘Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval’, in Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 8177–8199.
[57] M. Guo, J. Ainslie, D. Uthus, S. Ontanon, J. Ni, Y.-H. Sung, and Y. Yang, ‘LongT5: Efficient Text-To-Text Transformer for Long Sequences’, in Findings of the Association for Computational Linguistics: NAACL 2022, 2022, pp. 724–736.
[58] Z. Zhang, H. Zhang, K. Chen, Y. Guo, J. Hua, Y. Wang, and M. Zhou, ‘Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese’, 2021, arXiv:2110.06696.
[59] D. Uthus, S. Ontanon, J. Ainslie, and M. Guo, ‘mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences’, in Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 9380–9386.
[60] M. Savery, A. B. Abacha, S. Gayen, and D. Demner-Fushman, ‘Question-driven summarization of answers to consumer health questions’, Scientific Data, vol. 7, no. 1, p. 322, Oct. 2020.
[61] A. Ben Abacha, Y. Mrabet, Y. Zhang, C. Shivade, C. Langlotz, and D. Demner-Fushman, ‘Overview of the MEDIQA 2021 Shared Task on Summarization in the Medical Domain’, in Proceedings of the 20th Workshop on Biomedical Language Processing, 2021, pp. 74–85.
[62] Z. Zhao, H. Chen, J. Zhang, X. Zhao, T. Liu, W. Lu, X. Chen, H. Deng, Q. Ju, and X. Du, ‘UER: An Open-Source Toolkit for Pre-training Models’, in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, 2019, pp. 241–246.
[63] C. Ma, W. E. Zhang, M. Guo, H. Wang, and Q. Z. Sheng, ‘Multi-document Summarization via Deep Learning Techniques: A Survey’, ACM Comput. Surv., vol. 55, no. 5, Dec. 2022.
[64] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, ‘BERTScore: Evaluating Text Generation with BERT’, 2020, arXiv:1904.09675.
[65] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, ‘RoBERTa: A Robustly Optimized BERT Pretraining Approach’, 2019, arXiv:1907.11692.
[66] J. Li, A. Sun, and S. Joty, ‘SegBot: A Generic Neural Text Segmentation Model with Pointer Network’, in Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), Jul. 2018, pp. 4166–4172.
Electronic Full Text (available online to the public from 2029/07/23)