National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Author: 黃淯晴
Title: 基於向量分群的多篇景點評論摘要方法
Title (English): Summarization of Multiple Attraction Reviews Based on Vector Clustering Method
Advisor: 陳耀輝
Degree: Master's
Institution: National Chiayi University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Academic Field: Electrical Engineering and Computer Science
Thesis Type: Academic thesis
Year of Publication: 2019
Graduation Academic Year: 108
Language: Chinese
Number of Pages: 54
Keywords: attraction reviews, extractive summarization, LDA, Doc2vec, Word2vec, K-means, manual evaluation
Usage counts:
  • Cited by: 1
  • Views: 238
  • Ratings:
  • Downloads: 47
  • Saved to bibliography lists (My Research Room): 0
Abstract: Many websites now host tourist attraction reviews, and choosing useful reviews has become a popular research topic. Multi-review attraction summarization organizes and condenses the content of many reviews about a single attraction so that readers can quickly grasp what they want to know, saving time and giving them the key points at a glance. This thesis analyzes the collected review articles and, following the principle of extractive summarization, uses LDA, Word2vec, and Doc2vec to convert multiple reviews into document vectors. In addition to separating positive and negative reviews according to the reviewers' ratings, the method groups all reviews with K-means clustering or a topic model and then selects representative positive and negative reviews. Manual evaluation using questionnaires and statistical analysis shows that the proposed method can effectively identify representative summaries from multiple attraction reviews.
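As a rough illustration of the pipeline described in the abstract, the sketch below embeds each review with Doc2vec, clusters the document vectors with K-means, and returns the review closest to each cluster centroid as a representative. It is a minimal sketch, not the thesis implementation: it assumes gensim and scikit-learn, the function name representative_reviews and its parameters are illustrative, reviews are assumed to be pre-segmented into whitespace-separated tokens (Chinese text would first need a segmenter), and the rating-based positive/negative split and the LDA/Word2vec alternatives covered in Chapter 4 are omitted.

```python
# Minimal sketch (not the thesis code) of a Doc2vec + K-means selection step:
# embed each review, cluster the vectors, and keep the review nearest to each
# cluster centroid as a representative. Assumes gensim and scikit-learn.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

def representative_reviews(reviews, n_clusters=3, vector_size=100):
    """Pick one representative review per cluster from tokenized review strings."""
    # Train a small Doc2vec model on the tokenized reviews.
    corpus = [TaggedDocument(words=text.split(), tags=[i])
              for i, text in enumerate(reviews)]
    model = Doc2Vec(corpus, vector_size=vector_size, min_count=1, epochs=40)
    vectors = np.array([model.dv[i] for i in range(len(reviews))])

    # Group the document vectors with K-means.
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(vectors)

    # For each cluster, keep the review whose vector lies nearest the centroid.
    picks = []
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(vectors[members] - km.cluster_centers_[c], axis=1)
        picks.append(reviews[members[np.argmin(dists)]])
    return picks
```

In the thesis itself, the number of clusters is determined experimentally (Section 5.2.1) and the selection is applied separately to the positive and negative review sets before the chosen reviews are assembled into the final summary.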
Table of Contents:
Abstract (Chinese) i
Abstract (English) ii
Table of Contents iii
List of Figures vi
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 1
1.3 Thesis Organization 2
Chapter 2 Literature Review 3
2.1 Automatic Summarization 3
2.1.1 Extractive Summarization 4
2.1.2 Abstractive Summarization 5
2.2 Number of Documents in Automatic Summarization 6
Chapter 3 Background 7
3.1 Word2vec 7
3.2 Doc2vec 10
3.3 LDA 12
3.4 K-means 13
3.5 Summarization Evaluation 14
3.5.1 Internal Consistency 14
3.5.2 Inter-rater Reliability 15
Chapter 4 Methodology 17
4.1 Problem Definition 17
4.2 Method Description 17
4.3 Collecting Attraction Review Data 20
4.4 Data Preprocessing 20
4.5 Generating Vector Representations and Selecting Reviews 21
4.5.1 LDA Representation 22
4.5.2 Doc2vec + K-means Representation 25
4.5.3 Word2vec + K-means Representation 25
4.5.4 Random Representation 26
4.5.5 Repeated Random Representation 27
4.6 Summarization Performance Evaluation 28
Chapter 5 Experimental Results 29
5.1 Experimental Setup 29
5.1.1 Dataset 29
5.1.2 Environment 31
5.2 Experimental Methods 31
5.2.1 Determining the Number of Clusters 31
5.2.2 LDA Experiments 32
5.2.3 Vector Representation Experiments 33
5.3 Experimental Results and Evaluation 35
5.3.1 Results 35
5.3.2 Performance Evaluation 37
5.3.3 Reliability Analysis of the Manual Evaluation Form 37
5.3.4 Comparison of Clustering Performance Across Methods 41
5.3.5 Performance Evaluation of Each Method 43
5.4 Discussion 48
Chapter 6 Conclusions and Future Work 50