Thesis title (English): Summarization of Multiple Attraction Reviews Based on Vector Clustering Method
Many websites host reviews of tourist attractions, and how to pick out the useful ones has become a widely discussed problem. Multi-review summarization for attractions organizes and condenses the content of the reviews written about a single tourist attraction, so that readers can grasp what they want to know at a glance and save time. This thesis takes an extractive summarization approach: it uses LDA, Word2vec, Doc2vec, and related techniques to analyze multiple reviews and convert them into document vectors. Reviews are first divided into positive and negative groups according to the ratings given by the reviewers. The document vectors are then grouped by K-means clustering or by a topic model, and the proposed method selects representative positive and negative reviews. The summaries are evaluated manually through a questionnaire and statistical analysis. The experimental results show that the proposed method can accurately find a representative summary from multiple attraction reviews.
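As a rough illustration of the pipeline the abstract outlines (split reviews by rating, cluster the document vectors, pick one representative per cluster), here is a minimal sketch in plain Python. It is not the thesis's implementation. The function names, the rating threshold of 4, and the rule "take the review closest to each cluster centroid" are all assumptions made for this example; the abstract does not state how representatives are chosen. The vectors are assumed to have been produced already, for instance by Doc2vec.

```python
import math
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def pick_representatives(vectors, ratings, k=2, threshold=4):
    """Hypothetical helper: indices of representative reviews per polarity.

    Assumption: a rating >= threshold counts as positive, anything lower
    as negative. Within each polarity, the review nearest to each cluster
    centroid is taken as that cluster's representative.
    """
    rules = (
        ("positive", lambda r: r >= threshold),
        ("negative", lambda r: r < threshold),
    )
    result = {}
    for polarity, keep in rules:
        idx = [i for i, r in enumerate(ratings) if keep(r)]
        subset = [vectors[i] for i in idx]
        reps = []
        if subset:
            kk = min(k, len(subset))
            centroids, labels = kmeans(subset, kk)
            for c in range(kk):
                members = [j for j, lab in enumerate(labels) if lab == c]
                if members:
                    best = min(
                        members,
                        key=lambda j: math.dist(subset[j], centroids[c]),
                    )
                    reps.append(idx[best])
        result[polarity] = sorted(reps)
    return result
```

Calling `pick_representatives(vectors, ratings, k=2)` returns a dictionary with a `"positive"` and a `"negative"` list of review indices, which can then be mapped back to the review texts to form the extractive summary.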
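Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Automatic Summarization
2.1.1 Extractive Summarization
2.1.2 Abstractive Summarization
2.2 Number of Documents in Automatic Summarization
Chapter 3 Background Knowledge
3.1 Word2vec
3.2 Doc2vec
3.3 LDA
3.4 K-means
3.5 Evaluation of Automatic Summarization
3.5.1 Internal Consistency
3.5.2 Inter-Rater Reliability
Chapter 4 Methodology
4.1 Problem Definition
4.2 Method Description
4.3 Collecting Attraction Review Data
4.4 Data Preprocessing
4.5 Generating Vector Representations and Selecting Reviews
4.5.1 LDA Representation
4.5.2 Doc2vec+K-means Representation
4.5.3 Word2vec+K-means Representation
4.5.4 Random Representation
4.5.5 Repeated Random Representation
4.6 Evaluating Summarization Performance
Chapter 5 Experimental Results
5.1 Experimental Setup
5.1.1 Dataset
5.1.2 Environment Settings
5.2 Experimental Methods
5.2.1 Determining the Number of Clusters
5.2.2 LDA Experiment
5.2.3 Vector Representation Experiments
5.3 Experimental Results and Evaluation
5.3.1 Experimental Results
5.3.2 Performance Evaluation
5.3.3 Reliability Analysis of the Manual Evaluation Questionnaire
5.3.4 Comparison of Clustering Performance across Methods
5.3.5 Performance Evaluation of Each Method
5.4 Discussion
Chapter 6 Conclusions and Future Work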
