National Digital Library of Theses and Dissertations in Taiwan

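Graduate student: Hung-An Kao (高虹安)
Thesis title: Comment Extraction from Blog Posts and Its Application to Opinion Mining (部落格貼文評論擷取及其在意見探勘上的應用)
Advisor: Hsin-Hsi Chen (陳信希)
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2008
Academic year of graduation: 96 (ROC calendar)
Language: English
Number of pages: 111
Keywords: Blog; Comment Extraction; Opinion Mining; Blog Mining; Feature Selection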
In recent years, the growing number of blogs and bloggers has changed how people communicate on the Internet. For example, bloggers can add blogs they recommend to their blogroll, which forms relationships between blogs. When writing a post, a blogger can cite other blog posts, which establishes relationships between posts. In addition, one of the biggest differences between blogs and ordinary web pages is that readers can interact with the author (the blogger) by leaving comments on a post.
Because blog posts often contain the author's personal experiences and views on specific subjects, they are useful material for opinion mining. The comments on a post carry readers' viewpoints on the targets described in the post, and may also express support for or opposition to the post itself. However, previous work on opinion mining has focused only on the author's opinions, and the reader opinions found in comments have largely been ignored. We conjecture that the reason is the difficulty of comment extraction. Each blog service provider presents comments with its own templates, there is no common template specification across providers, and even a single provider may offer several templates. Different templates mean that comments can differ greatly in form from one post to another. Correctly segmenting the individual comments in blog posts from such varied sources is clearly not an easy task.
In this thesis, we analyze the characteristics of comments in blog posts, propose methods for comment extraction, and implement a blog opinion search system that applies the comment extraction results to blog opinion mining. Finally, we summarize the experimental results of comment extraction and the results of its application to opinion mining, and raise several interesting issues and directions for future work.
Oral Examination Committee Certification ii
Chinese Abstract iii
Abstract iv
Chapter 1 Introduction 1
1.1 Blogosphere 1
1.2 Motivation 4
1.3 What Would Like to Do 11
1.4 Organization of the Thesis 13
Chapter 2 Related Works 14
2.1 Comment Applications 14
2.2 Content Extraction 15
2.3 Opinion Mining 16
Chapter 3 Comment Extraction 18
3.1 Problem Definition 18
3.2 Encoding 21
3.3 Repetitive Pattern Identification 25
3.4 Filtering 27
3.4.1 Method 1: Tag Pattern Loop Filtering 28
3.4.2 Method 2: Rule Overlap Filtering 30
3.4.3 Method 3: Longest Rule First 36
3.5 Comment/Non-comment Classifier 39
3.5.1 Feature Extraction 39
3.5.1.1 Block-Level Features 39
3.5.1.2 Rule-Level Features 41
3.5.2 Classifier Learning 43
Chapter 4 Experiments and Discussion 46
4.1 Experiment Setup 46
4.2 Answer Annotation 47
4.3 Corpus Analysis 50
4.4 Evaluation 54
4.4.1 Evaluation Metric 54
4.4.2 Comparison of Filtering Methods 55
4.4.3 Experiments on Feature Selection 59
4.4.3.1 Removing One Feature at a Time 60
4.4.3.2 Feature-Score for Feature Selection 62
4.4.3.3 Consideration of Feature Selection in Cross Validation 68
4.5 Error Analysis 91
4.5.1 Blog Posts with One Comment 91
4.5.2 Different Comment Styles 92
Chapter 5 An Application to Opinion Mining 95
5.1 Opinion Mining 95
5.1.1 Relevant Opinionated Objects in Blogs 96
5.1.2 Opinionated Targets 97
5.1.2.1 Post Content Extraction 97
5.1.2.2 Experiments Setup 99
5.1.2.3 Results 100
5.1.3 Relevant/Irrelevant Identification 101
5.2 Visualization 102
Chapter 6 Conclusion and Future Work 105
6.1 Conclusion 105
6.2 Future Work 106
References 108