National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

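Author: Meng-Fang Tsai (蔡孟芳)
Thesis title: A Systematic Study of Queried Keywords vs. Full-text Extracted Keywords in Blog Mining
Thesis title (Chinese): 利用查詢關鍵詞取代全文檢索關鍵詞之系統化研究
Advisor: Eric Jui-Lin Lu (呂瑞麟)
Oral defense committee: 周世玉, 陳宜惠
Oral defense date: 2012-07-25
Degree: Master's
Institution: National Chung Hsing University (國立中興大學)
Department: Information Management (資訊管理學系所)
Discipline: Computer science (電算機學門)
Field: General computer science (電算機一般學類)
Thesis type: Academic thesis
Publication year: 2012
Graduation academic year: 100 (ROC calendar)
Language: English
Pages: 65
Keywords: Blog Network; Blog Mining (部落格網路; 部落格探勘)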
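Abstract (translated from the Chinese): In recent years, a growing amount of useful and trustworthy information has been shared on blogs, and readers have become accustomed to relying on it. As the volume of blog information surges, clustering is an efficient way to reach relevant information. Across this research area, similarity is computed from link information, keywords, or tags/categories, and this computation is a necessary step in clustering. Keywords are mostly selected through full-text keyword extraction, which is very time-consuming when a large number of articles must be processed. Tags/categories have an inherent ambiguity problem, because different bloggers assign different tags/categories to the same content. This thesis proposes co-keywords, building on the observation that entering keywords into a search engine is the most common way for blog readers to obtain relevant blog information. Keywords are also an important bridge that reflects the theme of an article from the readers' point of view. Keywords that accurately express the theme of an article therefore also match user intent. This study uses a tracking code embedded in Blog Connect to collect queried keywords, that is, the keywords blog readers enter into search engines, and then selects keywords by computing several variables. The experimental results show that the approach can effectively replace the time-consuming full-text keyword extraction procedure while also serving blog readers' intent in obtaining information.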

It has become increasingly common to obtain useful and reliable information from blogs, which suggests that readers are growing accustomed to using information released by bloggers. Clustering is a useful approach for making the increasing amount of information on blogs easier to access. The literature review suggests that calculating similarity from linking information, keywords, or tags/categories is a critical step in clustering. Keywords are commonly retrieved from full text, which can be time-consuming when many articles must be processed. Tags/categories suffer from ambiguity: different bloggers may assign different tags/categories to identical content. This thesis proposes the co-keywords approach, which builds on the keywords that blog readers enter into search engines to obtain relevant information from blogs. Keywords are an important medium that reflects the theme of an article from blog readers’ perspectives. It is therefore important for keywords to reflect the theme of an article accurately while matching user intention. This study uses a tracing code embedded in Blog Connect to collect queried keywords, that is, the keywords blog readers enter into a search engine, and then uses various variables to select keywords. The results demonstrate that this method can effectively reduce the time needed by the full-text keyword retrieval procedure while serving blog readers’ intentions in obtaining information.
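The collection step described in the abstract can be illustrated with a minimal sketch. It assumes the tracing code records the referrer URL of each visit and that the search engine carries the query in a URL parameter such as `q` or `p`. The function names, the parameter list, and the use of raw term frequency as the selection variable are all hypothetical, since the abstract does not name the variables the thesis actually uses.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumption: the search engine puts the query in one of these parameters.
QUERY_PARAMS = ("q", "p", "query")


def queried_keywords(referrer_url):
    """Return the lowercase query terms found in a search-engine referrer URL."""
    params = parse_qs(urlparse(referrer_url).query)
    for name in QUERY_PARAMS:
        if name in params:
            return params[name][0].lower().split()
    return []  # not a search referrer, or no query recorded


def select_keywords(referrer_urls, k=3):
    """Pick the k most frequent queried terms for one article.

    Raw frequency is a stand-in: the abstract does not say which
    variables the thesis computes to select keywords.
    """
    counts = Counter()
    for url in referrer_urls:
        counts.update(queried_keywords(url))
    return [term for term, _ in counts.most_common(k)]
```

Calling `select_keywords` on the referrer URLs logged for a single article yields that article's candidate keywords without reading its full text, which is the saving the abstract points to.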

Abstract (Chinese)
Abstract
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
Chapter 2 Related Work
Chapter 3 Data Description and Processing
Chapter 4 Similarity Framework
4.1 Projection Overlap
4.2 Projection Mutual Information and Distributional Mutual Information
4.3 Kendall’s tau coefficient
Chapter 5 Experiments and Results
5.1 Data Set
5.2 Results
5.2.1 Projection Overlap Similarity
5.2.2 Projection and Distribution Mutual Information
5.2.3 Kendall’s tau coefficient
Chapter 6
References
Appendix 1
Appendix 2



