
National Digital Library of Theses and Dissertations in Taiwan

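Author: Chjen-Cheng Sun (孫建成)
Thesis title: A hierarchical clustering system to enhance personalized web-snippet search
Thesis title (original Chinese): 促進個人化網頁摘要搜尋的階層式分群系統
Advisor: Shih-Chieh Chou (周世傑)
Degree: Master's
Institution: National Central University
Department: Graduate Institute of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2006
Academic year of graduation: 94 (ROC calendar)
Language: Chinese
Number of pages: 76
Keywords: web-snippet clustering system; information retrieval; personalization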
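This study proposes a personalized hierarchical web-snippet clustering system. Built on top of a search engine, the system takes the user's query string, gathers the relevant web snippets, and forms a browsing hierarchy that describes their content. Each snippet is assigned to the appropriate clusters, and the concept of each cluster is described by its label. Finally, all snippets and labels are ranked according to the preferences recorded in the user profile, producing a personalized browsing hierarchy.
The system extracts labels using lexical affinity. The advantage is that words that are related but not adjacent can be taken as a label, so labels can take more flexible forms and, as the experimental results show, label occurrences can be counted more precisely.
To build the browsing hierarchy, this study extends the DisCover algorithm into a document clustering algorithm based on label distribution that constructs a browsing hierarchy free of redundancy. When selecting child labels, each candidate is evaluated on four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experimental results show, the algorithm avoids producing redundancy.
To build the user profile, a reference ontology and data on the user's interests are used to construct a directory-style user profile. In the online phase, snippets and labels are ranked by importance to produce the personalized browsing hierarchy. The importance of a snippet is computed from the similarity and weight of the related profile directories. The importance of a label takes into account the importance of the snippets associated with the label's node, the frequency of the label, and the depth of the label's node. As the experimental results show, the snippet-ranking mechanism does help users search for information.
The system's benefits are twofold: it reveals how the contents of web snippets relate to one another, and it helps users find the information they need.
The contributions of this study are as follows:
1. It provides a hierarchical document clustering algorithm that constructs a browsing hierarchy free of redundancy.
2. It applies a user-profile-based classifier within the web-snippet clustering system, so that the system can build a personalized browsing hierarchy.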
This paper presents a personalized hierarchical web-snippet clustering system that runs on top of search engines. Given the user's query string, the system collects snippets and builds a browsing hierarchy that describes their contents. In the browsing hierarchy, every snippet is assigned to the appropriate clusters, and the concept of every cluster is described by its label. In the final phase, the system sorts all snippets and labels according to the preferences in the user profile and outputs the personalized browsing hierarchy.
The system applies lexical affinity to extract labels. Using statistical measures, it can extract words that are related but not contiguous as labels. Labels can therefore take more flexible forms and, as the experiment shows, the system can count label frequency more precisely.
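As a rough illustration of label extraction by lexical affinity, the sketch below counts unordered word pairs that co-occur within a small window of one another. The function names, the stopword list, the window of 5 tokens, and the use of snippet counts as the statistic are all assumptions made for this example; the abstract does not specify the statistical measure the thesis uses.

```python
from collections import Counter

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "on", "with"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def lexical_affinities(snippets, window=5, min_snippets=2):
    """Count unordered word pairs that co-occur within `window` tokens.

    Distance is measured on the stopword-filtered token list. Each pair
    is counted at most once per snippet, so the value is the number of
    snippets containing the pair. `window` and `min_snippets` are
    assumed parameters, not values taken from the thesis.
    """
    counts = Counter()
    for text in snippets:
        tokens = tokenize(text)
        pairs = set()
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    # Sort so that word order does not matter.
                    pairs.add(tuple(sorted((left, right))))
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_snippets}


snippets = [
    "Clustering of web search results",
    "Web results clustering for search engines",
    "Personalized search with user profiles",
]
affinities = lexical_affinities(snippets)
```

Here the pair `("results", "web")` is found in the first two snippets even though the two words are adjacent in one and separated in the other, which is the kind of non-contiguous label a fixed n-gram extractor would miss.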
To build the browsing hierarchy, our research provides an algorithm that extends the DisCover algorithm and produces a browsing hierarchy without redundancy, based on label dispersion. The algorithm selects child labels greedily and evaluates every candidate child label against four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experiment shows, the algorithm avoids producing redundancy.
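The greedy selection of child labels can be sketched as follows. This is not the thesis's algorithm: the abstract names the four factors but does not define them, so the formulas here (coverage as the share of newly covered documents, distinctiveness as non-overlap with already chosen siblings, redundancy as a label that repeats the parent's whole document set, compactness as a precomputed score) and the linear weighting are assumptions chosen only to make the idea concrete.

```python
def select_child_labels(parent_docs, candidates, compactness,
                        max_children=3, weights=(1.0, 1.0, 1.0, 1.0)):
    """Greedily pick child labels for one node of the hierarchy.

    parent_docs: set of document ids at the node (assumed non-empty).
    candidates:  dict mapping label -> set of document ids.
    compactness: dict mapping label -> score in [0, 1] (assumed given).
    weights:     (coverage, distinctiveness, redundancy, compactness).
    All factor definitions below are illustrative assumptions.
    """
    w_cov, w_dist, w_red, w_comp = weights
    # Restrict every candidate to the documents of the current node.
    pool = {label: docs & parent_docs for label, docs in candidates.items()}
    chosen, covered = [], set()

    def score(label):
        docs = pool[label]
        coverage = len(docs - covered) / len(parent_docs)
        distinctiveness = 1.0 - len(docs & covered) / len(docs)
        # A child that holds exactly the parent's documents adds nothing.
        redundancy = 1.0 if docs == parent_docs else 0.0
        return (w_cov * coverage + w_dist * distinctiveness
                + w_comp * compactness.get(label, 0.0) - w_red * redundancy)

    while len(chosen) < max_children:
        # Only labels that still cover at least one new document qualify.
        useful = [l for l in pool if l not in chosen and pool[l] - covered]
        if not useful:
            break
        best = max(useful, key=score)
        chosen.append(best)
        covered |= pool[best]
    return chosen


parent_docs = {1, 2, 3, 4, 5, 6}
candidates = {
    "search": {1, 2, 3, 4, 5, 6},   # repeats the parent node
    "clustering": {1, 2, 3},
    "web clustering": {1, 2},
    "user profile": {4, 5, 6},
}
compactness = {"search": 1.0, "clustering": 1.0,
               "web clustering": 0.8, "user profile": 0.9}
children = select_child_labels(parent_docs, candidates, compactness)
```

With these inputs the label `"search"` is never chosen, because it would simply repeat the parent node, and the loop stops once no remaining label covers a new document.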
To construct the user profile, the system uses a reference ontology and documents that describe the user's preferences to build a directory-like user profile. In the online phase, the system sorts snippets and labels according to their importance. The importance of a snippet is computed from the weights and similarities of the related directories in the user profile. The importance of a label is weighted according to the label frequency, the label depth, and the related snippets on the node. As the experiments show, sorting snippets helps the user search for information.
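The ranking step can be illustrated with a small sketch. The abstract states which inputs are used but not how they are combined, so everything specific here is an assumption: the profile layout (directory name mapped to a weight and a term-count vector), cosine similarity as the similarity measure, and the particular formula in `label_importance`.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def snippet_importance(snippet_terms, profile):
    """Sum of directory weight times snippet-to-directory similarity.

    profile: dict mapping directory name -> (weight, Counter of terms).
    This layout is a hypothetical stand-in for the thesis's profile.
    """
    vec = Counter(snippet_terms)
    return sum(weight * cosine(vec, terms)
               for weight, terms in profile.values())


def label_importance(snippet_scores, frequency, depth):
    """Assumed combination: mean importance of the node's snippets,
    scaled by log frequency and discounted for deeper nodes."""
    if not snippet_scores:
        return 0.0
    mean_score = sum(snippet_scores) / len(snippet_scores)
    return mean_score * math.log(1 + frequency) / (1 + depth)


profile = {
    "Computers/Programming": (0.8, Counter({"python": 3, "code": 2})),
    "Sports/Cycling": (0.2, Counter({"bike": 3, "race": 1})),
}
snippets = {
    "s1": ["python", "code", "tutorial"],
    "s2": ["bike", "race", "results"],
}
scores = {sid: snippet_importance(terms, profile)
          for sid, terms in snippets.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this example `s1` ranks above `s2` because it matches the directory that carries the larger weight in the profile.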
The system is effective in two respects: discovering thematic relationships among snippets and helping users find the information they want.
The contribution of our research is twofold:
1. It provides a hierarchical document clustering algorithm that builds the browsing hierarchy without redundancy.
2. It applies a profile-based classifier to the web-snippet clustering system to produce a personalized browsing hierarchy.
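Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives and scope
1.3 Research limitations
1.4 Research process
1.5 Thesis structure
Chapter 2 Literature review
2.1 Search engines and web-snippet clustering systems
2.2 Web-snippet clustering systems
2.2.1 Word-based flat clustering
2.2.2 Sentence-based flat clustering
2.2.3 Word-based hierarchical clustering
2.2.4 Sentence-based hierarchical clustering
2.2.5 Overview of web-snippet clustering system designs
2.3 Collocations
2.4 User-profile-based personalization systems
Chapter 3 System design
3.1 System concept
3.2 System architecture
3.2.1 Document preprocessing phase
3.2.2 Browsing hierarchy construction phase
3.2.3 Personalized browsing hierarchy construction phase
Chapter 4 Experimental results
4.1 Experimental design
4.2 Experimental results
Chapter 5 Conclusion
5.1 Research conclusions and contributions
5.2 Future research directions
References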