National Digital Library of Theses and Dissertations in Taiwan

Graduate student: 孫建成 (Chjen-Cheng Sun)
Thesis title: 促進個人化網頁摘要搜尋的階層式分群系統
Thesis title (English): A hierarchical clustering system to enhance personalized web-snippet search
Advisor: 周世傑 (Shih-Chieh Chou)
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Graduate Institute of Information Management (資訊管理研究所)
Discipline: Computer science
Subdiscipline: General computer science
Thesis type: Academic thesis
Year of publication: 2006
Language: Chinese
Keywords: information retrieval, web-snippet clustering system, personalization


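Abstract (translated from the Chinese):
This study proposes a personalized hierarchical web-snippet clustering system. Built on top of a search engine, the system takes the user's query string, gathers the relevant web snippets, and forms a browsing hierarchy that describes their content. Each snippet is assigned to the appropriate clusters, and the concept of each cluster is described by its label. Finally, the system ranks all snippets and labels according to the user preferences recorded in the user profile and produces a personalized browsing hierarchy.
The system extracts labels using lexical affinity. The advantage is that words that are related but not contiguous can serve as a label, so labels can take more flexible forms and, as the experimental results show, label occurrence counts can be estimated more precisely.
For building the browsing hierarchy, this study extends the DisCover algorithm into a document clustering algorithm based on label distribution that constructs a browsing hierarchy free of redundancy. When selecting child labels, each candidate is evaluated on four factors: document coverage, distinctiveness from sibling nodes, redundancy, and compactness. As the experimental results show, the algorithm avoids producing redundancy.
For building the user profile, a reference ontology and data on the user's interests are used to construct a directory-style user profile. In the online phase, snippets and labels are ranked by importance to produce the personalized browsing hierarchy. The importance of a snippet is computed from the similarities and weights of the associated profile directories. The importance of a label takes into account the importance of the documents associated with the label's node, the label's frequency, and the depth of the label's node. As the experimental results show, the document ranking mechanism does help users search for information.
The benefits of the system are twofold: it reveals the relationships among the contents of web snippets, and it helps users find the information they need.
The contributions of this study are as follows:
1. A hierarchical document clustering algorithm that constructs a browsing hierarchy without redundancy.
2. The application of a user-profile-based classifier within a web-snippet clustering system, enabling the system to build a personalized browsing hierarchy.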

Abstract (English):
This thesis presents a hierarchical web-snippet clustering system with personalization that runs on top of search engines. Given the user's query string, the system collects snippets and builds a browsing hierarchy that describes their contents. In the browsing hierarchy, every snippet is assigned to the appropriate clusters, and the concept of each cluster is described by its label. In the final phase, the system sorts all snippets and labels according to the preferences in the user profile and outputs a personalized browsing hierarchy.
The system applies lexical affinity to extract labels. Using statistical measures, it can extract words that are related but not adjacent as labels. Labels can therefore take more flexible forms and, as the experiment shows, the system can count label frequency more precisely.
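The idea of pairing related but non-adjacent words can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function name `lexical_affinities`, the `max_distance` window, the `min_count` threshold, and the use of pointwise mutual information as the statistical measure are all assumptions made for illustration.

```python
import math
from collections import Counter


def lexical_affinities(snippets, max_distance=4, min_count=2):
    """Count unordered word pairs that co-occur at most `max_distance`
    token positions apart, and score the frequent ones.

    Hypothetical sketch: the window size, the threshold, and the PMI
    score are illustrative assumptions, not the thesis's settings.
    """
    word_df = Counter()  # number of snippets containing each word
    pair_df = Counter()  # number of snippets containing each pair in the window
    for text in snippets:
        tokens = text.lower().split()
        word_df.update(set(tokens))
        pairs = set()
        for i, left in enumerate(tokens):
            # Look ahead up to max_distance tokens, so the two words
            # of a pair do not have to be adjacent.
            for right in tokens[i + 1 : i + 1 + max_distance]:
                if left != right:
                    pairs.add(tuple(sorted((left, right))))
        pair_df.update(pairs)

    n = len(snippets)
    scores = {}
    for (a, b), count in pair_df.items():
        if count >= min_count:
            # Pointwise mutual information over snippet-level frequencies.
            scores[(a, b)] = math.log(count * n / (word_df[a] * word_df[b]))
    return pair_df, scores


snippets = [
    "jaguar car dealer",
    "jaguar sports car",
    "jaguar big cat",
    "apple pie",
]
pair_df, scores = lexical_affinities(snippets)
```

In this toy input, "jaguar" and "car" are counted as one pair in both of the first two snippets even though "sports" separates them in the second, which a contiguous n-gram counter would miss.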
To build the browsing hierarchy, our research provides an algorithm that extends the DisCover algorithm and produces a browsing hierarchy without redundancy, based on label dispersion. The algorithm selects child labels greedily and evaluates every candidate child label against four factors: document coverage, sibling-node distinctiveness, redundancy, and compactness. As the experiment shows, the algorithm avoids producing redundancy.
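A greedy selection over these four factors might look like the following Python sketch. The abstract does not define the factors or how they are combined, so every formula here is an assumption: coverage is the share of the parent's snippets newly covered, distinctiveness is one minus the largest Jaccard overlap with already chosen siblings, redundancy is the share of label words repeated from the parent label, and compactness is taken as a precomputed value. The names `Candidate`, `score`, and `select_children` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    label: tuple        # label words, e.g. ("jaguar", "car")
    docs: frozenset     # ids of the snippets that contain the label
    compactness: float  # assumed to be precomputed, in [0, 1]


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def score(cand, parent_words, parent_docs, covered, chosen, weights):
    """Combine the four factors linearly (an illustrative assumption)."""
    coverage = len((cand.docs & parent_docs) - covered) / len(parent_docs)
    distinct = 1.0 - max((jaccard(cand.docs, s.docs) for s in chosen), default=0.0)
    redundancy = len(set(cand.label) & parent_words) / len(cand.label)
    w_cov, w_dis, w_red, w_com = weights
    return (w_cov * coverage + w_dis * distinct
            - w_red * redundancy + w_com * cand.compactness)


def select_children(candidates, parent_words, parent_docs, k=3,
                    weights=(1.0, 1.0, 1.0, 1.0)):
    """Greedily pick up to k child labels for one node of the hierarchy."""
    chosen, covered = [], set()
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda c: score(
            c, parent_words, parent_docs, covered, chosen, weights))
        # Stop once the best candidate adds no uncovered snippet.
        if not (best.docs & parent_docs) - covered:
            break
        chosen.append(best)
        covered |= best.docs & parent_docs
        remaining.remove(best)
    return chosen


candidates = [
    Candidate(("car",), frozenset({0, 1, 2}), 0.5),
    Candidate(("jaguar", "car"), frozenset({0, 1, 2}), 0.5),
    Candidate(("cat",), frozenset({3, 4}), 0.5),
    Candidate(("dealer",), frozenset({0, 1}), 0.5),
]
children = select_children(candidates, {"jaguar"}, frozenset(range(6)))
print([c.label for c in children])  # [('car',), ('cat',)]
```

With a parent node labeled "jaguar", the candidate ("jaguar", "car") is penalized for repeating the parent's word, and "dealer" is skipped because its snippets are already covered by "car".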
To construct the user profile, the system uses a reference ontology and documents that describe the user's preferences to build a directory-like user profile. In the online phase, the system sorts snippets and labels according to their importance. The importance of a snippet is computed from the weights and similarities of the related directories in the user profile. The importance of a label is weighted according to the label frequency, the label depth, and the related snippets on the node. As the experiments show, sorting snippets helps the user search for information.
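The two importance scores can be sketched in Python as follows. The abstract names the inputs but not the formulas, so the weighted sum for snippets, and the mean, logarithm, and depth discount for labels, are assumptions chosen for illustration. The function names and the example directory names are hypothetical.

```python
import math


def snippet_importance(similarities, directory_weights):
    """Weighted sum of a snippet's similarity to each related profile
    directory. `similarities` maps directory name -> similarity.
    The weighted-sum form is an assumption."""
    return sum(sim * directory_weights.get(directory, 0.0)
               for directory, sim in similarities.items())


def label_importance(snippet_scores, frequency, depth):
    """Combine the importance of the snippets on the label's node, the
    label frequency, and the node depth (depth 1 = directly below the
    root). Discounting deeper labels is an assumption."""
    if not snippet_scores:
        return 0.0
    mean_score = sum(snippet_scores) / len(snippet_scores)
    return mean_score * math.log1p(frequency) / depth


# Hypothetical profile: directory weights learned from the user's interests.
profile = {"Sports": 0.7, "Cars": 0.3}
snippets = {
    "s1": {"Cars": 0.9},
    "s2": {"Sports": 0.5, "Cars": 0.2},
}
scores = {sid: snippet_importance(sims, profile) for sid, sims in snippets.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['s2', 's1']
```

Here `s2` outranks `s1` because it matches the more heavily weighted "Sports" directory, even though `s1` has the higher single similarity value.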
The system is effective in two respects: it discovers thematic relationships among snippets, and it helps users find the information they want.
The contribution of our research is twofold:
1. It provides a hierarchical document clustering algorithm that builds a browsing hierarchy without redundancy.
2. It applies a profile-based classifier to the web-snippet clustering system to produce a personalized browsing hierarchy.

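Table of contents:
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives and Scope
1.3 Research Limitations
1.4 Research Process
1.5 Thesis Organization
Chapter 2 Literature Review
2.1 Search Engines and Web-Snippet Clustering Systems
2.2 Web-Snippet Clustering Systems
2.2.1 Word-Based Flat Clustering
2.2.2 Sentence-Based Flat Clustering
2.2.3 Word-Based Hierarchical Clustering
2.2.4 Sentence-Based Hierarchical Clustering
2.2.5 Overview of Web-Snippet Clustering System Designs
2.3 Collocations
2.4 Personalization Systems Based on User Profiles
Chapter 3 System Design
3.1 System Concept
3.2 System Architecture
3.2.1 Document Preprocessing Phase
3.2.2 Browsing Hierarchy Construction Phase
3.2.3 Personalized Browsing Hierarchy Construction Phase
Chapter 4 Experimental Results
4.1 Experimental Design
4.2 Experimental Results
Chapter 5 Conclusion
5.1 Conclusions and Contributions
5.2 Future Research Directions
References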

