National Digital Library of Theses and Dissertations in Taiwan

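Author: 楊智捷 (Chih-Jie Yang)
Title: A Semantically Guided Dynamic Clustering System
Original title: 一個以語意引導的動態分群系統
Advisor: 陸承志
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Information Management
Discipline: Computer Science
Subdiscipline: General Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97
Language: Chinese
Pages: 55
Keywords: Web Clustering; Semantic Web; Meta Search; Field Association Term; Association Rule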
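This study proposes a semantically guided dynamic clustering system. The system takes a meta search approach: it sends the user's query keywords to Google, Yahoo, and MSN, and then applies semantically guided clustering to the first three pages of results returned by the search engines.
The workflow has two stages: batch training and semantically guided clustering. Batch training comprises three steps: building the semantic RDF, building field association terms, and generating association classification rules. Semantically guided clustering comprises four steps: generating the best cluster, the predefined clusters, the pending clusters, and the dynamic clusters. The best cluster contains one or more web pages most relevant to the query keywords, selected from the top-ranked search results. Predefined and pending clusters are both clusters semantically related to the query keywords; we use the results returned by Google Directory, with manual fine-tuning, to find topics related to the query keywords and use them as the predefined and pending clusters. We then use the feature words in each page's content to match and trigger the most suitable association classification rule, which assigns the page to the predefined cluster that the rule specifies; if a cluster's title is not semantically related to the query keywords, that cluster is a pending cluster.
Finally, this study evaluates the clustering performance of the semantically guided clustering system and surveys user satisfaction. The experimental results show that precision, recall, and F1-measure for first-layer semantically guided clustering are 0.96, 0.90, and 0.92, respectively; for the second-layer 3C categories, the average precision, recall, and F1-measure are 0.99, 0.98, and 0.99, respectively, which indicates that this method assigns web pages to predefined clusters quite effectively. The survey results show that user satisfaction with this system is clearly higher than with the commercial clustering system Clusty and with a pure K-Means system.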
This study proposes a semantically guided dynamic clustering system based on a meta search mechanism. The proposed system sends the user’s queries to Google, Yahoo, and MSN simultaneously and then analyzes the retrieved results through semantically guided clustering.
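The merging step of a meta search can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the function name `merge_results`, the input shape (rank-ordered `(url, title)` pairs per engine), and the crude URL normalization are all assumptions, and no real search engine API is called.

```python
def merge_results(results_by_engine):
    """Merge per-engine result lists, de-duplicating by URL.

    results_by_engine: dict mapping an engine name to a list of
    (url, title) pairs in rank order (hypothetical input shape).
    Returns a list of entries in first-seen order; each entry records
    the rank the URL received from every engine that returned it.
    """
    merged = {}
    for engine, results in results_by_engine.items():
        for rank, (url, title) in enumerate(results, start=1):
            # Crude normalization (an assumption): ignore case and trailing slash.
            key = url.rstrip("/").lower()
            entry = merged.setdefault(
                key, {"url": url, "title": title, "engines": {}}
            )
            entry["engines"][engine] = rank
    return list(merged.values())


# Made-up sample data standing in for results from two engines.
sample = {
    "Google": [("http://a.com/", "A"), ("http://b.com", "B")],
    "Yahoo": [("http://A.com", "A again"), ("http://c.com", "C")],
}
merged = merge_results(sample)
```

Recording the per-engine rank keeps enough information for a later step to pick the top-ranked pages, however that selection is actually defined.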
The system flow is divided into two stages: batch training and online semantically guided clustering. Batch training includes processes for creating the semantic RDF, creating field association terms, and generating association rules. Online semantically guided clustering includes processes for generating the best and predefined clusters and for conducting K-Means clustering. The best cluster contains the websites most relevant to the query term and is placed at the top of the clustering result pages. The predefined clusters are those semantically relevant to the query, as suggested by Google Directory with minor human adjustment. Each predefined cluster holds the web pages allocated to it by the association rules that the feature words in those pages trigger. Predefined cluster titles that have no web pages allocated to them are called yet-to-be-retrieved clusters.
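The allocation of pages to predefined clusters can be sketched as below. This is a hedged illustration under stated assumptions: the rule format (`if`, `then`, `confidence`), the function name `assign_pages`, and the choice of the highest-confidence matching rule as the one that fires are hypothetical, since the abstract does not specify them. Pages that trigger no rule are returned separately, as candidates for K-Means clustering.

```python
def assign_pages(pages, rules, predefined_titles):
    """Allocate pages to predefined clusters via association rules.

    pages: dict mapping a page id to the set of feature words in the page.
    rules: list of dicts {"if": set of words, "then": cluster title,
           "confidence": float} (hypothetical rule format).
    predefined_titles: list of predefined cluster titles.
    Returns (filled, pending, leftover):
      filled   - clusters that received at least one page
      pending  - predefined titles with no pages allocated
      leftover - pages that triggered no rule
    """
    assigned = {title: [] for title in predefined_titles}
    leftover = []
    for page_id, words in pages.items():
        # A rule is triggered when all of its antecedent words occur in the page.
        triggered = [
            r for r in rules if r["if"] <= words and r["then"] in assigned
        ]
        if triggered:
            # Assumption: the highest-confidence triggered rule wins.
            best = max(triggered, key=lambda r: r["confidence"])
            assigned[best["then"]].append(page_id)
        else:
            leftover.append(page_id)
    filled = {t: ids for t, ids in assigned.items() if ids}
    pending = [t for t, ids in assigned.items() if not ids]
    return filled, pending, leftover


# Made-up sample data.
pages = {"p1": {"camera", "lens"}, "p2": {"laptop"}, "p3": {"recipe"}}
rules = [
    {"if": {"camera"}, "then": "Digital Cameras", "confidence": 0.9},
    {"if": {"laptop"}, "then": "Notebooks", "confidence": 0.8},
]
titles = ["Digital Cameras", "Notebooks", "Mobile Phones"]
filled, pending, leftover = assign_pages(pages, rules, titles)
```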
Finally, this study evaluates the effectiveness of the semantically guided clustering system. The experimental results indicate that the average precision, recall, and F1-measure are 0.96, 0.90, and 0.92, respectively, in the first layer, and 0.99, 0.98, and 0.99, respectively, in the second layer of 3C products. In addition, the results of a user study indicate that our subjects are more satisfied with our system than with a pure K-Means system and with Clusty, a commercial system.
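For reference, the three reported measures can be computed per cluster with their standard definitions, as in the sketch below. The function name and the label lists are illustrative assumptions, not data from the thesis.

```python
def precision_recall_f1(true_labels, predicted_labels, target):
    """Compute precision, recall, and F1 for one target cluster."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels for four pages.
true_labels = ["a", "a", "b", "b"]
predicted_labels = ["a", "b", "b", "b"]
precision, recall, f1 = precision_recall_f1(true_labels, predicted_labels, "a")
```

Averaging these per-cluster values over all clusters gives macro-averaged figures; an averaged F1 need not equal the F1 computed from the averaged precision and recall.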
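Title Page
Thesis Oral Defense Committee Approval
Authorization
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1. Research Motivation
1.2. Research Objectives
1.3. Thesis Organization
Chapter 2. Literature Review
2.1. Characteristics of Search Engine Data and User Behavior
2.2. Semantic Web
2.3. Resource Description Framework
2.4. Open Directory Project
2.5. Field Association Terms
2.6. Document Classification Using Association Rules
2.7. Information Gain
2.8. Web Page Clustering Methods
Chapter 3. Research Process
3.1. System Architecture
3.2. Batch Training
3.2.1. Building the Semantic RDF
3.2.2. Building RDF Field Association Terms
3.2.3. Building 3C Field Association Terms
3.2.4. Generating Association Classification Rules
3.3. Online Semantic Clustering
3.3.1. Parsing Web Pages
3.3.2. Generating the Best Cluster
3.3.3. Generating Predefined and Pending Clusters
3.3.4. K-Means Clustering
3.3.5. K-Means Cluster Title Selection
Chapter 4. System Evaluation
4.1. Evaluation of RDF Association Rules
4.1.1. Experimental Dataset
4.1.2. Experimental Design
4.1.3. Classification Performance Metrics
4.1.4. Experimental Results
4.2. Evaluation of 3C Association Rules
4.2.1. Experimental Dataset
4.2.2. Experimental Design
4.2.3. Experimental Results
4.3. User Satisfaction Survey
Chapter 5. Conclusion
References
Appendix 1 (User Satisfaction Questionnaire)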
Full text: the electronic full text of this thesis is available only within the author's university system and IP range.