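Graduate student: 李淳如
Graduate student (English name): Chun-Ju Li
Thesis title: 運用語意相關詞來推估Google搜尋引擎的排名
Thesis title (English): Approaching Google Ranking with Semantically Related Terms
Advisor: 陸承志
Advisor (English name): Cheng-Jye Luh
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2011
Graduation academic year: 99 (ROC calendar)
Language: Chinese
Number of pages: 47
Keywords (Chinese): 搜尋引擎優化; 搜尋引擎排名因素; 語意相關詞; 網頁搜尋; 潛在語意分析; 隱含狄氏配置
Keywords (English): Search Engine Optimization; Ranking Factors; Semantically Related Terms; Web Page Search; Latent Semantic Analysis; Latent Dirichlet Allocation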
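This study examines whether a linear combination built on terms semantically related to the query keywords can approximate Google's search ranking. It focuses on the latent semantics of web pages and on how keywords appear in the page title, snippet, and URL, rather than on all ranking factors. We extract the title, snippet, and URL of each page in Google's search results and tokenize them into n-grams. We then use two methods, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA), to find terms in the pages that are semantically related to the query keywords. Next, we compute keyword weights for the title, snippet, and URL of each result page and combine the three linearly into a single score for that page. Experiments cover eight parameter combinations built from the number of semantically related terms, the number of web documents, uni-gram versus n-gram related terms, and related terms drawn from one topic versus two topics. The results show that the ranking is best with 20 semantically related terms and 20 web documents. The best R-Precision across all parameter combinations reaches 0.8, indicating that the new ranking produced by this method is quite close to Google's original ranking.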
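The linear combination of title, snippet, and URL weights described above can be sketched as follows. This is only an illustration: the record does not give the thesis's weighting formulas, so the per-field score (the fraction of related terms found in the field), the coefficients `w_title`, `w_snippet`, and `w_url`, and the names `Result`, `field_score`, `document_score`, and `rerank` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """One search result, reduced to the three fields the abstract names."""
    title: str
    snippet: str
    url: str


def field_score(text, terms):
    """Fraction of the given terms that occur in the text (case-insensitive).

    Assumption: a simple stand-in for the thesis's per-field weights.
    """
    if not terms:
        return 0.0
    lowered = text.lower()
    return sum(1 for t in terms if t.lower() in lowered) / len(terms)


def document_score(result, terms, w_title=0.5, w_snippet=0.3, w_url=0.2):
    """Linear combination of the three field scores (weights are hypothetical)."""
    return (w_title * field_score(result.title, terms)
            + w_snippet * field_score(result.snippet, terms)
            + w_url * field_score(result.url, terms))


def rerank(results, terms):
    """Sort results by descending document score."""
    return sorted(results, key=lambda r: document_score(r, terms), reverse=True)


results = [
    Result("Cooking tips", "recipes", "http://example.com/food"),
    Result("SEO guide", "search engine ranking factors", "http://example.com/seo"),
]
related_terms = ["seo", "ranking"]  # in the thesis these would come from LSA or LDA
ordered = rerank(results, related_terms)
```

In this sketch the related terms are supplied by hand; in the study they are the terms that LSA or LDA identifies as semantically related to the query.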

This study aims to approximate Google's ranking results using terms semantically related to the query. First, we crawled Google search results and extracted the title, snippet, and URL of each web page. We then identified semantically related terms using two approaches, Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Second, we calculated scores for keywords in the title, snippet, and URL to obtain a document score. Experiments were conducted on different combinations of the number of semantically related terms, the number of documents, the tokenization method (uni-gram or n-gram), and the number of topics (one or two) from which the related terms were drawn. The experimental results show that the average R-Precision reaches 0.8, indicating that the ranking produced by the proposed method approximates Google's results.
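For readers unfamiliar with the evaluation measure, R-Precision can be sketched as below. The standard definition is used: given R relevant documents, it is the fraction of the top R ranked documents that are relevant. How the thesis defines the relevant set is not stated in this record; treating Google's top R results as relevant is an assumption, and the document IDs and the function name `r_precision` are illustrative only.

```python
def r_precision(ranked, relevant):
    """Fraction of the top-R ranked items that are relevant, where R = number of relevant items."""
    relevant_set = set(relevant)
    r = len(relevant_set)
    if r == 0:
        return 0.0
    top_r = ranked[:r]
    return sum(1 for doc in top_r if doc in relevant_set) / r


# Assumption: Google's top 5 results play the role of the relevant set.
google_top = ["a", "b", "c", "d", "e"]
# A hypothetical re-ranking produced by some scoring method.
reranked = ["b", "a", "f", "c", "e", "d"]

score = r_precision(reranked, google_top)  # 4 of the top 5 are in google_top
```

Note that R-Precision ignores order within the top R, so it measures overlap between the two top-R sets rather than exact positional agreement.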

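Title Page
Oral Defense Committee Approval
Authorization Form
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures

Chapter 1. Motivation and Objectives
1.1. Research Motivation
1.2. Research Objectives
1.3. Research Structure
Chapter 2. Literature Review
2.1. Search Engine Optimization (SEO)
2.2. Latent Semantic Analysis (LSA)
2.3. Latent Dirichlet Allocation (LDA)
2.4. Document Scoring
Chapter 3. Research Method
3.1. System Architecture
3.2. Data Preprocessing
3.2.1. Data Parsing
3.2.2. N-Gram Extraction
3.3. Extraction of Semantically Related Terms
3.3.1. Latent Semantic Analysis (LSA)
3.3.2. Latent Dirichlet Allocation (LDA)
3.4. Document Weight Calculation and Ranking
3.4.1. Weighting of Semantically Related Terms
3.4.2. Document Title Weight
3.4.3. Document URL Weight
3.4.4. Document Ranking
Chapter 4. Experimental Evaluation
4.1. Experimental Dataset
4.2. Experimental Setup
4.3. Effect of the Number of Semantically Related Terms
4.4. Effect of the Number of Documents
4.5. Combined Experiments
4.6. Ranking Differences Across Parameter Combinations
Chapter 5. Conclusions and Future Work
5.1. Conclusions
5.2. Future Work
References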




Electronic full text: access is restricted to the on-campus systems and IP range of the author's university.