National Digital Library of Theses and Dissertations in Taiwan

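Graduate student: 李敬江
Graduate student (romanized): Jing-Jiang Li
Thesis title: 以查詢改寫提供衍生詞彙之機制
Thesis title (English): Using Rewriting Query to Find Extended Keywords
Advisor: 林志麟
Degree: Master's
Institution: Yuan Ze University (元智大學)
Department: Department of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2006
Graduation academic year: 94
Language: Chinese
Number of pages: 44
Keywords (Chinese): 查詢改寫; 查詢擴展; 衍生關鍵字; 相關詞彙; Pseudo Relevance Feedback
Keywords (English): Query Rewrite; Query Expansion; Derived Keyword; Relevant Terms; Pseudo Relevance Feedback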
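With search engines in widespread use and the Internet growing rapidly, website operators apply various techniques to raise their pages' rankings in search engines; keyword techniques are among the most commonly adopted because they are easy to implement. Many software tools and websites now suggest keywords, but they can only suggest derived terms that contain the query keyword, and some are limited by a fixed lexicon and therefore cannot suggest newer terms. To address these problems, this study designs a system whose front end offers two input modes: a URL or keywords. The back-end mechanism for deriving terms draws on the vast amount of data on the Internet. It is based on Pseudo Relevance Feedback, uses the Google API to return the top N relevant web pages, and computes and analyzes term weights with a modified Entropy Weighting formula until the requested number of derived terms has been returned. Finally, experiments are used to obtain the parameter values, to analyze accuracy for different query categories, to compare the overlap among different software tools, and to propose a relevant-page-count evaluation method for assessing the relevance of terms.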
As more people use search engines to find what they need on the Internet, businesses must keep their search engine rankings high to remain competitive. One commonly used and easily deployed way to keep a webpage's ranking high is to put the right keywords in the page. Determining the right set of keywords, however, is not a trivial problem. Many software products and websites can currently derive an expanded keyword set from an initial keyword set, but the variety of the expanded set is quite limited: an expanded keyword must always contain some initial keyword, and keywords are retrieved only from a fixed database of terms and phrases. To increase the variety of expanded keywords, this research proposes a method that expands keywords without these limitations. The initial set of keywords can be specified explicitly or derived from a given webpage. An expanded set of keywords is then built by first querying Google to retrieve the top n relevant pages, and then using Pseudo Relevance Feedback and a modified Entropy Weighting formula to weight phrases and terms. We test this method on several categories of initial keywords to fine-tune the threshold values. Finally, we study the overlap among the expanded keywords generated by various software products, and propose a method for evaluating the relevance of keywords based on the number of web pages returned by Google.
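The expansion loop described in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the record does not give the modified Entropy Weighting formula, so the score below is a stand-in (the normalized entropy of a term's distribution across the feedback pages, multiplied by the log of its total frequency), and the call to Google is replaced by a hard-coded list of page texts. The names `PAGES`, `STOP_WORDS`, `candidate_terms`, `entropy_scores`, and `expand` are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the text of the top-N pages a search API would return.
PAGES = [
    "java virtual machine and java compiler",
    "the java compiler targets the virtual machine",
    "download the java sdk with compiler",
]

# Tiny illustrative stop-word list (assumption; the thesis cites external lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "is", "in", "to", "with"}


def candidate_terms(text, n):
    """Return the n-grams of a page, skipping any n-gram that contains a stop word."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    grams = zip(*(words[i:] for i in range(n)))
    return [" ".join(g) for g in grams if not any(w in STOP_WORDS for w in g)]


def entropy_scores(pages, n=1):
    """Score terms by how evenly they are spread across the feedback pages.

    Stand-in formula (assumption): log(1 + total frequency) times the
    entropy of the term's per-page distribution, normalized by log(#pages).
    """
    per_page = [Counter(candidate_terms(p, n)) for p in pages]
    total = Counter()
    for counts in per_page:
        total.update(counts)
    scores = {}
    for term, gf in total.items():
        probs = [c[term] / gf for c in per_page if c[term] > 0]
        entropy = -sum(p * math.log(p) for p in probs)
        norm = entropy / math.log(len(pages)) if len(pages) > 1 else 0.0
        scores[term] = math.log(1 + gf) * norm
    return scores


def expand(query, pages, k=5, n=1):
    """Return up to k positively scored terms that are not just the query's own words."""
    query_words = set(query.lower().split())
    ranked = sorted(entropy_scores(pages, n).items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, s in ranked if s > 0 and not set(t.split()) <= query_words][:k]


if __name__ == "__main__":
    print(expand("java", PAGES, k=3))        # 1-gram candidates
    print(expand("java", PAGES, k=5, n=2))   # 2-gram candidates
```

Terms that occur in only one page receive a score of zero under this stand-in and are dropped, which loosely mirrors the idea of keeping only terms that the feedback pages agree on; the actual thresholds and formula would have to come from the thesis itself.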
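Table of contents

Title page
Chinese abstract
English abstract
Acknowledgments
Table of contents
List of tables
List of figures
Chapter 1. Introduction
1.1 Research background
1.2 Research motivation
1.3 Research objectives
1.4 Thesis organization
Chapter 2. Related work
2.1 Relevance feedback for related terms
2.2 Keyword extraction
2.3 Term weight calculation
2.4 Applications that provide keywords
Chapter 3. System architecture
3.1 Front-end user input
3.2 Back-end PRF mechanism for derived terms
3.3 System limitations
Chapter 4. Experiments
4.1 Experiment design 1
4.2 Experiment design 2
4.3 Experiment design 3
4.4 Experiment design 4
4.5 Experiment design 5
4.6 Experiment design 6
Chapter 5. Conclusions and future research
References
Appendix 1
A. Queries tested in the experiments
B. System interface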

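List of tables
Table 2-1 Comparison of this system with other software and websites
Table 4-1 Experimental platform
Table 4-2 Experiment design 1: analysis of 1-gram terms
Table 4-3 Experiment design 2: analysis of 2-gram terms
Table 4-4 Experiment design 5
Table 4-5 Overlap analysis of derived terms from different software and websites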

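List of figures
Figure 2-1 Trellian Keyword Analyzer software
Figure 2-2 Trellian SearchTerm Advisor software
Figure 2-3 Google AdWords & Keywords software
Figure 2-4 AdWords Google website
Figure 3-1 System architecture
Figure 3-2 Splitting into candidate phrases
Figure 3-3 Step 3: phrase extraction
Figure 3-4 Step 5: expansion term selection
Figure 4-1 Results of fine-tuning parameter values for the four modes (keyword: Java)
Figure 4-2 Results of fine-tuning parameter values for the four modes (keyword: Oracle)
Figure 4-3 Results of fine-tuning parameter values for the four modes (keyword: iPod)
Figure 4-4 Results for different extraction modes (1-gram terms)
Figure 4-5 Results for different thresholds on returned terms (1-gram terms)
Figure 4-6 Results of fine-tuning parameter values for the four modes (keyword: data mining)
Figure 4-7 Results of fine-tuning parameter values for the four modes (keyword: digital camera)
Figure 4-8 Results of fine-tuning parameter values for the four modes (keyword: MP3 player)
Figure 4-9 Results for different extraction modes (2-gram terms)
Figure 4-10 Results for different thresholds on returned terms (2-gram terms)
Figure 4-11 Website results for the three combination modes (http://java.sun.com)
Figure 4-12 Website results for the three combination modes (http://otn.oracle.com)
Figure 4-13 Website results for the three combination modes (http://www.macworld.com)
Figure 4-14 Accuracy of different types of derived terms
Figure 4-15 Analysis of 33 terms using the relevant-page-count evaluation method (keyword: iPod)
Figure 4-16 Relevant-page-count evaluation of the Trellian SearchTerm software
Figure 4-17 Relevant-page-count evaluation of this system (overlapping terms excluded)
Figure 4-18 System execution time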