National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

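Author: 陳哲真
Author (English): Zhe-Zhen Chen
Title: 使用問句隱含語意特徵改善網站中的網頁搜尋
Title (English): Finding Hidden Semantic Features in Questions to Improve Page Searching in Websites
Advisor: 盧文祥
Advisor (English): Wen-Hsiang Lu
Degree: Master's
Institution: National Cheng Kung University (國立成功大學)
Department: Department of Computer Science and Information Engineering, Master's and Doctoral Program (資訊工程學系碩博士班)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC academic year)
Language: Chinese
Number of pages: 67
Keywords (Chinese): 答案萃取, 問題分析, 自然語言處理, 問答系統
Keywords (English): Question Analysis, Natural Language Processing, Question Answering System, Answer Extraction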
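Abstract (translated from the Chinese): Pages within a single website are connected by links and hierarchical relationships, which search engines that index individual pages cannot exploit. Traditional search engines also cannot respond effectively to natural language questions. From recent popular question answering research, we observe that many questions may contain an implicit semantic structure of entity, feature, and sub-feature, where some sub-features have no clear meaning or the question has no sub-feature at all. To address these problems, this thesis first indexes a single website by all of its nodes and its link structure. To process a natural language question, we first identify the entity, feature, and sub-feature in the user's question. When the sub-feature has no clear meaning or is absent, we use the question part and the answer part of a Q&A archive to train word-to-word translation probabilities, and use them to find hidden semantic features that correspond to the sub-feature and represent the likely hidden intent of the question. Using the entity, feature, sub-feature, and hidden semantic features, we retrieve relevant pages. These pages may form a structure similar to that of the question, so we then find the page link paths that connect them; a page link path that matches the structure of the user's question is likely to be the answer the user wants. Finally, this thesis proposes a Page-link-path Searching Model that incorporates the relationships among the entity, feature, and sub-feature in the question and computes, for each candidate page link path, the probability that it matches the user's question; the higher the probability, the better the path fits the desired search result. In our experiments, the proposed method finds search results that match natural language questions more accurately than a method that indexes individual pages.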
Abstract (English): Traditional search engines cannot exploit the order and hierarchical relationships between web pages in a single website. They treat the single page as the indexing unit and have no effective way to return correct results for natural language questions. From existing question answering systems, we observe that many questions may have a hidden semantic structure consisting of an entity, a feature, and a sub-feature, in which some sub-features have no clear meaning or the question has no sub-feature. To solve these problems, we index all pages and the link structure of a single website. To process a natural language question, we first identify the entity, feature, and sub-feature in the question, and then identify the type of the sub-feature. If the sub-feature has an unclear meaning, or the question has no sub-feature, we use the question part and the answer part of Q&A archives to train question-word-to-answer-word translation probabilities, and use these probabilities to extract the hidden semantic features that correspond to the sub-feature. We then find relevant web pages according to the entity, feature, sub-feature, and hidden semantic features of the question. Interestingly, the link structure between these relevant pages appears to be similar to the semantic structure of the original natural language question. We therefore use the link structure between web pages to construct page link paths (PLP) that map onto the semantic structure of the question. Finally, we propose a Page-link-path Searching Model that uses the entity, feature, and sub-feature in a question to calculate the probability that each candidate page link path corresponds to the question. A page link path with a higher probability can be regarded as the answer to the user's question.
According to our experimental results, the proposed method, which searches web pages using the multi-level link structure of a website, achieves higher accuracy than existing search engines that rely on single-page indexing. Furthermore, it significantly outperforms other methods that handle the implicit structure of natural language questions.
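The abstract says that word-to-word translation probabilities are trained from the question part and the answer part of a Q&A archive, but it does not name the estimator. The following is a minimal sketch that assumes an IBM Model 1 style EM procedure; the thesis may use a different one. The function names `train_translation_probs` and `hidden_semantic_features`, the parameter `k`, and the toy question-answer pairs are all hypothetical and serve only as illustration.

```python
from collections import defaultdict


def train_translation_probs(pairs, iterations=10):
    """Estimate t(answer_word | question_word) by EM (IBM Model 1 style, assumed).

    pairs: list of (question_tokens, answer_tokens) tuples.
    Returns a dict keyed by (answer_word, question_word).
    """
    answer_vocab = {a for _, a_tokens in pairs for a in a_tokens}
    uniform = 1.0 / len(answer_vocab)
    t = defaultdict(lambda: uniform)  # uniform start before the first E-step

    for _ in range(iterations):
        count = defaultdict(float)  # expected (answer, question) word counts
        total = defaultdict(float)  # expected counts per question word
        for q_tokens, a_tokens in pairs:
            for a in a_tokens:
                # E-step: share each answer word among the question words.
                z = sum(t[(a, q)] for q in q_tokens)
                for q in q_tokens:
                    frac = t[(a, q)] / z
                    count[(a, q)] += frac
                    total[q] += frac
        # M-step: renormalise so each question word's probabilities sum to 1.
        t = defaultdict(float)
        for (a, q), c in count.items():
            t[(a, q)] = c / total[q]
    return dict(t)


def hidden_semantic_features(t, question_word, k=3):
    """Return the k answer words with the highest t(answer | question_word)."""
    scored = [(a, p) for (a, q), p in t.items() if q == question_word]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Hypothetical toy Q&A archive: (question tokens, answer tokens).
toy_pairs = [
    (["camera", "battery", "how"], ["battery", "life", "hours"]),
    (["camera", "lens", "how"], ["lens", "zoom", "focal"]),
    (["phone", "battery", "how"], ["battery", "charge", "hours"]),
]
probs = train_translation_probs(toy_pairs)
print(hidden_semantic_features(probs, "battery", k=2))
```

In this sketch, the answer words ranked highest for a question word play the role of candidate hidden semantic features for a vague or missing sub-feature. How the thesis combines them with the entity and feature when ranking page link paths is not specified in the abstract and is not modelled here.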
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Motivation and Problems
1.2 Research Approach
1.3 Thesis Organization
Chapter 2 Literature Review and Related Work
2.1 Implicit Link
2.2 Natural Language Search Systems
2.3 Learning Website Structure to Improve Natural Language Search
2.4 Specificity
2.5 Related Words between Questions and Answers
Chapter 3 User Question Analysis and Page Link Path Searching
3.1 Research Problems
3.2 System Architecture
3.3 Question Analysis
3.3.1 Identifying the Entity in a Question
3.3.2 Identifying the Feature and Sub-feature in a Question
3.3.3 Identifying the Type of Sub-feature
3.4 Hidden Intent of User Questions
3.5 Page Link Path Searching
3.5.1 Finding Candidate Page Link Paths
3.5.2 Ranking Candidate Page Link Paths
3.5.3 Relevance between Page Nodes and the Semantic Structure of the Question
3.5.4 Relevant Page Matching Model
Chapter 4 Experiments and Evaluation
4.1 Experimental Data
4.2 Evaluation Criteria
4.3 Performance Evaluation of Question Analysis
4.3.1 Evaluation of Entity Identification
4.3.2 Evaluation of Feature and Sub-feature Identification
4.3.3 Results of Sub-feature Type Identification
4.4 Parameter Tuning of the Page-link-path Searching Model
4.5 Performance Evaluation of Page Link Path Searching
4.5.1 Experimental Method
4.5.2 Experimental Results and Error Analysis
Chapter 5 Conclusions and Future Research Directions
5.1 Conclusions
5.2 Future Work and Directions
References