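Graduate student: 陳世穎 (Shih-Ying Chen)
Thesis title: Generating Snippets as Query Overview with Category Information
Thesis title (Chinese): 使用類別資訊產生作為查詢詞概觀的網站片段敘述
Advisor: 鄭卜壬 (Pu-Jen Cheng)
Oral defense committee: 陳信希 (Hsin-Hsi Chen), 盧文祥 (Wen-Hsiang Lu), 張嘉惠 (Chia-Hui Chang)
Oral defense date: 2013-07-08
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2013
Graduation academic year: 101 (ROC calendar)
Language: English
Number of pages: 35
Keywords: Search-Result Summarization (搜尋結果概要)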
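In this thesis, we study how to improve the snippets shown beneath
the results on a search-result page so that they serve as an overall
overview of the query, letting users gain a general understanding of
the query before clicking any result. For a query whose category is
known, we use the semantics covered by its category, together with the
attributes closely related to that category, to organize such snippets.
We extract the attributes closely related to a category from the
questions on a community question-answering website, and we extract the
context information of each attribute from the answers to those
questions. Snippet generation depends mainly on three factors: the
amount of information about the query that is covered, the degree to
which the category semantics are covered, and the degree to which the
category attributes are covered. To optimize these three factors
simultaneously, we formulate the problem as an integer linear program.
Experimental results show that the generated snippets convey an overview
of the query considerably better than a conventional search-result page
and several basic summarization algorithms.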

Previous work on snippet generation focuses mainly on
how to produce one snippet for an individual search result.
This paper aims to generate a comprehensive overview for
an entity query in the search-result page. We assume each
entity has its own category, whose attributes are regarded as
the unique characteristics that users might be interested
in when searching for the entity. Given an entity as a query
(e.g., enterogastritis) and its category (e.g., disease), we want
to organize the snippets that contain its attributes (e.g., symptoms
and diagnoses) so that users can learn about the useful
information with respect to the given query directly from the
generated snippets without downloading documents. First, we
extract the attributes of a category from a community-based
question-answering (CQA) website. Next, the snippets are
generated according to several factors, including how central
a sentence is to the meaning of the query, its category,
and the corresponding attributes, and how well the snippets
diversify the attributes. Finally, an Integer Linear Programming
(ILP) model is adopted to find an optimal sentence set as the
snippet. The experiments are conducted on 100 common disease
queries. Experimental results demonstrate the effectiveness
and efficiency of the proposed approach, compared to an
existing search engine and several summarization baselines.
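
The selection step described in the abstract can be illustrated with a
small sketch. This is not the thesis's actual model: the linear
combination of three scores (weights alpha, beta, gamma), the word
budget, and the cap on how often an attribute may appear are all
assumptions made for illustration, and every name and number below is
hypothetical. An exhaustive search over the 0/1 selection variables
stands in for a real ILP solver, which is only practical for a handful
of candidate sentences.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Sentence:
    text: str
    s_query: float         # hypothetical score: relevance to the query
    s_category: float      # hypothetical score: closeness to the category
    s_attr: float          # hypothetical score: closeness to the attributes
    attributes: frozenset  # category attributes the sentence mentions


def importance(s, alpha=0.4, beta=0.3, gamma=0.3):
    """Assumed merge of the three scores as a weighted sum."""
    return alpha * s.s_query + beta * s.s_category + gamma * s.s_attr


def select_snippet(sentences, max_words=15, max_per_attribute=1):
    """Pick the subset with the highest total importance.

    Each sentence is either in or out (a 0/1 variable). Subsets that
    exceed the word budget or repeat an attribute too often are
    rejected. Brute-force enumeration replaces an ILP solver here.
    """
    best_score, best_subset = 0.0, ()
    for k in range(1, len(sentences) + 1):
        for subset in combinations(sentences, k):
            if sum(len(s.text.split()) for s in subset) > max_words:
                continue  # length constraint violated
            counts = {}
            for s in subset:
                for a in s.attributes:
                    counts[a] = counts.get(a, 0) + 1
            if any(c > max_per_attribute for c in counts.values()):
                continue  # attribute-diversity constraint violated
            score = sum(importance(s) for s in subset)
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset)


# Made-up candidate sentences for the query "enterogastritis".
candidates = [
    Sentence("Enterogastritis causes nausea, vomiting and diarrhea.",
             0.9, 0.8, 0.9, frozenset({"symptoms"})),
    Sentence("Common symptoms include stomach cramps and fever.",
             0.6, 0.7, 0.8, frozenset({"symptoms"})),
    Sentence("Doctors diagnose it with a stool test.",
             0.7, 0.6, 0.8, frozenset({"diagnoses"})),
    Sentence("Our clinic opens at nine every day.",
             0.1, 0.1, 0.0, frozenset()),
]

snippet = select_snippet(candidates)
for s in snippet:
    print(s.text)
```

With these made-up scores, the two symptom sentences cannot both be
chosen, so the selection pairs the stronger symptom sentence with the
diagnosis sentence. That is the attribute-diversification effect the
abstract describes.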

Oral Defense Committee Certification
Acknowledgements
Abstract (Chinese)
Abstract
1 Introduction
2 Related Works
2.1 Document Summarization
2.2 Facet Generation and Summarization
3 Problem Formulation
3.1 Problem Specification
4 System Overview
5 Off-line Information Extraction Stage
5.1 Attribute Extraction
5.2 Building Category Vector
6 On-line Query Processing Stage
6.1 Scoring Function S_important(s)
6.1.1 Scoring function S_query(s)
6.1.2 Scoring function S_category(s)
6.1.3 Scoring function S_attr(s)
6.2 Merging Scoring Functions
6.3 Integer Linear Programming Model
6.4 Sentence Compression
7 Experiments
7.1 Data Set
7.2 Baselines
7.3 Evaluation
7.4 Performance Comparison
7.4.1 ROUGE Performance
7.4.2 Time Performance
7.5 Parameter Settings
7.5.1 The impact of α, β, γ in S_important
7.5.2 Number of used attributes
7.5.3 Limit on number of attribute presence
8 Conclusions and Future Work
Bibliography
