National Digital Library of Theses and Dissertations in Taiwan

Detailed record

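Author: 謝維哲
Author (English): Wei-Che Hsieh
Thesis title: 一個分類導向的新聞摘要模型
Thesis title (English): A Classification Oriented News Summary Model
Advisor: 洪智力
Advisor (English): Chih-Li Hung
Degree: Master's
Institution: Chung Yuan Christian University (中原大學)
Department: Graduate Institute of Information Management
Discipline: Computer science
Field: General computer science
Thesis type: Academic thesis
Year of publication: 2009
Academic year of graduation: 97 (ROC calendar, i.e. the 2008-2009 academic year)
Language: Chinese
Pages: 64
Keywords (translated from Chinese): information retrieval; automatic document classification; K-means clustering algorithm; document summarization techniques
Keywords (English): Document Classification; Information Retrieval; Document Summarization; K-means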
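Abstract (translated from the Chinese):
In recent years the Internet has been applied ever more widely, and much information has been digitized so that it can spread online. Digitization has in turn greatly increased the amount of information available, so obtaining information is no longer as difficult for users as it once was. Under these conditions, what matters is filtering out unneeded information so that users can find what they actually need.

Traditional news summarization mostly applies single-document summarization to present a summary of each individual news article. Automatic news classification has matured, and most news portals assign every article to a category, yet they do not summarize articles differently according to category. As a result, the summaries may keep readers from quickly finding the news they care about, or cause them to miss related news.

This study therefore departs from earlier work and proposes a news summarization method that incorporates a classification orientation. It builds summaries from information retrieval concepts, TF*IDF term weights, K-means clustering, and document summarization techniques, and then takes the terms whose weights rank in the top 10 percent within a news category as keywords and boosts their weights. In addition, because the words in a headline are often among the key points of a story, and the first sentence of the first paragraph and the last sentence of the last paragraph are often the topic sentence and the concluding sentence, the study also adjusts the weights of these parts. The aim is that, after reading a classification-oriented summary, readers can quickly grasp the main points of a news article and judge whether it is the information they need.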
Abstract (English):
The Internet has been applied more and more widely in recent years, and a great deal of information has been digitized so that it can spread online. Digitization has produced a huge amount of information, and acquiring information is no longer as difficult for users as it once was. It is therefore important to filter out unnecessary information so that users can find what they really need.

Traditionally, single-document summarization has been used to summarize an individual news article. Automatic news classification technology has matured, and most news portals classify their articles, but they do not summarize articles differently according to category. As a result, readers who rely on such summaries may be unable to find the news they care about quickly, or may miss related news.

This research proposes a news summarization method that incorporates a classification orientation. The method creates news summaries by combining information retrieval concepts, TF*IDF term weights, K-means clustering, and document summarization. The words whose weights rank in the top 10% within a news category are then defined as relevant words and given additional weight. In addition, since headline words are usually among the key points of a story, the first sentence of the first paragraph usually states the main point, and the last sentence of the last paragraph usually states the conclusion, the research also adjusts the weights of these parts. The purpose is to provide classification-oriented news summaries so that readers can grasp the main points of a news article in a short time and determine whether it is what they want.
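
The weighting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the boost factors (1.5 and 1.2), and the simple `tf * log(N / df)` form of TF*IDF are all assumptions chosen for illustration, and stemming, stop word removal, and the K-means sentence clustering step are omitted. The sketch scores sentences by summed term weights, boosts category keywords (top 10% by weight) and headline terms, and boosts the first and last sentences.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic tokens (no stemming or stop words)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: tf * log(N / df)} dict per tokenized document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def top_fraction(weights, fraction=0.10):
    """Terms whose weight ranks in the top `fraction` (at least one term)."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return set(ranked[:max(1, int(len(ranked) * fraction))])


def category_keywords(article_weights, fraction=0.10):
    """Sum term weights over all articles of one category, keep the top share."""
    totals = Counter()
    for weights in article_weights:
        totals.update(weights)  # Counter.update adds values per key
    return top_fraction(totals, fraction)


def score_sentences(sentences, weights, keywords, title_terms,
                    keyword_boost=1.5, title_boost=1.5, position_boost=1.2):
    """Score each sentence; all boost factors are hypothetical values."""
    last = len(sentences) - 1
    scores = []
    for i, sentence in enumerate(sentences):
        score = 0.0
        for term in tokenize(sentence):
            w = weights.get(term, 0.0)
            if term in keywords:
                w *= keyword_boost      # category keyword boost
            if term in title_terms:
                w *= title_boost        # headline term boost
            score += w
        if i in (0, last):
            score *= position_boost     # first and last sentence boost
        scores.append(score)
    return scores


def summarize(title, sentences, weights, keywords, k=2):
    """Return the k best-scoring sentences in their original order."""
    scores = score_sentences(sentences, weights, keywords, set(tokenize(title)))
    best = sorted(range(len(sentences)), key=scores.__getitem__, reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]
```

In this sketch a caller would tokenize every article in a category, pass the token lists to `tf_idf`, derive the category keywords with `category_keywords`, and then call `summarize` on each article with its own weight dictionary. Treating the article as a flat list of sentences means the first and last entries stand in for the first sentence of the first paragraph and the last sentence of the last paragraph.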
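Table of Contents
Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
1.1 Research Background and Motivation
1.2 Research Questions
1.3 Research Objectives
1.4 Thesis Organization
Chapter 2. Literature Review
2.1 Information Retrieval
2.1.1 Introduction to Information Retrieval
2.1.2 Information Extraction
2.1.3 Information Filtering
2.2 Term Processing Techniques
2.2.1 Term Weight Calculation
2.2.2 Vector Space Model
2.3 K-means Clustering
2.4 Document Summarization Techniques
2.4.1 Definition of Document Summarization
2.4.2 Types of Document Summarization
2.4.3 Related Work on Document Summarization
2.5 Summary
Chapter 3. Research Method
3.1 Model Architecture
3.2 Preprocessing of the Experimental Documents
3.2.1 Collecting English News Documents
3.2.2 Tokenization and Stemming
3.2.3 Term Weight Calculation
3.2.4 Stop Word Removal
3.2.5 Computing the Weight of Each Sentence
3.2.6 Clustering Sentences with the K-means Algorithm
3.3 Composing the News Summary from Candidate Sentences
3.4 Classification-Oriented News Summarization
3.4.1 Weighting Category Keywords
3.4.2 Weighting Headline Terms
3.4.3 Weighting the First Sentence of the First Paragraph and the Last Sentence of the Last Paragraph
3.4.4 Composing the Classification-Oriented News Summary
Chapter 4. Analysis and Evaluation of Experimental Results
4.1 Presentation of Automatic News Summarization Results
4.2 Model Evaluation
4.2.1 Evaluation Items
4.2.2 Analysis of Data on the Effectiveness of Summaries for Judging News
4.2.3 Reliability Test of the Questionnaire Scale
4.2.4 Validity Test of the Questionnaire Scale
4.2.5 Independent-Samples t-Test
4.3 Experimental Analysis
Chapter 5. Conclusions and Future Work
5.1 Research Results
5.2 Discussion
5.3 Future Work
References
Appendix A: Stop Word List
Appendix B: Questionnaire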


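List of Figures

Figure 2-1. Vector space model (Salton & McGill, 1983)
Figure 3-1. Architecture of the proposed model
Figure 4-1. Proportion of respondents selecting each news summary as the most helpful for judging news content

List of Tables

Table 2-1. Local Term Weighting Formulas
Table 2-2. Global Term Weighting Formulas
Table 3-1. Changes in words after stemming
Table 3-2. Partial list of stop words
Table 4-1. Automatic news summarization results and Google search engine snippets
Table 4-2. Questionnaire analysis of respondents who chose the "Google snippet"
Table 4-3. Questionnaire analysis of respondents who chose "Model 1"
Table 4-4. Questionnaire analysis of respondents who chose "Model 2"
Table 4-5. Comparison of questionnaire evaluation results
Table 4-6. Reliability analysis
Table 4-7. Validity analysis
Table 4-8. t-test comparing Google snippets and Model 1 summaries on understanding of the news topic
Table 4-9. t-test comparing Google snippets and Model 1 summaries on summary length
Table 4-10. t-test comparing Google snippets and Model 1 summaries on sentence coherence
Table 4-11. t-test comparing Google snippets and Model 2 summaries on understanding of the news topic
Table 4-12. t-test comparing Google snippets and Model 2 summaries on summary length
Table 4-13. t-test comparing Google snippets and Model 2 summaries on sentence coherence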
References in English:
1.Allen, C., Kania D., and Yaeckel, B. (1998). Internet World Guide to One-To-One Web Marketing, John Wiley & Sons.
2.Anderson, J. C., and Gerbing, D. W. (1988). Structural Equation Modeling in Practice: A Review and Recommended Two-step Approach. Psychological Bulletin, 103(3), 411-423.
3.Belkin N.-J. and Croft W.-B. (1992). Information Filtering and Information Retrieval: Two sides of the Same Coin?, Communications of the ACM, pp. 29-38.
4.Bollacker, K., Lawrence S., and Giles, C.-L. (1999). A system for automatic personalized tracking of scientific literature on the web, Digital Libraries The Fourth ACM Conference on Digital Libraries, ACM Press, New York, 105–113.
5.Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical Analysis of Predictive Algorithms for Collaborative Filtering. Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, 43-52.
6.Chen, H., Chung, Y.-M., Ramsey, M., and Yang, C.-C. (1998). An Intelligent personal spider (agent) for dynamic internet/intranet searching, Decision Support Systems, 23(1), 41-58.
7.Chen, H. H. and Huang, S. J. (1999). A summarization system for Chinese News from multiple sources. Proceedings of the 4th International Workshop on Information Retrieval with Asian Languages, pp. 1-7.
8.Chen, F., Han, K., and Chen, G. (2002). An Approach to Sentence-Selection-Based Text Summarization, IEEE Region 10 Conference on Computers, Communications, Control and Power Engineering (TENCON '02), Vol. 1, Oct. 2002, pp. 489-493.
9.Dean, R. (1998). Personalizing your web site. Available at http://builder.cnet.com/webbuilding/pages/Business/Personal/.
10.Edmunds, A., & Morris, A. (2000). The problem of information overload in business organizations: a review of the literature, International Journal of Information Management, 20, 17-28.
11.Hartigan, J.-A., and Wong, M.-A., (1979), A k-means clustering algorithm, Applied Statistics, No.28, pp.100-108.
12.Han, J., and Kamber, M., (2000). Data Mining Concepts and Techniques, Morgan Kaufmann.
13.Jain, A.-K., and Dubes R.-C., (1988), Algorithms for clustering data, Prentice Hall.
14.Kanungo, T., Mount, D.-M., Netanyahu, N.-S., Piatko, C.-D., Silverman, R., Wu, A.-Y. (2002). An Efficient k-Means Clustering Algorithm: Analysis and Implementation, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7.
15.Kolda, T.-G. and O'Leary, D.-P. (1998). A Semidiscrete Matrix Decomposition for Latent Semantic Indexing in Information Retrieval, ACM Transactions on Information Systems, 16(4),322-346.
16.Kupiec, J., Pedersen, J., and Chen, F. (1995). A trainable document summarizer, Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 68-73.
17.Lancaster, F.-W. (1991). Indexing and Abstracting In Theory and Practice, Ann Arbor, Gushing-Malloy Inc.
18.Lawrence, R. D., Almasi, G. S., Kotlyar, V., Viveros, M. S., & Duri, S. S. (2001). Personalization of supermarket product recommendations. Data Mining and Knowledge Discovery, 5(1–2), 11–32.
19.Loeb S. and Terry D. (1992). Information Filtering. Communications of the ACM, pp.26-28.
20.Mani, I. and Maybury, M. (1999). Introduction, In I. Mani and M. Maybury (eds), Advances in Automated Text Summarization, MIT Press, pp. x-xv.
21.McDonald, D. and Chen, H.-C. (2002). Using sentence-selection heuristics to rank text segment in TXTRACTOR, Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries, Portland, Oregon, USA, 2002 Page(s): 28 – 35.
22.Mittal, B. and Lassar, W. (1996). The role of personalization in service encounters, Journal of Retailing, 14(1), 95-109.
23.Mobasher, B., Cooley, R., & Srivastava, J. (2000). Automatic personalization based on web usage mining. Communications of the ACM, 43(8), 142–151.
24.Nicola, P. (2004). Web Information Retrieval Phd Course - The term weighting problem. Available at http://sra.itc.it/people/polettini/talks.html.
25.Nunnally, J. C. (1978) Psychometric Theory, 2nd ed., New York: McGraw-Hill Inc.
26.Pennock, D.-M., Horvitz, E., Lawrence, S., and Giles, C.-L. (2000). Collaborative filtering by personality diagnosis: A hybrid memory- and model-based approach, Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, 473-480.
27.Resnick, P., Iacovou, N., Suchak, M., Bergstrom, P., & Riedl, J. (1994). GroupLens: An Open Architecture for Collaborative Filtering of Netnews. Proceedings of the ACM Conference on Computer Supported Cooperative Work, 175-186.
28.Rijsbergen, C.-J. van, Robertson, S.-E. and Porter, M.-F. (1980). New models in probabilistic information retrieval. London: British Library. (British Library Research and Development Report, no. 5587).
29.Rijsbergen, C.-J. (1975). Information Retrieval, Butterworths, Boston.
30.Rush, J.-E., Salvador, R., and Zamora, A. (1971). Automatic abstracting and indexing. II. Production of indicative abstracts by application of contextual inference and syntactic coherence criteria, Journal of the American Society for Information Science, Vol. 22, No. 3, pp. 260-274.
31.Salton, G. (1961). Studies of the bacterial cell wall. VIII. Reaction of walls with hydrazine and with fluorodinitrobenzene. Biochim. Biophys. Acta, 52, 329.
32.Salton, G. and McGill, M. J. (1983). Introduction to Modern Information Retrieval, McGraw-Hill.
33.Salton, G. (1988). Automatic Text Processing, Addison-Wesley Publishing Company.
34.Salton, G., Singhal, A., Mitra, M., and Buckley, C. (1997). Automatic Text Structuring and Summarization, In Information Processing & Management, Elsevier, Vol. 33, No. 2, pp. 193-207.
35.Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th International World Wide Web Conference, 285-295.
36.SEO Content Solutions. (2008). SEO Copywriting Firm Helps Companies Overcome Challenges of Google Search Wiki. Available at http://www.prlog.org/10146004-seo-copywriting-firm-helps-companies-overcome-challenges-of-google-search-wiki.html.
37.Shardanand, U., & Maes, P. (1995). Social information filtering: algorithms for automating "word of mouth". Proceedings of the Conference on Human Factors in Computing Systems, Denver, CO, ACM, 210-217.
38.Treese, G.-W. and Stewart, L.-C. (1998). Designing Systems for Internet Commerce, Addison Wesley Longman.
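References in Chinese (titles translated):
1.張瀚仁 (1990). The influence of personalization technology on the development of virtual communities. Master's thesis, Graduate Institute of Information Management, National Chengchi University.
2.蘇諼 (1996). Automatic abstracting methods. Bulletin of the Library Association of China, 56, 41-47.
3.黃純敏 & 吳郁瑩 (1999). Automatic summarization of web documents. Taiwan Area Network Conference (TANET99), hosted by National Sun Yat-sen University.
4.葉鎮源 (2002). A study of automatic document summarization methods and their application to Chinese documents. Master's thesis, Graduate Institute of Information Science, National Chiao Tung University.
5.張毓倫 (2003). A study of personalized explicit and tacit knowledge recommendation methods. Master's thesis, Graduate Institute of Information Management, National Cheng Kung University.
6.曾元顯 (2004). Chinese news short messages for mobile phones. The 16th Conference on Natural Language and Speech Processing, Taipei, September 2-3, 2004, pp. 177-189.
7.吳志宏 (2004). An automatic recommendation mechanism based on implicit feedback. Master's thesis, Graduate Institute of Information Management, Chaoyang University of Technology.
8.劉政璋 (2005). An automatic news document summarization system based on concept clustering. Master's thesis, Graduate Institute of Information Science, National Chiao Tung University.
9.簡志偉 (2005). A study of ontology-based document processing. Master's thesis, Graduate Institute of Information Engineering, National Cheng Kung University.
10.蔡旺典 (2007). Building a personalized ontology to assist web behavior mining: the case of personalized ranking. Master's thesis, Graduate Institute of Information Management, Chaoyang University of Technology.
11.趙金宏 (2007). Personalized query refinement and result reorganization based on user activity modeling and interest learning. Master's thesis, Department of Information Engineering, National Dong Hwa University.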