National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

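Graduate student: 楊政遠
Graduate student (English): Jeng-Yuan Yang
Thesis title: 統計式中文新聞摘要
Thesis title (English): Statistical Chinese News Summarization
Advisor: 王正豪
Advisor (English): Jenq-Haur Wang
Committee members: 楊凱翔, 劉傳銘
Committee members (English): Kai-Hsiang Yang, C.-M. Li
Oral defense date: 2010-06-04
Degree: Master's
Institution: National Taipei University of Technology
Department: Department of Computer Science and Information Engineering (graduate program)
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2010
Graduation academic year: 98 (ROC calendar)
Language: Chinese
Pages: 68
Chinese keywords: 資訊檢索 (information retrieval), 自動摘要 (automatic summarization)
English keywords: IR, Automated summarization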
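The main purpose of news summarization is to shorten reading time. News summaries fall into two kinds: multi-document summaries and single-document summaries. A multi-document news summary resembles a "news of the week" digest that lists only a few major stories. A single-document news summary is closer to a reader's report that spares readers from reading the whole article, so it emphasizes keeping the essentials and deleting whatever can be deleted. This thesis concentrates on single-document news summarization.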
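In terms of architecture, the collected vocabulary first serves as the dictionary of the summarization system. Each news article is also preserved in full so that the system can consult it whenever it needs to look up vocabulary. The vocabulary is stored as bi-grams, and document frequency and term frequency are recorded. Technology news was chosen as the data because new terms are more likely to appear in it. The experimental results show that the approach does improve clustering performance. After summarization, the time needed to read an article falls accordingly, so the total time needed to read many news articles can be shortened substantially.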
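
The bi-gram dictionary with term frequency and document frequency described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes character bi-grams (the record does not say how the bi-grams are formed beyond calling them bi-grams), and the function names `bigrams` and `build_dictionary` and the sample articles are hypothetical.

```python
import re
from collections import Counter


def bigrams(text):
    """Return overlapping two-character units from each run of word characters.

    Assumption: character bi-grams; punctuation and whitespace end a run,
    so no bi-gram spans a punctuation mark.
    """
    grams = []
    for run in re.findall(r"\w+", text):
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def build_dictionary(articles):
    """Count term frequency (all occurrences) and document frequency
    (number of articles containing the bi-gram) over a collection."""
    tf = Counter()
    df = Counter()
    for article in articles:
        grams = bigrams(article)
        tf.update(grams)        # every occurrence counts toward TF
        df.update(set(grams))   # each article counts once toward DF
    return tf, df


# Hypothetical sample articles, kept in full alongside the dictionary.
articles = ["新手機上市。新手機熱賣。", "手機市場成長。"]
tf, df = build_dictionary(articles)
print(tf["手機"], df["手機"])  # 3 2
```

Keeping the two counters separate makes it possible to tell a bi-gram that is frequent inside one article from one that is spread across many articles.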

With the number of news articles published around the world growing every day, reducing the time needed to read them would help users. There are two general approaches to document summarization: multi-document and single-document. Multi-document news summarization resembles a ‘hot topics of the week’ digest that lists only the most important news reports, whereas single-document news summarization is closer to a short abstract that helps readers quickly grasp the overall idea of an article. Single-document news summarization aims to remove as many unimportant words as possible and preserve only the major keywords. This thesis focuses on single-document summarization of Chinese news articles using statistical methods.
The proposed architecture is as follows. First, auxiliary vocabulary is collected from news articles and serves as the system's dictionary. The original news articles are kept along with the vocabulary. The vocabulary is stored as word bi-grams, together with each entry's document frequency and term frequency. These statistics are then used to calculate the importance of each sentence and to select the most representative sentences as the summary. The experiments used only news articles in the ‘science and technology’ category, since new terms appear there more readily. The experimental results showed that the summaries generated by the system can be effectively clustered with the original news articles. The summaries also greatly reduced the time needed to read each news article, and therefore the total time needed to read all of them. This shows that the system achieves its major goal: reducing news reading time.
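
The step of scoring sentences from term frequency and document frequency and keeping the most representative ones can be illustrated with a short sketch. The record does not give the thesis's weighting formula, so the TF-IDF-style score below is an assumption, as are the use of character bi-grams, the sentence delimiters, the names `split_sentences`, `bigrams`, and `summarize`, and the sample corpus.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split on common Chinese and Western sentence-ending punctuation."""
    return [s for s in re.split(r"[。!?!?]", text) if s.strip()]


def bigrams(text):
    """Character bi-grams within each run of word characters (an assumption)."""
    grams = []
    for run in re.findall(r"\w+", text):
        grams.extend(run[i:i + 2] for i in range(len(run) - 1))
    return grams


def summarize(article, corpus, top_k=1):
    """Return the top_k highest-scoring sentences in their original order."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(bigrams(doc)))
    tf = Counter(bigrams(article))

    def score(sentence):
        # Assumed weighting: TF in the article times a smoothed inverse DF,
        # summed over the distinct bi-grams of the sentence.
        return sum(
            tf[g] * math.log(1 + n_docs / (1 + df[g]))
            for g in set(bigrams(sentence))
        )

    sentences = split_sentences(article)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_k])]


# Hypothetical corpus; the first document is the article to summarize.
corpus = [
    "新手機上市。天氣晴朗。新手機熱賣。",
    "天氣晴朗。股市上漲。",
    "天氣晴朗。交通順暢。",
]
summary = summarize(corpus[0], corpus, top_k=2)
print(summary)
```

In this sample the sentence that appears in every document receives a low score because its bi-grams have high document frequency, so the two sentences about the new phone are kept. Summing over bi-grams favors longer sentences; dividing by sentence length is one possible variation.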

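Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1 Introduction
1.1 Overview of News Summarization
1.2 Research Background
1.3 Research Motivation
1.4 Research Objectives
1.5 Research Scope
1.6 Research Framework
1.7 Thesis Organization
Chapter 2 Related Work
2.1 Automatic Summarization
2.2 Representative Studies
2.2.1 H. P. Luhn
2.2.2 SUMMARIST
2.2.3 Edmundson
2.3 Summary Evaluation
2.4 Word Segmentation
Chapter 3 Summarization System
3.1 System Architecture
3.2 System Workflow
3.3 Preprocessing
3.3.1 News Document Collection
3.3.2 News Vocabulary Processing
3.3.3 Academia Sinica Bilingual Ontological Wordnet (中央研究院中英雙語知識本體詞網)
3.3.4 小學館
3.3.5 Google & Dr. eye
3.4 Chinese Sentence Segmentation
3.5 Chinese Word Segmentation
3.5.1 Why Build a Separate Word Segmentation System
3.5.2 Preparation Before Word Segmentation
3.5.3 Word Segmentation Calculation Method
3.6 Weight Calculation
3.7 Ranking and Extraction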

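Chapter 4 Evaluation and Analysis of Experimental Results
4.1 Experimental Test Data
4.2 Evaluation Method
4.3 Evaluation Tools
4.3.1 Document Similarity Tool
4.3.2 Document Clustering Tool
4.4 Results and Analysis
4.4.1 Similarity Method
4.4.2 Clustering Method
4.4.3 Non-Technology News
4.4.4 Effect of Punctuation
Chapter 5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
Appendix
A-1 Experimental Results for Technology News Summaries
A-2 Experimental Results for Technology News Summaries
A-3 Experimental Results for Technology News Summaries
A-4 Experimental Results for International News Summaries
A-5 Experimental Results for Education News Summaries
A-6 Experimental Results for Lifestyle News Summaries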



