National Digital Library of Theses and Dissertations in Taiwan

Detailed Record

Researcher: 邱怡菁
Researcher (English): Chiu, Yi-Ching
Title: 以LDA為基之英文課程文字稿摘要法
Title (English): Summarizing English Course Transcripts based on LDA
Advisor: 蕭文峰
Advisor (English): Hsiao, Wen-Feng
Committee members: 張德民、陳耀宗
Committee members (English): Chang, Te-Min; Chen, Yao-tsung
Oral defense date: 2015-01-28
Degree: Master's
Institution: National Pingtung University (國立屏東大學)
Department: Master's Program, Department of Information Management
Discipline: Computing
Field: General Computing
Document type: Academic thesis
Year of publication: 2015
Graduation academic year: 103 (2014-2015)
Language: Chinese
Number of pages: 71
Keywords (Chinese): 文件摘要、LDA、主動受詞三元組、文字稿
Keywords (English): Document Summary; LDA; SVO triplet; transcript
Usage statistics:
  • Times cited: 4
  • Views: 766
  • Downloads: 32
  • Bookmarked: 0
Abstract: With the rapid growth of Massive Open Online Courses (MOOCs), learners can register for and take courses offered by renowned universities abroad. To support learning, these courses provide videos and transcripts for download. This thesis argues that transcript summaries can help learners quickly locate the courses they need and grasp their content. It therefore proposes using Latent Dirichlet Allocation (LDA) to analyze transcript content and extract its topics, from which a summary is assembled. Because a summary is composed of sentences, and the basic components of a sentence form a triplet of subject, verb, and object (or complement), the thesis proposes selecting summary sentences by matching each sentence's triplet against the LDA topics. That is, the more an LDA topic covers a sentence's triplet, the more likely that sentence is an important sentence about the topic and should be selected for the summary. The thesis compares the proposed method with summarization by traditional LDA and by pLSA.
Abstract (English): Owing to the rapid growth of Massive Open Online Courses (MOOCs), learners can enroll in courses offered by prestigious universities worldwide. To facilitate learning, these courses let learners view and download videos and the corresponding transcripts. We believe transcript summaries can help learners filter the courses they need and grasp each course's key points. We therefore propose using Latent Dirichlet Allocation (LDA) to analyze the transcripts, extract the topics they contain, and then derive their summaries. Since a summary consists of sentences, and a sentence is built around a triplet of subject, verb, and object (or complement), we propose matching each sentence's triplet against LDA's topic words to select summary sentences. That is, the more a sentence's triplet is covered by an LDA topic's words, the more likely the sentence addresses that topic and should be selected as a summary sentence. We compare the performance of the proposed method (triplet LDA), traditional LDA, and pLSA in extracting summary sentences.
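To make the selection rule concrete, the following is a minimal sketch of the triplet-against-topic scoring described in the abstract. It assumes the topic-word probabilities have already been estimated (the thesis used JGibbLDA, reference 31) and that each sentence's subject-verb-object triplet has already been extracted (e.g., with the Stanford tools, references 10 and 32). All function names and the toy data below are illustrative assumptions, not taken from the thesis.

```python
# Sketch of triplet-LDA extractive summarization: score each sentence by how
# well any single LDA topic covers its (subject, verb, object) triplet, then
# keep the highest-scoring sentences as the summary.
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (subject, verb, object/complement)


def triplet_topic_score(triplet: Triplet,
                        topic_words: Dict[str, float]) -> float:
    """Sum the topic's probabilities for the triplet words it covers."""
    return sum(topic_words.get(w.lower(), 0.0) for w in triplet)


def summarize(sentences: List[str],
              triplets: List[Triplet],
              topics: List[Dict[str, float]],
              n_sentences: int = 2) -> List[str]:
    """Pick the sentences whose triplets best match some LDA topic."""
    scored = []
    for sent, trip in zip(sentences, triplets):
        # A sentence's score is its best coverage over all topics.
        best = max(triplet_topic_score(trip, t) for t in topics)
        scored.append((best, sent))
    # Keep the top-scoring sentences, returned in original order.
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:n_sentences]
    chosen = {s for _, s in top}
    return [s for s in sentences if s in chosen]


if __name__ == "__main__":
    # Hand-made toy transcript, triplets, and topic-word distributions.
    sentences = [
        "The model assigns topics to documents.",
        "Students watch videos every week.",
        "LDA infers latent topics from word counts.",
    ]
    triplets = [
        ("model", "assigns", "topics"),
        ("students", "watch", "videos"),
        ("lda", "infers", "topics"),
    ]
    topics = [
        {"lda": 0.20, "topics": 0.15, "infers": 0.05, "model": 0.10},
        {"students": 0.12, "course": 0.10, "videos": 0.08},
    ]
    print(summarize(sentences, triplets, topics))
```

The point of scoring the triplet rather than every word in the sentence is that a sentence is rewarded only when a topic covers its grammatical core, not its incidental vocabulary; this is what distinguishes the proposed method from plain LDA-based sentence scoring.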
Table of Contents:
1. Introduction 1
1.1. Research Background and Motivation 1
1.2. Research Objectives 3
2. Literature Review 4
2.1. Characteristics of Transcripts 4
2.2. Document Summarization Methods 5
2.3. Probabilistic Latent Semantic Analysis (pLSA) 8
2.4. Latent Dirichlet Allocation (LDA) 9
2.5. Triplets 15
3. Research Method 17
3.1. Preprocessing 18
3.2. Processing 23
3.3. Summary Output and Comparison 25
4. Experiments and Discussion 27
4.1. Experimental Dataset 27
4.2. Introduction to the LDA Tool 28
4.3. Determining the Numbers of Terms and Topics 31
4.4. Experiment Description 34
4.5. LDA Summaries 35
5. Conclusions and Future Work 64
6. References 65
7. Appendix 69
References:
1. 张明慧, 王红玲, 周国栋. "An Automatic Summarization Method Based on LDA Topic Features" (基于LDA主题特征的自动文摘方法), Computer Applications and Software (计算机应用与软件), Vol. 28(10), pp. 20-22, 46.
2. 曾士昌. "Sentence-Based Document Summarization" (以句子為基礎之文件摘要), Master's thesis, Department of Information Management, National Pingtung Institute of Commerce, Pingtung.
3. Arora, R. and Ravindran, B. (2008). "Latent Dirichlet Allocation Based Multi-Document Summarization," AND '08: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data, pp. 91-97.
4. Blei, D.M., Ng, A.Y., and Jordan, M.I. (2003). "Latent Dirichlet Allocation," Journal of Machine Learning Research, Vol. 3, pp. 993-1022.
5. Chang, Y.L. and Chien, J.T. (2009). "Latent Dirichlet learning for document summarization," ICASSP '09: Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1689-1692.
6. Chen, Y.N., Chen, C.P., Lee, H.Y., Chan, C.A., and Lee, L.S. (2011). "Improved spoken term detection with graph-based re-ranking in feature space," Proc. ICASSP 2011.
7. Christensen, H., Gotoh, Y., Kolluru, B., and Renals, S. (2003). "Are extractive text summarisation techniques portable to broadcast news?" Proc. IEEE Automatic Speech Recognition and Understanding Workshop.
8. Christensen, H., Kolluru, B., Gotoh, Y., and Renals, S. (2004). "From text summarisation to style-specific summarisation for broadcast news," Proc. ECIR 2004.
9. Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R. (1990). "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science, Vol. 41(6), pp. 391-407.
10. de Marneffe, M.-C. and Manning, C.D. (2008). "Stanford typed dependencies manual," pp. 1-12.
11. Haghighi, A. and Vanderwende, L. (2009). "Exploring content models for multi-document summarization," Proc. NAACL-HLT 2009.
12. Hennig, L. (2009). "Topic-based multi-document summarization with probabilistic latent semantic analysis," Proc. RANLP 2009.
13. Hori, T., Hori, C., and Minami, Y. (2003b). "Speech summarization using weighted finite-state transducers," Proc. Eurospeech 2003.
14. Kolluru, B., Christensen, H., and Gotoh, Y. (2005). "Multi-stage compaction approach to broadcast news summarization," Proc. Eurospeech 2005, pp. 69-72.
15. Kong, S.Y. and Lee, L.S. (2006). "Improved spoken document summarization using probabilistic latent semantic analysis (PLSA)," Proc. ICASSP 2006.
16. Liu, N., Tang, X., Lu, Y., Li, M., Wang, H., and Xiao, P. (2014). "Topic-Sensitive Multi-document Summarization Algorithm," 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), pp. 69-74.
17. Mani, I., Klein, G., House, D., Hirschman, L., Firmin, T., and Sundheim, B. (2002). "SUMMAC: a text summarization evaluation," Natural Language Engineering, Vol. 8(1), pp. 43-68.
18. Campr, M. and Ježek, K. (2013). "Comparative Summarization via Latent Dirichlet Allocation," Proc. Dateso 2013, pp. 80-86.
19. Murray, K.M. (2009). "Summarization by Latent Dirichlet Allocation: Superior Sentence Extraction through Topic Modeling," Senior (Bachelor's) thesis, Princeton University.
20. Muthukkaruppan, A. and Siti, F.N.M. (2014). "Content Quality of Clustered Latent Dirichlet Allocation Short Summaries," Information Retrieval Technology, Lecture Notes in Computer Science, Vol. 8870, pp. 494-504.
21. Misra, H., Yvon, F., Cappé, O., and Jose, J. (2011). "Text segmentation: A topic modeling perspective," Information Processing and Management, Vol. 47, pp. 528-544.
22. Nenkova, A., Maskey, S., and Liu, Y. (2011). "Automatic summarization," Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (HLT '11).
23. Rusu, D., Dali, L., Fortuna, B., Grobelnik, M., and Mladenić, D. (2007). "Triplet extraction from sentences," Proc. IS-2007, pp. 8-12.
24. Vanderwende, L., Suzuki, H., Brockett, C., and Nenkova, A. (2007). "Beyond SumBasic: Task-focused summarization with sentence simplification and lexical expansion," Information Processing and Management, Vol. 43.
25. Yih, W., Goodman, J., Vanderwende, L., and Suzuki, H. (2007). "Multi-Document Summarization by Maximizing Informative Content-Words," Proc. IJCAI 2007.
26. Zechner, K. (2002a). "Automatic summarization of open-domain multiparty dialogues in diverse genres," Computational Linguistics, Vol. 28(4), pp. 447-485.
27. Zechner, K. (2002b). "Summarization of Spoken Language - Challenges, Methods, and Prospects," Speech Technology Expert eZine, Issue 6, January 2002.
28. Zechner, K. and Waibel, A. (2000a). "DIASUMM: Flexible Summarization of Spontaneous Dialogues in Unrestricted Domains," Proceedings of COLING 2000.
29. Zechner, K. and Waibel, A. (2000b). "Minimizing word error rate in textual summaries of spoken language," Proc. NAACL 2000.
30. Zhu, T. and Li, K. (2011). "The Similarity Measure Based on LDA for Automatic Summarization," Procedia Engineering, Vol. 29, pp. 2944-2949.
31. Phan, X.-H. and Nguyen, C.-T. (2008). "JGibbLDA," http://jgibblda.sourceforge.net/
32. "Stanford Log-linear Part-Of-Speech Tagger," http://nlp.stanford.edu/software/tagger.shtml
33. "SweSum," http://swesum.nada.kth.se/index-eng-adv.html
34. "Tools4noobs Online summarize tool," https://www.tools4noobs.com/summarize/
35. "Open Text Summarizer," http://libots.sourceforge.net/
