National Digital Library of Theses and Dissertations in Taiwan

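Author: 許柏方 (Po-fang Hsu)
Title: 基於HITS演算法於華文社群媒體之實況運動競賽精彩片段暨語意萃取框架
Title (English): A HITS-based Semantic Highlight Detection Framework for Live Sports Games using Chinese Social Media
Advisor: 陳煥 (Huan Chen)
Oral defense committee: 余松年, 范耀中
Oral defense date: 2016-07-04
Degree: Master's
University: 國立中興大學 (National Chung Hsing University)
Department: Department of Computer Science and Engineering
Discipline: Engineering
Field: Electrical and Computer Engineering
Thesis type: Academic thesis
Year of publication: 2016
Academic year of graduation: 104 (ROC calendar)
Language: Chinese
Number of pages: 97
Keywords (translated from Chinese): social media, sports games, highlight videos, annotation, HITS algorithm
Keywords (English): Social Media, Sports Game, Highlight, Semantic Annotation, HITS Algorithm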
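Abstract (translated from the Chinese):
The rise of social networks has driven a new kind of revolution on the Internet, characterized by large volumes of real-time user comments and status updates. As a result, many studies based on social networks have appeared in recent years, covering topics such as earthquake detection, tracking of climate change, and extraction of highlights from sports game videos.
In this thesis, we propose a new framework for detecting highlights in sports games and extracting annotations for them. For event detection we use only the text comments posted on social networks, unlike current approaches that rely on audio or video. This greatly reduces the computing resources required and also saves time.
We propose a novel framework, the HITS-based Semantic Highlight Detection Framework for Live Sports Games using Chinese Social Media (HITS-SHiDF). It treats commenting users and video events as a complete bipartite graph and applies HITS, an algorithm widely used in information retrieval, to retrieve highlights and to find the event annotations for those highlights. Compared with the traditional retrieval approach based on spike (burst) detection in time-series analysis, we found that our method performs better and is not disturbed by meaningless comments, such as random or bandwagon posts.
For semantic annotation of highlight events, we built our own lexicon to improve annotation retrieval. To better match the posting style of the target social networking site, we also collected Wikipedia and historical pages of that site, processed them, and added them to the corpus to improve Chinese word segmentation. In addition, we propose a novel method that narrows the retrieval scope for highlight-event annotation, aiming to strengthen keyphrase detection and thereby improve semantic annotation retrieval.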


Abstract (English):
The rise of social networking is driving a new wave of revolution in the Internet world, characterized by large numbers of users posting instant messages and fast status updates. As a result, many studies based on social networks have emerged in recent years, such as earthquake detection, tracking of climate change, and sports video highlight detection.
In this thesis, we propose a new framework for sports game highlight detection and annotation extraction. For highlight detection, we use only text messages from social networks, unlike other research that relies on sound or images; this greatly reduces the computing resources required and is also faster.
We propose a novel framework, A HITS-based Semantic Highlight Detection Framework for Live Sports Games using Chinese Social Media (HITS-SHiDF). In our research, users and highlights are modeled as a complete bipartite graph, and HITS, an algorithm widely used in information retrieval, is applied to search for highlights.
For semantic annotation of highlights, we build our own segmentation dictionary to improve the performance of event annotation, using Wikipedia and historical pages of the target social media site as the corpus. In addition, we propose a new method that aims to improve keyphrase extraction.
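The graph-based ranking described here can be illustrated with a minimal sketch of HITS on a user/time-slot graph. This is not the thesis's implementation. It assumes that each comment is reduced to a (user, slot) pair, that an edge exists only where a user commented in a slot, and that repeated comments by the same user in the same slot count once. The function names and the input format are hypothetical.

```python
from math import sqrt


def _normalize(scores):
    """Scale scores to unit L2 norm (returned unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {k: v / norm for k, v in scores.items()}


def hits_bipartite(edges, iterations=50):
    """Run HITS on a bipartite graph of users and time slots.

    edges: iterable of (user, slot) pairs, one per comment (assumed format).
    Returns (hub, auth): hub scores for users, authority scores for slots.
    """
    unique_edges = set(edges)  # assumption: duplicate comments count once
    users = {u for u, _ in unique_edges}
    slots = {s for _, s in unique_edges}
    hub = {u: 1.0 for u in users}
    auth = {s: 1.0 for s in slots}

    for _ in range(iterations):
        # Authority update: a slot scores high if high-hub users commented in it.
        new_auth = dict.fromkeys(slots, 0.0)
        for u, s in unique_edges:
            new_auth[s] += hub[u]
        auth = _normalize(new_auth)

        # Hub update: a user scores high if they commented in high-authority slots.
        new_hub = dict.fromkeys(users, 0.0)
        for u, s in unique_edges:
            new_hub[u] += auth[s]
        hub = _normalize(new_hub)

    return hub, auth


# Toy data: users a, b, c react in slot 10; a and b also react in slot 20;
# user d posts twice in slot 30, where nobody else comments.
comments = [("a", 10), ("b", 10), ("c", 10), ("a", 20), ("b", 20),
            ("d", 30), ("d", 30)]
hub_scores, auth_scores = hits_bipartite(comments)
ranked_slots = sorted(auth_scores, key=auth_scores.get, reverse=True)
print(ranked_slots)
```

In this toy input the slot that attracts the most well-connected users ranks first, while the slot that only an isolated user comments in ranks last, which is the kind of behavior the abstract contrasts with volume-based spike detection.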
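The thesis outline names TF-IDF as the basis of its semantic extraction step. The following is a minimal sketch of TF-IDF keyword ranking per highlight window, not the thesis's actual pipeline. It assumes that comments are already segmented into tokens (Chinese text would first need a word segmenter) and that all comments in one window form one document. The function name, the input format, and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank the tokens of each document by TF-IDF.

    docs: dict mapping a window id to a list of tokens (assumed pre-segmented).
    Returns a dict mapping each window id to its top_k tokens.
    """
    n_docs = len(docs)

    # Document frequency: in how many windows each token appears.
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))

    result = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        total = len(tokens)
        scores = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for determinism.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        result[doc_id] = ranked[:top_k]
    return result


# Toy windows: "lol" appears everywhere, so its IDF (and score) is zero.
windows = {
    "w1": ["goal", "goal", "germany", "lol"],
    "w2": ["save", "keeper", "lol"],
    "w3": ["foul", "card", "lol"],
}
print(tfidf_keywords(windows, top_k=2))
```

Tokens that occur in every window receive an IDF of zero, so generic chatter drops out of the ranking and window-specific terms rise to the top.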

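Table of contents:
Acknowledgments
Chinese Abstract
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Preface
1.2 Motivation and Objectives
1.3 Thesis Organization
Chapter 2 Related Work
2.1 Overview of Existing Social Media
2.1.1 Twitter
2.1.2 PTT
2.2 Related Work on Event Detection
2.2.1 Event detection using the video itself
2.2.2 Collaborative event detection using external resources
2.2.3 Event detection without any video features
2.3 Related Work on Semantic Extraction
2.3.1 TF-IDF
2.3.1.1 TF and IDF
2.3.2 Graph-Based Ranking
2.4 Building the Word Segmentation Dictionary
2.4.1 Field Association (FA) Terms
2.4.2 Word2Vec
2.5 The Paper Used as the Comparison Baseline
2.5.1 Moving-threshold burst detection
2.5.2 Sliding window and moving threshold
2.5.3 Core concept of moving-threshold burst detection
2.5.4 The moving-threshold burst detection algorithm
2.5.5 Strengths and weaknesses of moving-threshold burst detection
2.5.6 Event Annotation
2.6 HITS (Hyperlink-Induced Topic Search)
2.6.1 Hub and Authority
2.6.2 The HITS algorithm
Chapter 3 System Architecture and Methods
3.1 System Architecture Overview
3.2 Collecting Social Media Data
3.2 HITS-based Highlight Event Detection and Ranking Model
3.2.1 HITS-based detection of important events and important users
3.2.2 Power User and Power Event
3.2.3 Event detection and ranking algorithm
3.3 TF-IDF-based Semantic Extraction and Ranking Model
3.3.1 Preprocessing of user comments on the social platform
3.3.2 TF-IDF-based semantic extraction mechanism
3.4 Building the Word Segmentation Dictionary with a Word2Vec Model
Chapter 4 System Implementation and Experimental Results
4.1 Development Tools and Environment
4.2 Experimental Data Sources
4.3 Experimental Environment and Parameter Settings
4.4 Comparison of Event Detection Results
4.4.1 Event detection evaluation metric (1)
4.4.2 Event detection evaluation metric (2)
4.5 Comparison of Semantic Annotation Results
4.5.1 Semantic annotation evaluation metric (1)
4.5.2 Semantic annotation evaluation metric (2)
Chapter 5 Conclusions and Future Work
References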

