National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

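Author: 周長銘 (Chang-Ming Chou)
Thesis title: 利用文字探勘技術辨別網路謠言之真偽
Thesis title (English): A Text Mining Approach for Automatic Rumor Detection in Social Networks
Advisor: 楊錦生 (Chin-Sheng Yang)
Oral defense committee: 李怡慧 (Joyce Y.H. Lee), 吳怡瑾 (I-Chin Wu)
Oral defense date: 2016-06-20
Degree: Master's
Institution: 元智大學 (Yuan Ze University)
Department: Department of Information Management
Discipline: Computer Science
Sub-discipline: General Computer Science
Thesis type: Academic thesis
Publication year: 2016
Academic year of graduation: 104
Language: Chinese
Pages: 53
Keywords (translated from Chinese): Internet rumor, text mining, social network, rumor detection
Keywords (English): network rumor, text mining, social network, rumor detection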
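Interpersonal word-of-mouth communication is usually more persuasive, because the sender does not appear to gain anything from it; this judgment of the sender's motive is a manifestation of perceived source credibility. Rumors, whether spread by word of mouth or over the Internet, travel through channels that resemble word-of-mouth communication, so receivers readily assume that the sender has no motive to deceive and come to believe the message. This study collected 101 articles widely circulated on the Internet and applied text mining techniques to their wording, using feature variables such as part of speech, usage counts, and emotional words to examine how these features relate to whether an article is true or false. The aim is to identify the language habits that characterize false online articles (Internet rumors) and to build a model that detects Internet rumors automatically. In the experiments, the feature variables examined (term frequency, part of speech, emotional words, article category, and N-Gram) were analyzed with four classification algorithms, J48 (C4.5), SMO (Support Vector Machine), Naive Bayes, and IBk (k-nearest neighbor), to weigh their influence on whether a rumor article is true; part of speech and N-Gram had the more significant influence. In addition, with feature selection by information gain, SMO reached 88.11% accuracy after 50 or 150 attributes were selected; when 200 attributes were selected, Naive Bayes reached 97.02% accuracy.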
In the past, rumors spread by word of mouth, which limited their diffusion. With the rapid development of the Internet, however, rumors have become as widespread as the Internet itself and can be found wherever there is a computer. This study reviewed the theoretical background and prior cases of rumor detection. We then collected 101 cases of Internet rumors from the nownews rumorbreaker website. Subsequently, we used features verified in prior work, such as n-gram, part-of-speech, term frequency, and emotional words, for rumor classifier learning. We also investigated the effect of feature selection on rumor detection, employing two feature selection methods: information gain and the chi-square measure. Four supervised learning algorithms were adopted: J48 (C4.5), SMO (Support Vector Machine), Naive Bayes, and IBk (k-nearest neighbor). According to our empirical evaluation, when the information gain method was employed, the SMO algorithm achieved a best classification accuracy of 88.11%, while the Naive Bayes algorithm achieved a best classification accuracy of 97.02%.
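The abstract names n-gram and term-frequency features, and the thesis outline distinguishes "Binary" from "TF" weighting for 2-Gram and 1+2-Gram feature sets. The following is a minimal sketch of how such features could be built, not the thesis's own implementation. It assumes the text has already been segmented into tokens; the function names `ngrams` and `ngram_features` and the toy token list are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over an already-segmented token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(tokens, n_values=(1, 2), binary=False):
    """Map each n-gram to its count (TF) or to 1 if it occurs (Binary)."""
    counts = Counter()
    for n in n_values:
        counts.update(ngrams(tokens, n))
    if binary:
        # Binary weighting only records presence, not frequency.
        return {gram: 1 for gram in counts}
    return dict(counts)


# Hypothetical toy input; not taken from the thesis dataset.
tokens = ["drink", "water", "drink", "water", "daily"]

tf_2gram = ngram_features(tokens, n_values=(2,))                    # 2-Gram (TF)
bin_12gram = ngram_features(tokens, n_values=(1, 2), binary=True)   # 1+2-Gram (Binary)
```

In this sketch the TF variant keeps how often each n-gram occurs, while the Binary variant collapses every observed n-gram to 1, which is the only difference between the two weightings.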
Title Page
Oral Defense Committee Approval Form
Authorization Form
Chinese Abstract
English Abstract
Acknowledgments
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1 Research Background
1.2 Research Motivation
1.3 Research
1.4 Research Structure
Chapter 2. Literature Review
2.1 Data Mining
2.1.1 Classification Analysis
2.1.2 Cluster Analysis
2.1.3 Association Rules
2.2 東森新聞台「網路追追追」 (rumor-verification column)
2.3 Academia Sinica Chinese Word Segmentation Service
2.4 Chinese Academy of Sciences Emotion Dictionary
2.5 Weka
2.6 Internet Rumors and Text Mining
Chapter 3. Experimental Method
3.1 Rumor Corpus Collection and Processing
3.2 Feature Variable Selection
3.2.1 Term Frequency
3.2.2 Part of Speech (POS)
3.2.3 Emotional Words
3.2.4 Article Category
3.2.5 N-Gram
Chapter 4. Experimental Design
4.1 Data Source
4.2 Evaluation Methods and Metrics
4.3 Data Analysis
4.3.1 2-Gram (Binary) Results
4.3.2 2-Gram (TF) Results
4.3.3 1+2-Gram (Binary) Results
4.3.4 1+2-Gram (TF) Results
4.4 Feature Selection
4.4.1 Information Gain
4.4.2 Chi-Square Test
Chapter 5. Conclusion
5.1 Conclusion
5.2 Future Research Directions
References
Appendix 1: CKIP Part-of-Speech Table
Appendix 2: LIWC Category Table
Appendix 3: Rumor Dataset
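
Section 4.4 of the outline lists information gain and the chi-square test as the two feature selection methods. As a rough illustration of what these scores measure, here is a sketch for a single binary (present/absent) feature against class labels. It is written from the standard textbook definitions and is not the thesis's Weka configuration; the function names and the toy data are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / total
        result -= p * math.log2(p)
    return result


def information_gain(present, labels):
    """H(class) - H(class | feature) for a binary feature."""
    with_f = [y for x, y in zip(present, labels) if x]
    without_f = [y for x, y in zip(present, labels) if not x]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (with_f, without_f) if part
    )
    return entropy(labels) - conditional


def chi_square(present, labels, positive):
    """Chi-square statistic of the 2x2 feature-by-class contingency table."""
    a = sum(1 for x, y in zip(present, labels) if x and y == positive)
    b = sum(1 for x, y in zip(present, labels) if x and y != positive)
    c = sum(1 for x, y in zip(present, labels) if not x and y == positive)
    d = sum(1 for x, y in zip(present, labels) if not x and y != positive)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Hypothetical toy data: four articles and two candidate features.
labels = ["rumor", "rumor", "true", "true"]
features = {
    "separating": [1, 1, 0, 0],     # present only in rumor articles
    "uninformative": [1, 0, 1, 0],  # spread evenly across both classes
}

ig_scores = {name: information_gain(col, labels) for name, col in features.items()}
chi_scores = {name: chi_square(col, labels, "rumor") for name, col in features.items()}

# Keep the top-k features by information gain (k = 1 in this toy case).
top_k = sorted(ig_scores, key=ig_scores.get, reverse=True)[:1]
```

Selecting the top 50, 150, or 200 attributes, as the abstract describes, would correspond to changing the slice size in the last line once every candidate feature has been scored.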


