Graduate student (English name): Chian-He Du
Thesis title (English): A Study of Detecting Bot Users in Social Networks
Keywords (English): Deep Learning; Restricted Boltzmann Machine; Machine Learning; Social Network
In recent years, the rapid growth and widespread use of social networks have made our lives increasingly intertwined with the Internet, and issues surrounding fake (bot) accounts on social media have drawn growing attention from the research community. Most previous methods extract a large number of account features from raw data. However, identifying bot users from account features requires a very large amount of labeled data before a basis for classification can be found, which makes data collection very time-consuming. The aim of this study is to propose a more effective classification method for detecting bot users in social networks. Its contribution lies in presenting a variety of behavioral features based on timestamp data and in showing, through statistical methods and experimental comparison, that these behavioral features have discriminative power for classification. This study also proposes a deep learning method, the Hierarchical-Restricted Boltzmann Machine (H-RBM), to learn the features of bot accounts more effectively. Experimental results show that, on larger datasets, the F-Measure of the proposed approach is about 2 to 3 percentage points higher than that of traditional classifiers. We believe the proposed approach can help social media platforms distinguish genuine accounts from bot accounts more accurately.
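As a reading aid for the reported F-Measure gain, the following is a minimal Python sketch of how an F-Measure is computed from classification counts and how a gain in percentage points is read. The function name `f_measure` and all counts are hypothetical, invented purely for illustration; they are not results from the thesis, and the abstract does not state which beta weighting was used (beta = 1 is assumed here).

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for a baseline classifier and an improved one
# (assumed values, not taken from the thesis).
baseline = f_measure(tp=80, fp=20, fn=20)
proposed = f_measure(tp=83, fp=17, fn=17)

# A difference in F-Measure is expressed in percentage points,
# not as a relative percentage.
gain_points = (proposed - baseline) * 100
print(f"baseline={baseline:.3f} proposed={proposed:.3f} gain={gain_points:.1f} points")
```

With these assumed counts the two scores differ by about 3 percentage points, which is the kind of gap the abstract describes for larger datasets.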
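Chapter 1 Introduction
1.1 Research Background and Motivation
Chapter 2 Literature Review
2.1 Account Features and Behavioral Features
2.2 Introduction to Backpropagation Networks
2.3 Introduction to Restricted Boltzmann Machines
2.4 Gradient Descent and Its Optimization Algorithms
2.5 Conclusion
Chapter 3 Research Method
3.1 Problem Description and Basic Concepts
3.2 Timestamp Data
3.2.1 Definition of Timestamp Data
3.2.2 Timestamp Data Feature Extraction
3.2.3 Timestamp Data Normalization
3.3 Hierarchical Restricted Boltzmann Machine (H-RBM)
3.3.1 H-RBM Structure
3.3.2 H-RBM Training
Chapter 4 Experimental Methods and Result Analysis
4.1 Feature-Based Statistical Experiments
4.1.1 Correlation Coefficients Between Behavioral Features and Account Features
4.1.2 Feature Distribution Plots and Overlap Rate
4.2 H-RBM Training and Classification Experiments
4.2.1 H-RBM-Based Training Simulation
4.2.2 Account Classification Experiments with Traditional Classifiers and H-RBM
Chapter 5 Conclusions and Future Research Directions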
