Author (English): Shi-ting Zhong
Thesis title (English): A Micro-blog Spammer Detection Framework Based on Mining User-Generated Context and Behavior
Advisor (English): Shi-Jinn Horng
Keywords (Chinese): Twitter; 資訊檢索模型; 文字探勘; Support Vector Machine
Keywords (English): Twitter; Information Retrieval Model; Text Mining; Support Vector Machine
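Abstract (translated from Chinese): Twitter is a social networking site on which each post, called a Tweet, contains at most 140 characters. Compared with traditional blogs, Twitter posts are short, but they may still include links to images and videos, and the site lets users exchange information with one another. People can use Twitter to find topics and posts that interest them. Unfortunately, Twitter is flooded with spam, which lowers the quality of Twitter search results and wastes network resources. The main goal of this thesis is to detect the accounts that spread spam on Twitter and so give users a clean online environment. Before a classifier for identifying spammers can be built, features that help classification must be extracted from Twitter. This thesis combines text mining with an information retrieval model to produce text-based features, and observes users' posting patterns to produce user behavior features. Finally, it applies a Support Vector Machine (SVM) to the combination of the two feature vectors to produce a classifier that automatically detects spam-spreading accounts on Twitter.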
Abstract (English): Twitter is a social networking site built around messages of up to 140 characters, called Tweets. Twitter differs from a traditional blog in that its posts are typically much shorter. It lets users exchange small pieces of content such as short sentences, individual images, or video links. People can use Twitter to discover the latest news on subjects they care about. Unfortunately, Twitter has been infiltrated by a large amount of spam, which lowers the quality of Twitter search results and wastes network resources. Our work focuses on detecting spammers on Twitter in order to give users a clean web space. To prepare for spammer detection, we need to extract meaningful features from Tweets. In this thesis, we apply text mining techniques together with an information retrieval model to generate text-based features, and we also examine the contents of users' Tweets to generate user behavior features. Finally, we use a Support Vector Machine (SVM) to train a classifier on these features that automatically detects spammers on Twitter.
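
The pipeline outlined in the abstract (text-based features, user behavior features, then an SVM over both) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the thesis's actual design: the abstract does not name the information retrieval model, so TF-IDF weighting stands in for it; the two behavior features (share of Tweets containing a link, share of duplicated Tweets) are hypothetical; the SVM is a plain linear one trained by subgradient descent on the hinge loss; and the four toy accounts are invented data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency, one 'document' per user."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def text_features(tokens, idf, vocab):
    """TF-IDF vector over a fixed vocabulary (assumed stand-in for the IR model)."""
    tf = Counter(tokens)
    total = max(len(tokens), 1)
    return [tf[term] / total * idf[term] for term in vocab]


def behavior_features(tweets):
    """Hypothetical behavior features: link ratio and duplicate-Tweet ratio."""
    if not tweets:
        return [0.0, 0.0]
    n = len(tweets)
    url_ratio = sum("http" in t for t in tweets) / n
    duplicate_ratio = 1.0 - len(set(tweets)) / n
    return [url_ratio, duplicate_ratio]


def featurize(users):
    """Concatenate text-based and behavior features for each user."""
    docs = [tokenize(" ".join(tweets)) for tweets in users]
    idf = build_idf(docs)
    vocab = sorted(idf)
    return [text_features(d, idf, vocab) + behavior_features(t)
            for d, t in zip(docs, users)]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=1000):
    """Linear SVM via subgradient descent on L2-regularized hinge loss.

    Labels must be +1 (spammer) or -1 (legitimate).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Shrink the weights (gradient of the L2 penalty).
            w = [wi * (1.0 - lr * lam) for wi in w]
            # Step toward the sample only if it violates the margin.
            if margin < 1.0:
                w = [wi + lr * label * xi for wi, xi in zip(w, x)]
                b += lr * label
    return w, b


def predict(w, b, x):
    """Return +1 for spammer, -1 for legitimate."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Invented toy accounts: two spam-like, two ordinary.
users = [
    ["win free iphone http://spam.example/a",
     "win free iphone http://spam.example/a",
     "free prize click http://spam.example/b"],
    ["cheap pills buy now http://spam.example/c",
     "cheap pills buy now http://spam.example/c"],
    ["had a great coffee this morning", "reading a paper on text mining"],
    ["watching the game tonight", "new blog post http://blog.example/post",
     "lunch with friends"],
]
labels = [1, 1, -1, -1]

X = featurize(users)
w, b = train_linear_svm(X, labels)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The sketch scores the same accounts it was trained on, which only shows that the pieces fit together; a real evaluation would hold out test accounts and reuse the vocabulary and IDF weights learned from the training set when featurizing unseen users.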
