National Digital Library of Theses and Dissertations in Taiwan

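Graduate student: Liu, Hung-Yu (劉弘裕)
Thesis title: Adopting BERT in Context-Aware Clickbait Detection (考量上下文語意之誘餌新聞偵測方法)
Advisor: Wang, Hei-Chia (王惠嘉)
Oral defense committee: Kao, Hung-Yu (高宏宇); Lee, Wei-Po (李偉柏); Hou, Jian-Ren (侯建任)
Oral defense date: 2021-05-28
Degree: Master's
Institution: National Cheng Kung University
Department: Institute of Information Management
Discipline: Computer Science
Subfield: General Computer Science
Thesis type: Academic thesis
Publication year: 2021
Graduation academic year: 109 (ROC calendar)
Language: Chinese
Number of pages: 57
Keywords (Chinese): 誘餌式標題; 類神經網路; BERT; 雙向GRU
Keywords (English): Clickbait Detection; Neural Networks; BERT; Bi-GRU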
With the growth of online news, news websites have gradually adopted clickbait writing techniques to induce readers to click on headlines and thereby earn advertising revenue. Because clickbait has a negative impact on readers and manual identification cannot keep up with the massive volume of online news, automated clickbait detection has become a new and important research field. However, most current detection methods assign each word a single semantic representation and rarely adjust that representation to the surrounding context. This study argues that such models cannot fully grasp the original meaning of a news item, which reduces their performance.
To address this problem, this study proposes the CA-CD (Context-Aware Clickbait Detection) model. Its main difference from previous work is that CA-CD incorporates the pre-trained model BERT, which lets the model adjust the semantic representation of each word according to its context and therefore understand the original meaning of the news more accurately. In addition, so that the news content can help the model identify clickbait, this study feeds the news headline together with the news content into BERT for training, and uses a bidirectional GRU and an attention mechanism to select and integrate features. Part-of-speech features are also added, with the aim of producing the semantic representation best suited to clickbait detection and thereby improving detection accuracy. Furthermore, most of the relatively complete public datasets are in English, and models trained on English datasets generally cannot detect clickbait in other languages. Considering the demand and importance of the Chinese market, this study builds a Chinese clickbait dataset to support future research on clickbait in Chinese. In the experiments, CA-CD achieves the best performance on most evaluation metrics when compared with other clickbait detection models, which indicates the effectiveness of the proposed method.
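As a rough illustration of two steps named in the abstract, the sketch below composes a headline and a news body into one BERT-style paired input, then pools a sequence of hidden states with attention. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names `build_pair_input` and `attention_pool`, the `max_len` value, the body-first truncation rule, and the dot-product scoring against a query vector are all hypothetical choices. The `[CLS]`/`[SEP]` layout with segment ids 0 and 1 follows the standard BERT sentence-pair format. The vectors passed to `attention_pool` merely stand in for Bi-GRU outputs, which are not computed here.

```python
import math


def build_pair_input(title_tokens, body_tokens, max_len=16):
    """Compose [CLS] title [SEP] body [SEP] plus matching segment ids.

    Assumption: three positions are reserved for special tokens and the
    body is truncated before the title when the pair is too long.
    """
    budget = max_len - 3
    title = title_tokens[:budget]
    body = body_tokens[:max(0, budget - len(title))]
    tokens = ["[CLS]"] + title + ["[SEP]"] + body + ["[SEP]"]
    # Segment 0 covers [CLS], the title, and the first [SEP];
    # segment 1 covers the body and the final [SEP].
    segments = [0] * (len(title) + 2) + [1] * (len(body) + 1)
    return tokens, segments


def attention_pool(states, query):
    """Return softmax attention weights and the weighted sum of states.

    Assumption: each state is scored by its dot product with `query`,
    a stand-in for a learned attention vector.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, states)) for d in range(dim)]
    return weights, pooled


if __name__ == "__main__":
    # Hypothetical placeholder tokens; a real tokenizer would supply these.
    tokens, segments = build_pair_input(["t1", "t2"], ["b1", "b2", "b3"], max_len=7)
    print(tokens)
    print(segments)
    weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0])
    print(weights, pooled)
```

In a full model, the pooled vector would typically be concatenated with any extra features (such as the part-of-speech features the abstract mentions) before a classification layer; that step is omitted here because the source does not describe how it is done.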
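Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Research Limitations
1.4 Research Process
1.5 Thesis Outline
Chapter 2 Literature Review
2.1 Clickbait Detection
2.1.1 Feature-Engineering-Based Methods
2.1.2 Neural-Network-Based Methods
2.2 Word Embedding
2.2.1 Context-independent Representation
2.2.2 Context-aware Representation
2.3 Neural Networks
2.3.1 Recurrent Neural Networks (RNN)
2.3.2 Attention Mechanism
2.4 Summary
Chapter 3 Research Method
3.1 Research Framework
3.1.1 Clickbait Dataset Construction
3.1.2 Clickbait Detection Method
3.2 Clickbait Dataset Construction
3.2.1 News Data Collection
3.2.2 Manual Annotation
3.3 News Preprocessing Module
3.3.1 Natural Language Preprocessing
3.3.2 Composing the Input Sequence
3.4 Contextual Semantic Representation Module
3.5 Semantic Integration and Classification Module
3.6 Summary
Chapter 4 System Implementation and Validation
4.1 System Environment Setup
4.2 Experimental Method
4.2.1 Data Sources
4.2.2 Experimental Design
4.2.3 Evaluation Metrics
4.3 Parameter Settings
4.3.1 Parameter 1: Input Sequence
4.3.2 Parameter 2: Neural Network Training Parameters
4.4 Experimental Results and Analysis
4.4.1 Experiment 1: Consistency of Annotators' Understanding of Clickbait
4.4.2 Experiment 2: Effect of Adding News Content on Clickbait Detection
4.4.3 Experiment 3: Effect of Adding Part-of-Speech Features to the Clickbait Detection Model
4.4.4 Experiment 4: Comparison with Other Clickbait Detection Models Using Automatic Evaluation Metrics
Chapter 5 Conclusions and Future Directions
5.1 Research Results
5.2 Future Research Directions
References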