National Digital Library of Theses and Dissertations in Taiwan

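Graduate student: 王妤瑄
Graduate student (English): Yu-Xuan Wang
Thesis title: 資安事件摘要萃取 (Summary Extraction of Information Security Incidents)
Thesis title (English): Abstractive Summarization of Target Attacks Based on Transfer Learning
Advisor: 陳嘉玫
Advisor (English): Chen, Chia-Mei
Degree: Master's
Institution: National Sun Yat-sen University
Department: Department of Information Management
Discipline: Computer Science
Field: General Computer Science
Thesis type: Academic thesis
Year of publication: 2021
Academic year of graduation: 109 (ROC calendar)
Language: Chinese
Number of pages: 64
Keywords (Chinese, translated): cyber threat intelligence; APT events; natural language processing; automatic summarization system; neural network
Keywords (English): CTI; APT Events; NLP; Automatic Summarization System; Neural Network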
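Rapid advances in information and communication technology, in both hardware and software, have made life more convenient for enterprises and individuals, but they have also raised information security risks. With the emergence of APT groups, attacks by hacker groups keep growing in frequency and sophistication, and attacks aimed at a single organization or sector appear one after another. Enterprises can therefore cope with APT attacks only by making effective use of cyber threat intelligence: understanding the past behavior of hacker groups in advance and turning formerly passive defense strategies into proactive, early deployment.
In recent years cyber threat intelligence has flourished, and many well-known threat intelligence exchange platforms now exist. The large volume of CTI they produce, however, has gradually grown into big data, and collecting and analyzing it manually takes a great deal of time. How an enterprise can quickly filter out the information it needs has therefore become an unavoidable problem.
In view of this, this study proposes TISUM (Threat Intelligence Summarizer), an automated summarization system dedicated to information security threat events. It collects a large amount of information security incident news and information security reports and, through Natural Language Processing (NLP) and neural networks, automatically generates summaries of information security incidents. TISUM achieves a ROUGE score of 70%, allowing enterprises to quickly grasp the key points of cyber threat intelligence.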
The rapid development of information and communication technology (ICT), in both hardware and software, has made life more convenient for enterprises and individuals. However, it has also increased information security risk. With the emergence of Advanced Persistent Threat (APT) groups, cyberattacks have grown in frequency and complexity, and more attacks now target a single organization or industry. To respond properly to APT attacks, enterprises and organizations therefore need proactive defense, such as acquiring Cyber Threat Intelligence (CTI) to understand the behavior of hacker groups, rather than conventional passive defense strategies.
Many well-known threat intelligence sharing platforms have appeared in recent years, reflecting the flourishing development of CTI. However, collecting and analyzing the accumulated CTI manually takes a great deal of time. Filtering out the information an organization needs is therefore a crucial issue.
To address these issues, this study proposes TISUM (Threat Intelligence Summarizer), an automated summarization system that gathers a large number of news articles and APT reports and automatically produces summaries of information security incidents using Natural Language Processing (NLP) and neural networks. The proposed system reaches a ROUGE score of 70%, so enterprises and organizations can use it to quickly grasp the key points of cyber threat intelligence.
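The abstract reports a ROUGE score without saying which variant was used (ROUGE-1, ROUGE-2, or ROUGE-L) or whether the figure is recall, precision, or F1. As a rough illustration of what such a score measures, the sketch below computes ROUGE-N from n-gram overlap between a generated summary and a reference summary. The function name `rouge_n`, the whitespace tokenization, and the example sentences are assumptions made for illustration; they are not taken from the thesis or from the TISUM system.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Illustrative ROUGE-N: n-gram overlap between candidate and reference.

    Assumes lowercase whitespace tokenization, which is a simplification.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example sentences, not data from the thesis.
scores = rouge_n(
    "the group used spear phishing emails",
    "the group used phishing emails to gain access",
)
print(scores)
```

Here five of the eight reference unigrams appear in the candidate, so recall is 0.625, while five of the six candidate unigrams appear in the reference, so precision is about 0.833. Published evaluations normally rely on an established ROUGE implementation with stemming and proper tokenization rather than a hand-rolled function like this one.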
Thesis Approval Form
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Motivation
Chapter 2 Literature Review
2.1 Background and Related Work
2.2 Cyber Threat Intelligence
2.3 Machine Learning and Neural Networks
2.4 Summarization Techniques
2.4.1 Threat Action Extraction
2.4.2 Entity Extraction
2.4.3 Relation Extraction
Chapter 3 Methodology
3.1 Data Collection
3.2 Text Annotation
3.2.1 Annotation Tools
3.3 Threat Entity Extraction
3.4 Threat Event Summary Extraction
Chapter 4 System Evaluation
4.1 Experiment 1: Comparison and Selection of Annotation Tools, Annotation Rules, and Annotation Quantity
4.2 Experiment 2: Effect of Different BERT Optimizers and Parameter Settings on System Performance
4.3 Experiment 3: Comparison of Three Neural Networks in the Threat Entity Extraction Module
4.4 Experiment 4: Comparison with Related Work on Threat Entity Extraction
4.5 Experiment 5: Information Security Summary Extraction
Chapter 5 Contributions and Future Work
References
Electronic full text (public release date on the Internet: 2026-10-16)