National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

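Graduate student: CHEN, TZU-YA (陳咨雅)
Thesis title: Research of the Hot Topic Extraction and Analysis Scheme for the Announcements on ICANN Website
Thesis title (Chinese): ICANN網站公告議題的擷取與分析之研究
Advisors: CHEN, HSING-CHUNG (陳興忠); CHANG, KANG-MING (張剛鳴)
Oral defense committee: LEU, FANG-YIE (呂芳懌); CHANG, KANG-MING (張剛鳴); CHEN, HSING-CHUNG (陳興忠)
Oral defense date: 2019-07-29
Degree: Master's
Institution: Asia University (亞洲大學)
Department: Department of Computer Science and Information Engineering (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Number of pages: 66
Keywords: Web Crawler; Association Rule Mining; ICANN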
ICANN is an important not-for-profit public-benefit corporation founded in California in 1998. It is responsible for managing and coordinating the global Domain Name System (DNS). With the rapid development of Internet technology, the total number of domain name registrations keeps increasing, which means that governance issues in the online world are becoming more and more significant.
This study uses a web crawler and association rule mining. First, the web crawler collects the public information published on the official ICANN website. Then association rule mining is applied to the collected data to determine whether it contains hidden information.
Finally, a semi-automated Python-based web crawler system is designed and built. It can collect specific data from a target website and carry out further data analysis. Most importantly, an association rule glossary (ARG) is created from the results of association rule mining. The ARG is used to expand the original dictionary, and its continued expansion is expected to increase both the speed of the association rule mining process and the credibility of its results.
The contribution of this study is to enable leaders in related fields to formulate policies more conveniently based on the association rule results. In addition, the system construction process can serve as a reference model for future studies with similar needs.
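The association rule step described in the abstract can be illustrated with a minimal sketch. According to its table of contents, the thesis relies on Mlxtend for this step. The version below deliberately uses only the Python standard library, so it should be read as an illustration of the general Apriori-style technique, not as the author's implementation. The keyword sets, thresholds, and function names are all hypothetical.

```python
from itertools import combinations

# Hypothetical keyword sets, one per announcement (not data from the thesis).
announcements = [
    frozenset({"gTLD", "DNS", "registry"}),
    frozenset({"gTLD", "registry"}),
    frozenset({"DNS", "DNSSEC"}),
    frozenset({"gTLD", "DNS", "registry"}),
    frozenset({"IANA", "DNS"}),
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support}."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    result = {}
    while candidates:
        # Keep only the candidates that meet the support threshold.
        kept = {}
        for candidate in candidates:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        result.update(kept)
        # Join step: merge size-k survivors into size-(k+1) candidates.
        joined = {a | b for a, b in combinations(kept, 2)
                  if len(a | b) == len(a) + 1}
        # Prune step: every size-k subset of a candidate must be frequent.
        candidates = [c for c in joined
                      if all(frozenset(sub) in kept
                             for sub in combinations(c, len(c) - 1))]
    return result


def association_rules(itemsets, min_confidence):
    """Derive (antecedent, consequent, support, confidence, lift) tuples."""
    rules = []
    for itemset, sup in itemsets.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                # confidence(A -> C) = support(A and C) / support(A)
                confidence = sup / itemsets[antecedent]
                if confidence >= min_confidence:
                    # lift(A -> C) = confidence(A -> C) / support(C)
                    lift = confidence / itemsets[consequent]
                    rules.append((set(antecedent), set(consequent),
                                  sup, confidence, lift))
    return rules


if __name__ == "__main__":
    itemsets = frequent_itemsets(announcements, min_support=0.4)
    for a, c, sup, conf, lift in association_rules(itemsets, 0.9):
        print(sorted(a), "->", sorted(c),
              f"support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

In this toy data, "gTLD" and "registry" always appear together, so the rule between them has confidence 1.0. In a real run, each transaction would be the set of keywords tagged on one crawled announcement.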
Table of Contents
Chinese Abstract i
Abstract ii
Acknowledgements (Chinese) iii
Acknowledgements iv
Table of Contents v
List of Figures vii
List of Tables ix
Chapter 1 Introduction
1.1 Background Information 1
1.2 Related Studies of Association Rule 3
1.3 Research Purpose and Value of Research 4
Chapter 2 Related Works
2.1 Organization Structure of ICANN 5
2.2 Top-level Domain (TLD) 6
2.2.1 gTLD and new gTLD 6
2.2.2 ccTLD 7
2.3 Association Rule Mining 8
2.4 Python-based Web Crawler 12
2.4.1 Web Crawler 12
2.4.2 Anaconda 13
2.4.3 Mlxtend (Machine Learning Extensions) 14
Chapter 3 The Semi-automated Python-based Web Crawler System
3.1 Research Methods 19
3.2 System Implementation 19
3.2.1 Data Collection 19
3.2.2 Data Preprocessing 23
3.2.3 Brief Statistic 29
3.2.4 Association Rule Mining 32
3.2.5 Result Outputting and Dictionary Expanding 34
Chapter 4 Experimental Results and Discussion
4.1 The Release Trend of ICANN Announcements 35
4.2 The Trend of the Hot Keywords in ICANN Announcements 36
4.3 The Trend of the Hot Topics in ICANN Announcements 42
Chapter 5 Concluding Remarks and Future Works
5.1 Concluding Remarks 44
5.2 Future Works 44
Reference 45
Appendix 48


List of Figures
Fig. 1.1.1 New gTLDs as percentage of total TLDs 1
Fig. 1.1.2 Announcement page on the official website of ICANN 2
Fig. 2.1.1 ICANN’s multi-stakeholder model 6
Fig. 2.4.1 Crawling policies of a web crawler action 13
Fig. 2.4.2 Logo of Anaconda, Inc. 13
Fig. 2.4.3 Screenshot of Anaconda navigator 14
Fig. 2.4.4 Logo of Mlxtend 14
Fig. 2.4.5 Screenshot of Mlxtend’s home page 15
Fig. 2.4.6 Flowchart of finding association rules by Mlxtend function 15
Fig. 3.1.1 Flowchart of the Semi-automated Python-based Web Crawler System 19
Fig. 3.2.1 Form of each piece of announcement data on ICANN website 20
Fig. 3.2.2 Form of the attributes in each piece of announcement data 20
Fig. 3.2.3 Source code of the announcement page on ICANN website 21
Fig. 3.2.4 Example of the program code for collecting specific data 22
Fig. 3.2.5 Announcement data saving in CSV format with five attributes 22
Fig. 3.2.6 A webpage for new comers of ICANN 24
Fig. 3.2.7 Quizlet webpage of ICANN terms in multi-language 25
Fig. 3.2.8 Screenshot of the abbreviation glossary (AG) in TD 25
Fig. 3.2.9 Screenshot of the general glossary (GG) in TD 26
Fig. 3.2.10 Example of program code for sentence tokenization 26
Fig. 3.2.11 Schematic diagram of the attributes in announcement data 27
Fig. 3.2.12 Screenshot of processed announcement data 28
Fig. 3.2.13 Screenshot of operating PivotTable in Excel 29
Fig. 3.2.14 Example of the program code for set() function 30
Fig. 3.2.15 Example of the program code for counting 30
Fig. 3.2.16 Screenshot of an example counting pool 31
Fig. 3.2.17 Example of the program code for year pairing 32
Fig. 3.2.18 Example of the program code for finding frequent itemsets 33
Fig. 3.2.19 Example of the program code for generating association rules 33
Fig. 3.2.20 Screenshot of the association rule glossary (ARG) in TD 34
Fig. 4.1.1 The quantities of ICANN announcements’ publication each year 35
Fig. 4.2.1 The quantities of the new keywords each year 39
Fig. 4.2.2 The quantities of the keywords’ reduction each year 39
Fig. 4.2.3 The scatter plot of the support value in the attribute of “Union Tag” 40


List of Tables
Table 2.2.1 Historical gTLDs 7
Table 2.2.2 Example of ccTLDs 8
Table 2.3.1 Simulated example of transaction database 9
Table 2.4.1 Simulated example of a transaction dataset 16
Table 2.4.2 TransactionEncoder module of Mlxtend 17
Table 2.4.3 Result of the formatted dataset via TransactionEncoder module 17
Table 2.4.4 Apriori function of Mlxtend 17
Table 2.4.5 Results of the frequent itemsets via Apriori function 17
Table 2.4.6 Association_rules function of Mlxtend 18
Table 2.4.7 Results of the association rule via association_rules function 18
Table 3.2.1 A piece of original announcement data which ID is 2720 27
Table 3.2.2 A piece of processed announcement data which ID is 2720 27
Table 4.2.1 The total quantities of the keywords in 4 attributes 36
Table 4.2.2 Top 10 keywords in the attribute of “Category Tag” 37
Table 4.2.3 Top 10 keywords in the attribute of “Abbreviation Tag” 37
Table 4.2.4 Top 10 keywords in the attribute of “General Tag” 37
Table 4.2.5 Top 10 keywords in the attribute of “Union Tag” 38
Table 4.2.6 A group of the attribute of “Union Tag” 40
Table 4.2.7 B group of the attribute of “Union Tag” 40
Table 4.2.8 C group of the attribute of “Union Tag” 41
Table 4.2.9 D group of the attribute of “Union Tag” 41
Table 4.3.1 Results of the association rule 43
References
[1] N. H. Nie and L. Erbring, “Internet and society: A preliminary report,” IT & Society, vol. 1, no. 1, pp. 275-283, 2002.
[2] VERISIGN, “The Domain Name Industry Brief,” vol. 16, no. 2, 2019. Accessed: Jun. 30, 2019. [Online]. Available: https://www.verisign.com/assets/domain-name-report-Q12019.pdf
[3] “ICANN.” ICANN. https://www.icann.org/ (accessed Jun. 30, 2019).
[4] “I Am New to ICANN – Now What?” ICANN. https://www.icann.org/newcomers (accessed Jun. 30, 2019).
[5] “Cheers to the Multistakeholder Community.” ICANN. https://www.icann.org/news/blog/cheers-to-the-multistakeholder-community (accessed Jun. 30, 2019).
[6] “Final Implementation Update.” ICANN. https://www.icann.org/news/blog/final-implementation-update (accessed Jun. 30, 2019).
[7] “Stewardship of IANA Functions Transitions to Global Internet Community as Contract with U.S. Government Ends.” ICANN. https://www.icann.org/news/announcement-2016-10-01-en (accessed Jun. 30, 2019).
[8] “Statement of Assistant Secretary Strickling on IANA Functions Contract.” National Telecommunications and Information Administration. https://www.ntia.doc.gov/press-release/2016/statement-assistant-secretary-strickling-iana-functions-contract (accessed Jun. 30, 2019).
[9] “Announcements.” ICANN. https://www.icann.org/news/announcements (accessed Jun. 30, 2019).
[10] D. J. Power, “What is the ‘true story’ about using data mining to identify a relationship between sales of beer and diapers?,” DSS News, November, vol. 10, 2002.
[11] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” in Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994, vol. 1215, pp. 487-499.
[12] G. Piatetsky-Shapiro, “Discovery, analysis, and presentation of strong rules,” Knowledge Discovery in Databases, pp. 229-238, 1991.
[13] M. J. Zaki, “Scalable algorithms for association mining,” IEEE Transactions on Knowledge and Data Engineering, vol. 12, no. 3, pp. 372-390, 2000.
[14] J. Han, J. Pei, and Y. Yin, “Mining frequent patterns without candidate generation,” in ACM SIGMOD Record, 2000, vol. 29, no. 2: ACM, pp. 1-12.
[15] “Association Rule Learning.” Wikipedia. https://en.wikipedia.org/wiki/Association_rule_learning (accessed Jun. 30, 2019).
[16] R. Agrawal, T. Imieliński, and A. Swami, “Mining association rules between sets of items in large databases,” in ACM SIGMOD Record, 1993, vol. 22, no. 2: ACM, pp. 207-216.
[17] H.-C. Chen and S.-S. Kuo, “DoS attack pattern mining based on association rule approach for web server,” in International Conference on Innovative Mobile and Internet Services in Ubiquitous Computing, 2018: Springer, pp. 527-536.
[18] “ICANN’s Multistakeholder Model.” ICANN. https://www.icann.org/community#groups (accessed Jun. 30, 2019).
[19] “What Does ICANN Do?” ICANN. https://www.icann.org/resources/pages/what-2012-02-25-en (accessed Jun. 30, 2019).
[20] “Root Zone Database.” IANA. https://www.iana.org/domains/root/db (accessed Jun. 30, 2019).
[21] “Generic top-level domain.” Wikipedia. https://en.wikipedia.org/wiki/Generic_top-level_domain (accessed Jun. 30, 2019).
[22] “Mlxtend.” Sebastian Raschka. http://rasbt.github.io/mlxtend/ (accessed Jun. 30, 2019).
[23] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur, “Dynamic itemset counting and implication rules for market basket data,” ACM SIGMOD Record, vol. 26, no. 2, pp. 255-264, 1997.
[24] K. Hornik, B. Grün, and M. Hahsler, “arules-A computational environment for mining association rules and frequent item sets,” Journal of Statistical Software, vol. 14, no. 15, pp. 1-25, 2005.
[25] C. Castillo, “Effective Web Crawling,” Ph.D., Computer Science, University of Chile, 2004.