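Graduate student: 范芳瑄
Graduate student (English name): Fang-Syuan Fan
Thesis title: 應用於校內法規之分類化文字探勘與檢索技術
Thesis title (English): Classified Term Frequency-Inverse Document Frequency technique applied to school regulations
Advisor: 蔡孟峰
Degree: Master's
Institution: National Central University (國立中央大學)
Department: Department of Computer Science and Information Engineering, In-service Master's Program
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2019
Academic year of graduation: 107 (ROC calendar)
Language: Chinese
Number of pages: 73
Keywords (Chinese, translated): text mining, text mining and retrieval, similarity analysis, hierarchical clustering
Keywords (English): text mining, TF-IDF, Cosine Similarity, Hierarchical Clustering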
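Abstract:
This study combines the text mining and retrieval technique (TF-IDF) with compatibility (相性) and applies it to the regulations of National Central University and the related off-campus regulations. The system is built on a cloud platform to classify the regulations.
The text mining and retrieval technique on its own offers only a single quantitative measure and cannot present a range of choices. By pairing compatibility with cosine similarity, hierarchical clustering, and related techniques, a single regulation yields different results under different compatibilities, and the resulting classification gives users several ways to find suitable related regulations.

Keywords: text mining, text mining and retrieval, similarity analysis, hierarchical clustering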
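
For orientation, the quantities the abstract relies on can be written in their standard textbook form. This is a sketch only: the record does not state which TF-IDF variant the thesis uses (a smoothed IDF, for example, is also common), and every symbol here is an assumption introduced for illustration. $f_{t,d}$ is the count of term $t$ in regulation $d$, $N$ is the number of regulations, $n_t$ is the number of regulations containing $t$, and $\mathbf{w}_a$, $\mathbf{w}_b$ are the weight vectors of two regulations.

```latex
\begin{align}
\mathrm{tf}(t,d) &= \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \\
\mathrm{idf}(t) &= \log\frac{N}{n_t} \\
w_{t,d} &= \mathrm{tf}(t,d)\cdot\mathrm{idf}(t) \\
\cos(\mathbf{w}_a,\mathbf{w}_b) &= \frac{\mathbf{w}_a\cdot\mathbf{w}_b}{\lVert\mathbf{w}_a\rVert\,\lVert\mathbf{w}_b\rVert}
\end{align}
```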
Abstract (English):
This study combines the Term Frequency-Inverse Document Frequency (TF-IDF) technique with compatibility and applies it to the regulations of National Central University and the related off-campus regulations. The system is built on a cloud platform to classify the regulations.
TF-IDF on its own provides only a single quantitative measure and cannot offer a range of choices. By combining compatibility with cosine similarity, hierarchical clustering, and other techniques, a single regulation can produce different results under different compatibilities. Classification therefore yields a wider range of choices and helps users find the relevant regulations.

Keywords: text mining, TF-IDF, cosine similarity, hierarchical clustering
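
The idea in the abstract, that one regulation can land in different groups under different compatibilities, can be illustrated with a small standard-library sketch. It assumes that a compatibility is a named term list, which is a reading suggested by the per-category term lists in the appendices and is not confirmed by this record. The aspect names, the toy tokens, the 0.5 threshold, and the single-linkage choice are all hypothetical; the thesis's actual implementation is not shown here.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocab):
    """TF-IDF vectors restricted to `vocab`; each doc is a list of tokens."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each vocabulary term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = len(d) or 1
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocab
        ])
    return vectors


def cosine(a, b):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_linkage(vectors, threshold):
    """Agglomerative clustering: repeatedly merge the two most similar
    clusters until the best similarity drops below `threshold`."""
    clusters = [[i] for i in range(len(vectors))]

    def sim(c1, c2):
        # Single linkage: similarity of the closest pair of members.
        return max(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        best, (i, j) = max(
            (sim(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical, already-tokenized regulations (toy data, not from the thesis).
docs = [
    ["student", "scholarship", "committee", "review"],
    ["student", "leave", "approval"],
    ["staff", "award", "committee", "review"],
    ["staff", "leave", "approval"],
]

# Hypothetical term lists, one per compatibility.
aspects = {
    "applicable_subjects": ["student", "staff"],
    "review_mechanism": ["committee", "review", "approval"],
}

clusters_by_aspect = {
    name: single_linkage(tfidf_vectors(docs, vocab), threshold=0.5)
    for name, vocab in aspects.items()
}

for name, clusters in clusters_by_aspect.items():
    print(name, clusters)
```

On this toy data the two term lists group the same four regulations differently: by who they apply to in one case, and by how they are reviewed in the other. Restricting the vocabulary per compatibility before weighting is one simple way to get that effect; weighting terms by compatibility instead would be another.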
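Chinese Abstract
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Motivation and Background
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Literature Review
2.1 Text Mining and Retrieval Technique (TF-IDF)
2.1.1 Term Frequency (TF)
2.1.2 Inverse Document Frequency (IDF)
2.1.3 Conclusion
2.2 Cosine Similarity
2.3 Cluster Analysis
Chapter 3 System Design
3.1 System Flow and Architecture
3.1.1 Data Construction
3.1.2 Text Processing
3.1.3 Classifying Legal Articles by Compatibility
3.1.4 Text Mining and Retrieval
3.1.5 Similarity Analysis
3.1.6 Hierarchical Clustering
3.2 Research Subjects
Chapter 4 Research Methods
4.1 Data Collection
4.2 Text Preprocessing
4.2.1 Stop Words
4.2.2 Synonym Replacement
4.2.3 Word Segmentation with a Custom Dictionary
4.3 Definition of Compatibility
4.4 Text Mining and Retrieval (TF-IDF)
4.4.1 Term Frequency (TF)
4.4.2 Inverse Document Frequency (IDF)
4.4.3 Results
4.5 Computing Similarity
4.6 Hierarchical Clustering
Chapter 5 Cloud Platform Analysis and Design Flow
5.1 Development Environment
5.2 Custom Compatibilities
5.3 Importing Basic Data
5.4 Custom Dictionary
5.5 Text Segmentation
5.6 Computing TF×IDF
5.7 Regulation Similarity Comparison
Chapter 6 Empirical Analysis and Results
6.1 Term Statistics by Compatibility
6.2 Distribution Results for Each Compatibility
Chapter 7 Conclusion
7.1 Conclusion
7.2 Difficulties Encountered
7.3 Future Work
References
Appendix 1 List of Regulations
Appendix 2 List of Restriction Conditions
Appendix 3 List of Benefit and Right Terms
Appendix 4 List of Legal Basis Terms
Appendix 5 List of Applicable Subject Terms
Appendix 6 List of Review Mechanism Terms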
Electronic full text (public release date: 2024-07-31)