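Author: Lin, Yung-Chang (林永昌)
Thesis title: Impact of Pattern Dependency on Web Mining Models (樣式依存度問題於網頁探勘之影響研究)
Advisor: Wu, Sheng-Tang (吳勝堂)
Oral defense committee: Huang, Kuei-Chih (黃桂芝); Shih, Wen-Chung (時文中); Wu, Sheng-Tang (吳勝堂)
Oral defense date: 2013-12-20
Degree: Master's
Institution: Asia University (亞洲大學)
Department: 資訊多媒體應用學系 (Information and Multimedia Applications)
Discipline: Computing
Field: Computer Applications
Thesis type: Academic thesis
Publication year: 2013
Graduation academic year: 102 (ROC calendar)
Language: Chinese
Pages: 66
Keywords: Feature Selection; Web Mining; Pattern Taxonomy Model; Data Mining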
With the rapid development of the World Wide Web and information technology, the Internet has become an indispensable part of modern life. How to extract the information we need from large amounts of data and convert it into useful knowledge and rules is therefore an important issue, and it is a central research topic in the field of Knowledge Discovery. Web Mining can be divided into three categories: Web Content Mining, Web Structure Mining, and Web Usage Mining. The challenge in Web Content Mining is whether a system can respond to users' needs accurately and quickly while avoiding an excess of irrelevant content. The patterns that describe the semantic content of text in a Web Mining system are therefore especially important. This research studies the dependency between patterns and feature terms. Using the Pattern Taxonomy Model (PTM), we run experiments that analyze feature selection mechanisms in order to understand their impact on pattern composition. The experimental results show that determining the number of feature terms by ratio gives more objective performance and is less affected by the number of training samples. In addition, among three feature selection methods, term frequency-inverse document frequency (TFIDF), Information Gain (IG), and Mutual Information (MI), IG performs better whether feature terms are selected by count or by ratio. We therefore recommend that, when using PTM, IG be adopted as the feature selection method and the top 70% of features be used as the basis for pattern composition to obtain the best performance.
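The abstract contrasts two ways of deciding how many feature terms to keep (a fixed count versus a ratio of the vocabulary) and recommends ranking terms by Information Gain. The record does not give the thesis's formulas or code, so the following is only a minimal sketch under stated assumptions: it uses the standard textbook definition of information gain over term presence with binary relevance labels, and the function names and the toy documents are hypothetical, chosen purely for illustration.

```python
import math
from collections import Counter


def entropy(pos, neg):
    """Entropy in bits of a set with `pos` relevant and `neg` irrelevant documents."""
    total = pos + neg
    if total == 0:
        return 0.0
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(docs, labels):
    """Score each term by the entropy reduction from splitting on its presence.

    Assumption: textbook IG with binary labels (1 = relevant, 0 = irrelevant);
    the thesis may define or weight IG differently.
    """
    n = len(docs)
    n_pos = sum(labels)
    base = entropy(n_pos, n - n_pos)
    with_term = Counter()      # number of documents containing the term
    with_term_pos = Counter()  # number of relevant documents containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            with_term[term] += 1
            with_term_pos[term] += label
    scores = {}
    for term, n_t in with_term.items():
        pos_t = with_term_pos[term]
        n_rest = n - n_t
        pos_rest = n_pos - pos_t
        conditional = (n_t / n) * entropy(pos_t, n_t - pos_t)
        conditional += (n_rest / n) * entropy(pos_rest, n_rest - pos_rest)
        scores[term] = base - conditional
    return scores


def rank_terms(scores):
    """Order terms by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))


def select_by_count(scores, k):
    """Keep a fixed number of top-ranked terms."""
    return rank_terms(scores)[:k]


def select_by_ratio(scores, ratio=0.7):
    """Keep a fixed share of the vocabulary, so the cut-off scales with its size."""
    ranked = rank_terms(scores)
    keep = max(1, math.ceil(len(ranked) * ratio))
    return ranked[:keep]


# Hypothetical toy corpus: two relevant and two irrelevant documents.
docs = [
    ["web", "mining", "pattern"],
    ["pattern", "taxonomy", "model"],
    ["stock", "market", "news"],
    ["market", "news", "web"],
]
labels = [1, 1, 0, 0]

scores = information_gain(docs, labels)
selected = select_by_ratio(scores, ratio=0.7)
print(selected)
```

With a ratio, the number of retained terms grows and shrinks with the vocabulary extracted from the training set, which is one plausible reading of why the abstract reports that the ratio method is less sensitive to the number of training samples than a fixed count.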
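Abstract (Chinese)
Abstract (English)
Acknowledgements
Table of Contents
List of Tables
Chapter 1 Introduction
1.1 Background
1.2 Research Objectives
1.3 Research Method
1.4 Thesis Organization
Chapter 2 Literature Review
2.1 Knowledge Discovery
2.2 Data Mining
2.3 Web Mining
2.4 Pattern Taxonomy Model (PTM)
2.5 Feature Selection
Chapter 3 Methodology
3.1 Pattern Taxonomy Model (PTM)
3.2 Preprocessing
3.3 Feature Selection
Chapter 4 Experiments and Discussion
4.1 Experimental Setup
4.2 Performance Evaluation Metrics
4.3 Experimental Results
4.4 Summary
Chapter 5 Conclusion
5.1 Discussion
5.2 Conclusions and Recommendations
References
Curriculum Vitae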

