National Digital Library of Theses and Dissertations in Taiwan

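Graduate student: 馮松鍵 (Sung-chien Feng)
Thesis title: Using Deformable Template to Construct Classification Functions for Data Mining
Thesis title (Chinese): 運用可變形模板建構分類函數以進行資料挖掘
Advisor: 陳煇煌 (Huei-huang Chen)
Degree: Master's
Institution: 大同大學 (Tatung University)
Department: 資訊經營學系(所) (Department of Information Management)
Discipline: Business and Management
Field: General Business
Thesis type: Academic thesis
Publication year: 2004
Graduation academic year: 92 (ROC calendar)
Language: English
Number of pages: 105
Keywords (Chinese): 資料挖掘 (data mining); 分類 (classification); 探索資料分析 (exploratory data analysis); 可變形模板 (deformable template)
Keywords (English): Classification; Data Mining; Deformable Template; Exploratory Data Analysis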
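Abstract (translated from the Chinese):
Enterprises today process tens of thousands of pieces of information every day. Finding the hidden knowledge that is truly valuable to users within this volume of information, and presenting it clearly, is the role of data mining.
In this thesis, classification models are the central topic. Because different classification models suit different situations and yield different results, the discussion centers on which types of data are suitable for analysis, which data are representative, and which factors influence classification.
For data selection, in addition to the common preprocessing methods, this thesis uses exploratory data analysis to inspect the data and retain the more representative data, which facilitates classification. For the classification model, this thesis chooses the naïve Bayesian classifier on the grounds of accuracy and model-building speed. Because of the limitation of its basic assumption, the concept of the deformable template is introduced: templates are built by weighting and used for classification in order to improve accuracy. The experimental results show that with this approach the classification accuracy improved in most cases, which meets the original expectation of this thesis.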
Abstract (English):
For modern enterprises, the amount of information that must be processed every day is enormous. Finding valuable hidden knowledge in large data sets and presenting it clearly so that people can understand it is the primary mission of data mining.
In this thesis, the classification model is the main issue we investigate. Because different classification models and presentation approaches suit different situations, we investigate which kinds of data are suitable for each analysis method, which data are representative for classification, and which factors affect the precision of classification.
For data selection, besides familiar methods such as data cleaning, we use exploratory data analysis to filter the data and preserve representative data, and then perform classification on the processed data. For the classification model, we consider two factors: the precision of classification and the time taken to build the model. We therefore choose the naïve Bayesian classifier. Because of the limitation of its basic assumption, we adopt the concept of the deformable template to improve precision. This method uses a weight function for each class template. The experimental results show that the precision of this method is better than that of the naïve Bayesian classifier in the majority of cases.
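The abstract does not give the form of the weight function, so the following is only a minimal sketch of one way a per-class weighting could be layered on a naïve Bayesian classifier. It assumes categorical attributes, Laplace smoothing, and a hypothetical `weights[c][j]` table that scales the log-likelihood of attribute `j` for class `c`. The function names, the toy data, and the weight values are illustrative assumptions, not the thesis's method.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(class, attribute index)][value] -> count
    value_counts = defaultdict(Counter)
    # distinct values seen per attribute, used for Laplace smoothing
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(label, j)][value] += 1
            domains[j].add(value)
    return class_counts, value_counts, domains


def classify(model, row, weights=None):
    """Return the class with the highest weighted log-score.

    weights[c][j] (hypothetical) scales attribute j's log-likelihood
    for class c. With weights=None every weight is 1.0, which reduces
    to the plain naive Bayesian classifier.
    """
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior
        for j, value in enumerate(row):
            # Laplace-smoothed estimate of P(value | c)
            p = (value_counts[(c, j)][value] + 1) / (n_c + len(domains[j]))
            w = 1.0 if weights is None else weights[c][j]
            score += w * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy data, invented for illustration only.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)

# Illustrative weights that emphasize the second attribute for both classes.
weights = {"no": [1.0, 3.0], "yes": [1.0, 3.0]}

plain = classify(model, ("sunny", "cool"))
weighted = classify(model, ("sunny", "cool"), weights)
```

On this toy data the unweighted classifier assigns `("sunny", "cool")` to `"no"`, while the weighted version assigns it to `"yes"`, because the heavier weight on the second attribute lets it dominate the score. How such weights would actually be chosen for each class template is not stated in the abstract.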
Table of contents:
Abstract (Chinese)
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER I INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Objective
1.4 Research Considerations
1.5 Research Process
1.6 Organization of This Thesis
CHAPTER II LITERATURE REVIEW
2.1 The Definition and Architecture of Data Mining
2.2 Data Mining Models – Classification
2.2.1 Naïve Bayesian Classifier
2.2.2 C4.5
2.2.3 PART
2.3 Exploratory Data Analysis (EDA)
2.4 Deformable Template
2.5 Factor Analysis and Principal Component Analysis
2.5.1 Factor Analysis
2.5.2 Principal Components Analysis (PCA)
CHAPTER III SYSTEM ARCHITECTURE
3.1 Data Mining Process
3.1.1 Data Mining Processes
3.2 System Design and Architecture
3.2.1 The Simplified Data Mining System Architecture
3.2.2 The Design of Modules
3.3 System Process
CHAPTER IV EXPERIMENT DESIGNS AND RESULTS
4.1 Dataset Description
4.2 Data Preprocessing and EDA
4.3 Training Set and Testing Set
4.4 Building Templates
4.5 Experiment Results
CHAPTER V CONCLUSIONS AND FUTURE WORKS
5.1 Conclusions
5.2 Future Works
BIBLIOGRAPHY
VITA