National Digital Library of Theses and Dissertations in Taiwan
Detailed Record

Author: 李文仁
Author (English): Wen-Jen Lee
Title: 建構一個PSO-SA的最佳化分類器於不平衡序列資料的分類
Title (English): Build a PSO-SA Optimization Classifier for the Imbalance Sequence Data
Advisor: 蔡介元
Oral Examination Committee: 任恒毅, 劉建浩
Oral Defense Date: 2012-03-22
Degree: Master's
University: 元智大學 (Yuan Ze University)
Department: 工業工程與管理學系 (Industrial Engineering and Management)
Discipline: Engineering
Field: Industrial Engineering
Document Type: Academic thesis
Graduation Academic Year: 100 (ROC era, 2011-2012)
Language: English
Pages: 70
Keywords (Chinese): 序列樣式探勘, 類別不平衡問題, 序列分類方法, 粒子群最佳化
Keywords (English): Sequential Pattern Mining, Class Imbalance Problem, Sequence Classification Problem, Particle Swarm Optimization
Abstract (Chinese, translated):
Sequential pattern mining and sequence classification are well-known data mining techniques that can solve many problems, such as analyzing changes in customer behavior. However, traditional methods are not capable of correctly classifying minority-class samples. In practice, class-imbalanced data arise frequently in daily life, for example in fraud detection, medical diagnosis, spam filtering, and product monitoring and inspection, so this study develops an effective classification method for imbalanced sequence data. In this study, the AprioriAll algorithm is applied to the sequences of each class to find the sequential patterns of that class. The pairwise coupling method then decomposes the original multi-class sequence data into a set of binary-class datasets. For each binary-class dataset, the proposed FMCIS method is used to build a classifier. Each classifier first produces two similarity values for a sequence and then uses these two values to construct the units required by the fuzzy preference relations. The fuzzy preference relations aggregate the unit values produced by all classifiers, and the final classification result is obtained according to the termination condition defined in this study. To increase classification accuracy, a hybrid PSO-SA algorithm is proposed to adjust the weights of the sequential patterns in FMCIS and the weights of the classes in the fuzzy preference relations. The results show that the proposed classification model can effectively solve the classification problem for imbalanced sequence data, but the class weights in the fuzzy preference relations cannot effectively improve the overall classification accuracy.
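As a rough illustration of the per-class pattern mining and pairwise coupling steps described above, the following Python sketch groups labeled sequences by class, mines frequent subsequences with a simple level-wise miner standing in for AprioriAll, and forms one binary-class dataset per class pair. The function names, parameters, and toy data are illustrative assumptions, not code from the thesis.

# A minimal sketch (not the thesis implementation) of per-class sequential
# pattern mining followed by a pairwise coupling decomposition.
from collections import defaultdict
from itertools import combinations

def is_subsequence(pattern, sequence):
    # True if `pattern` occurs in `sequence` in order, gaps allowed.
    it = iter(sequence)
    return all(item in it for item in pattern)

def mine_patterns(sequences, min_support=0.5, max_len=3):
    # Level-wise frequent-subsequence mining (an AprioriAll-style stand-in).
    min_count = min_support * len(sequences)
    items = {(x,) for seq in sequences for x in seq}
    frequent = []
    level = [p for p in items
             if sum(is_subsequence(p, s) for s in sequences) >= min_count]
    while level and len(level[0]) <= max_len:
        frequent.extend(level)
        # Join patterns of length k into length-(k+1) candidates, then prune by support.
        candidates = {p + (q[-1],) for p in level for q in level if p[1:] == q[:-1]}
        level = [c for c in candidates
                 if sum(is_subsequence(c, s) for s in sequences) >= min_count]
    return frequent

def pairwise_datasets(labeled_sequences):
    # Group sequences by class, mine patterns per class, and build one
    # binary-class dataset (plus both pattern sets) per pair of classes.
    by_class = defaultdict(list)
    for seq, label in labeled_sequences:
        by_class[label].append(seq)
    patterns = {c: mine_patterns(seqs) for c, seqs in by_class.items()}
    return {(a, b): (by_class[a], by_class[b], patterns[a], patterns[b])
            for a, b in combinations(sorted(by_class), 2)}

if __name__ == "__main__":
    data = [(["a", "b", "c"], 1), (["a", "c"], 1),
            (["b", "d"], 2), (["d", "b"], 2)]
    for pair, (_, _, pat_a, pat_b) in pairwise_datasets(data).items():
        print(pair, pat_a, pat_b)

In the thesis itself, AprioriAll mines the per-class patterns and each binary-class dataset is then handed to an FMCIS classifier; the miner above is only a compact stand-in for that step.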
Abstract (English):
Sequential pattern mining and sequence classification are two popular data mining methods used to explore changes in customer behavior. However, traditional methods have poor predictive ability for minority instances when dealing with the class imbalance problem. In fact, imbalanced class problems such as fraud detection, medical diagnosis, spam detection, and fault monitoring/inspection exist everywhere in the real world. Therefore, this study develops an effective method to cope with the imbalanced sequence classification problem. In this study, the sequences are first divided into several subsets according to their class labels. The AprioriAll algorithm is then applied to each sequence subset to find its sequential patterns. Next, the pairwise coupling method combines every pair of sequence subsets to form a set of binary-class datasets. For every binary-class dataset, the Force Multi-Class Imbalance Sequence (FMCIS) method is developed to build a classifier. Each classifier first generates two similarity values for a sequence and then uses these two values to construct the units of the fuzzy preference relations. The fuzzy preference relations compose the units and compute a set of non-dominated values, and the final class label of a sequence is predicted by the maximal non-dominated value. To increase the prediction accuracy of the proposed classifier, a hybrid PSO-SA algorithm is developed to adjust the weights of the patterns in each classifier and the weights of the classes in the fuzzy preference relations. The results show that the proposed classification model is useful for sequence classification with imbalanced data, especially at low support values, but applying the optimized class weights in the fuzzy preference relations does not perform as well as expected.
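To make the final aggregation step concrete, the sketch below shows one common way to turn pairwise preference scores into non-dominated values (Orlovsky's non-dominance criterion) and to predict the class with the maximal value. The pairwise scores are invented for illustration, and the PSO-SA-optimized pattern and class weights mentioned in the abstract are omitted, so this is a sketch of the decision step rather than the thesis implementation.

# A minimal sketch of the fuzzy-preference-relation decision step: pairwise
# classifier outputs fill a preference matrix R, and the class with the
# largest non-dominated value is predicted.
import numpy as np

def predict_from_pairwise(scores, classes):
    # scores[(i, j)] = degree to which class i is preferred over class j
    # for one test sequence (e.g. a normalized FMCIS similarity).
    n = len(classes)
    R = np.zeros((n, n))
    for (i, j), value in scores.items():
        R[i, j] = value
    # Strict preference of j over i beyond the reciprocal judgement.
    strict = np.maximum(R - R.T, 0.0)
    # Non-dominated value of class i: 1 - max_j strict[j, i].
    non_dominated = 1.0 - strict.max(axis=0)
    return classes[int(np.argmax(non_dominated))], non_dominated

if __name__ == "__main__":
    classes = ["A", "B", "C"]
    # Hypothetical preferences for one sequence (row class vs. column class).
    scores = {(0, 1): 0.7, (1, 0): 0.3,
              (0, 2): 0.4, (2, 0): 0.6,
              (1, 2): 0.2, (2, 1): 0.8}
    label, nd = predict_from_pairwise(scores, classes)
    print(label, nd)  # class C has the largest non-dominated value here

One way to incorporate class weights of the kind the thesis optimizes with PSO-SA would be to scale the entries of R before computing the strict preferences; the unweighted form above only illustrates the non-dominance decision rule itself.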
ABSTRACT
摘要 (Chinese Abstract)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background
1.2 Research Problem
1.3 Research Objectives
1.4 Thesis Framework
Chapter 2 Literature Review
2.1 Sequential Pattern Mining
2.2 Class Imbalance Problems
2.3 Sequence Similarity Measure
2.4 Sequence Classification Method
2.5 Particle Swarm Optimization
Chapter 3 Research Method
3.1 Research Framework
3.2 Sequential Pattern Mining
3.2.1 Sequential Pattern Mining
3.2.2 AprioriAll Algorithm
3.3 The Pairwise Coupling Approach
3.4 Force Multi-Class Imbalance Sequence (FMCIS) Classifier
3.4.1 Similarity Measure between Sequences and Patterns
3.4.2 The Process of Building the FMCIS Classifier
3.5 Fuzzy Preference Relations
3.6 Classification Accuracy
3.7 Particle Swarm Optimization – Simulated Annealing Algorithm
3.7.1 The Objective Functions
3.7.2 The Proposed PSO-SA Hybrid Algorithm
Chapter 4 Experiment Result for a Case
4.1 Introduction to Datasets
4.2 Generation of the FMCIS Classifier and Result
4.2.1 Mining Sequential Patterns
4.2.2 Similarity Measure between Pattern and Sequence
4.2.3 The Result of the Proposed Classification Model
4.3 The Difference between the PSO Algorithm and the PSO-SA Algorithm
4.4 The Parameter Selection in the PSO-SA Algorithm
4.5 The Effect of the Support Values
4.6 The Effect of Adopting Optimization in Fuzzy Preference Relations
Chapter 5 Conclusion
5.1 Conclusion
5.2 Suggestions and Future Works
References
References

1.Agrawal, R. and Srikant, R., “Fast algorithms for mining association rules,” Proceedings of the 20th VLDB Conference, pp. 487-499, 1994.
2.Agrawal, R. and Srikant, R., “Mining sequential patterns,” Proceedings of the 11th International Conference on Data Engineering, pp. 3-14, 1995.
3.Altincay, H. and Ergun, C., “Clustering based under-sampling for improving speaker verification decisions using AdaBoost,” Lecture Notes in Computer Science, 3138, pp. 698-706, 2004.
4.Arbell, O., Landau, G. M., Mitchell, J. S. B., “Edit distance of run-length encoded strings,” Information Processing Letters, 83, pp. 307-314, 2002.
5.Balopoulos, V., Hatzimichailidis, A. G., Papadopoulos, B. K., “Distance and similarity measures for fuzzy operators,” Information Science, 177, pp. 2336-2348, 2007.
6.Bargiela, A. and Pedrycz, W., “Recursive information granulation: aggregation and interpretation issues,” IEEE Transactions on Systems, Man, and Cybernetics, 33 (1), pp. 96-112, 2003b.
7.Batista, G., Prati, R.C., Monard, M.C., “A study of the behavior of several methods for balancing machine learning training data,” SIGKDD Explorations, 6 (1), pp. 20-29, 2004.
8.Castellano, G. and A. M. Fanelli, “Information granulation via neural network-based learning,” IFSA World Congress and 20th NAFIPS International Conference, 5, pp. 3059-3064, 2001.
9.Chawla, N. V., Bowyer, K., Hall, L., Kegelmeyer, W., “SMOTE: synthetic minority over-sampling technique,” Journal of Artificial Intelligence Research, 16, pp. 321-357, 2002.
10.Chawla, N. V., Japkowicz, N., Kolcz, A., “Editorial: special issue on learning from imbalanced data sets,” SIGKDD Explorations, 6 (1), pp. 1-6, 2004.
11.Chen, P. C., “The research and implementation for the sequential patterns mining system,” Department of Information Engineering, National Taiwan University, 2002.
12.Chen, M. C., Chen, L. S., Hsu, C. C., Zeng, W. R., “An information granulation based data mining approach for classifying imbalanced data,” Information Sciences, 178, pp. 3214–3227, 2008.
13.Cristianini, N. and Shawe-Taylor, J., “An introduction to support vector machines and other kernel-based learning methods,” Cambridge University Press, Cambridge, 2002.
14.Eberhart, R. C., and Shi, Y., “Comparing Inertia Weights and Constriction Factors in Particle Swarm Optimization,” Proceedings of the 2000 Congress on Evolutionary Computation, 1, pp. 84-88, 2000.
15.Exarchos, T. P., Tsipouras, M. G., Papaloukas, C., Fotiadis, D. I., “A two-stage methodology for sequence classification based on sequential pattern mining and optimization,” Data & Knowledge Engineering, 66, pp. 467-487, 2008.
16.Fernandez, A., Calderon, M., Barrenechea, E., Bustince, H., Herrera, F., “Solving multi-class problems with linguistic fuzzy rule based classification systems based on pairwise learning and preference relations,” Fuzzy Sets and Systems, 161, pp. 3064–3080, 2010.
17.Fourie, P. C., and Groenwold A. A., “The Particle Swarm Optimization Algorithm in Size and Shape Optimization,” Structural and Multidisciplinary Optimization, 23 (4), pp. 259-267, 2002.
18.Grzymala-Busse, J. W., Stefanowski, J., Wilk, S., “A comparison of two approaches to data mining from imbalanced data,” Lecture Notes in Computer Science, 3213, pp. 757-763, 2004.
19.Guo, H. and Viktor, H. L., “Learning from imbalanced data sets with boosting and data generation: the Data Boost-IM approach,” SIGKDD Explorations, 6 (1), pp. 30-39, 2004.
20.Hastie, T., and Tibshirani, R., “Classification by pairwise coupling,” The Annals of Statistics, 26 (2), pp. 451-471, 1998.
21.Han, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., Hsu, M., “FreeSpan: frequent pattern-projected sequential pattern mining,” Proceedings of the International Conference on Knowledge Discovery and Data Mining, pp. 355-359, 2000.
22.Han, J., Pei, J., Pinto, H., Chen, Q., Dayal, U., Hsu, M., “PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth,” Proceedings of the 17th International Conference on Data Engineering, pp. 215-224, 2001.
23.Huang, K., Yang, H., King, I., Lyu, M., “Learning classifiers from imbalanced data based on biased minimax probability machine,” Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’04), pp. 558-563, 2004.
24.Japkowicz, N. and Stephen, S., “The class imbalance problem: a systematic study,” Intelligent Data Analysis, IOS Press, 6, pp. 429-449, 2002.
25.Jo, T. and Japkowicz, N., “Class imbalances versus small disjuncts,” SIGKDD Explorations, 6 (1), pp. 40-49, 2004.
26.Joung, J. G., O, S. J., Zhang, B. T., “Protein sequence-based risk classification for human papillomaviruses,” Computers in Biology and Medicine, 36, pp. 353-667, 2006.
27.AL-Kazemi, B., and Mohan, C. K., “Multi-Phase Generalization of the Particle Swarm Optimization Algorithm,” Proceedings of the IEEE Congress on Evolutionary Computation, 1, pp. 489-494, 2002.
28.Kennedy, J. and Eberhart, R. C., “Particle swarm optimization,” IEEE International Conference on Neural Networks, 4, pp. 1942-1948, 1995.
29.Kennedy, J., Eberhart, R. C., Shi, Y., “Swarm Intelligence,” San Francisco: Morgan Kaufmann, 2001.
30.Kim, M., and Pavlovic, V., “Sequence classification via large margin hidden Markov models,” Data mining and Knowledge Discovery, 23, pp. 322-344, 2010.
31.Krink, T., Vesterstorm J. S., Riget, J., “Particle Swarm Optimization with Spatial Particle Extension,” Proceedings of the IEEE Congress on Evolutionary Computation, pp. 1474-1497, 2002.
32.Kubat, M., Holte, R., Matwin, S., “Learning when Negative Examples Abound,” Proceedings of the 9th European Conference on Machine Learning, Prague, Czech Republic, pp. 146-153, 1997.
33.Kubat, M., Holte, R., Matwin, S., “Machine Learning for The Detection of Oil Spills in Satellite Radar Images,” Machine Learning, 30, pp. 195-215, 1998.
34.Lee, C., Tsai, C., Chen, C., “A Multivariate Decision Tree for Imbalanced Datasets,” Master Thesis, Taiwan, ROC, 2002.
35.Lesh, N., Zaki, M. J., Ogihara, M., “Mining features for sequence classification,” Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 342-346, 1999.
36.Lesh, N., Zaki, M. J., Ogihara, M., “Scalable feature mining for sequential data,” IEEE Intelligent Systems and their Applications, 15, pp. 48-56, 2000.
37.Li, D. C. and Liu, C. W., “A learning method for the class imbalance problem with medical data sets,” Computers in Biology and Medicine, 40, pp. 509–518, 2010.
38.Liu, N., and Wang, T. M., “A relative similarity measure for the similarity analysis of DNA sequences,” Chemical Physics Letters, 408, pp. 307-311, 2005.
39.Loewenstern, D. M., Berman, H. M., Hirsh, H., “Maximum a posteriori classification of DNA structure from sequence information,” Proceedings of the Pacific Symposium on Biocomputing, pp. 669-680, 1998.
40.Nasibov, E., and Tunaboylu, S., “Classification of splice-junction sequences via weighted position specific scoring approach,” Computational Biology and Chemistry, 34, pp. 293–299, 2010.
41.Niknam, T., Amiri, B., Olamaei, J., Arefi, A., “An efficient hybrid evolutionary optimization algorithm based on PSO and SA for clustering,” Journal of Zhejiang University Science A, 10 (4), pp. 512-519, 2009.
42.Orlovsky, S., “Decision-making with a fuzzy preference relation,” Fuzzy Sets and Systems, 1, pp. 155–167, 1978.
43.Provost, F. and Fawcett, T., “Robust classification for imprecise environments,” Machine Learning, 42, pp. 203-231, 2001.
44.Quinlan, J. R., “Programs for Machine Learning,” Morgan Kaufmann publishers, 1992.
45.Riddle, P., Segal, R., Etzioni, O., “Representation Design and Brute Force Induction in a Boeing Manufacturing Domain,” Applied Artificial Intelligence, 8, pp. 125-147, 1994.
46.Shi, Y., and Eberhart, R. C., “Parameter Selection in Particle Swarm Optimization,” Lecture Notes in Computer Science, 1447, Evolutionary Programming VII, Springer, Berlin, pp. 591-600, 1998.
47.Soleymani, S., Ranjbar, A. M., Bagheri Shouraki, S., Shirani, A. R., Sadati, N., “A new approach for bidding strategy of gencos using particle swarm optimization combined with simulated annealing method,” Iranian Journal of Science and Technology, Transaction B, Engineering, 31 (B3), pp. 303-315, 2007.
48.Su, C.-T., Chen, L.-S., Yih, Y., “Knowledge acquisition through information granulation for imbalanced data,” Expert Systems with Applications, 31, pp. 531-541, 2006.
49.Tsai, C. Y. and Shieh, Y. C., “A change detection method for sequential patterns,” Decision Support Systems, 46, pp. 501-511, 2009.
50.Wang, C. S. and Lee, A. J. T., “Mining inter-sequence patterns,” Expert Systems with Applications, 36, pp. 8649-8658, 2009.
51.Wang, X. and Zhang, H. “Handling Class Imbalance Problems via Weighted BP Algorithm,” Proceedings of ADMA, pp. 713-720, 2009.
52.Wasikowski, M. and Chen, X. W., “Combating the Small Sample Class Imbalance Problem Using Feature Selection,” IEEE Transactions on Knowledge and Data Engineering, 22 (10), pp. 1388-1400, 2010.
53.Bielińska-Wąż, D., Wąż, P., Clark, T., “Similarity studies of DNA sequences using genetic methods,” Chemical Physics Letters, 445, pp. 68-73, 2007.
54.Xie, X. F., Zhang, W. J., Yang, Z. L., “A Dissipative Particle Swarm Optimization,” Proceedings of the IEEE Congress on Evolutionary Computation, 2, pp. 1456-1461, 2002.
55.Yao, Y. Y. and Yao, J. T., “Granular computing as a basis for consistent classification problems,” Proceedings of PAKDD’02 Workshop on Toward the Foundation of Data Mining, pp. 101-106, 2002.
56.Zadeh, L. A., “Fuzzy sets and information granularity, in: Gupta, M. M., Ragade, R. K., and Yager, R. R. (Eds.),” Advances in Fuzzy Set Theory and Applications, North Holland, Amsterdam, pp. 3-18, 1979.
57.Zahiri, S. H., and Seyedin, S. A., “Swarm intelligence based classifiers,” Journal of the Franklin Institute, 344, pp. 362-376, 2007.
58.Zaki, M. J., “Generating non-redundant association rules,” Computer Science Department, Rensselaer Polytechnic Institute, 2000.