
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

Author: Chang, Chih-Chieh (張智傑)
Title: Using Data Clustering Techniques to Extend Attributes for Small Data Set Predictions
Title (Chinese): 透過資料分群技術進行屬性延伸提升小樣本預測能力
Advisor: Der-Chiang Li (利德江)
Degree: Doctoral
Institution: National Cheng Kung University
Department: Department of Industrial and Information Management
Discipline: Business and Management
Field: Other Business and Management
Document type: Academic thesis
Year of publication: 2012
Academic year of graduation: 100 (2011-2012)
Language: English
Number of pages: 66
Keywords (Chinese): 小樣本 (small data set), 密度空間分群法 (DBSCAN), K-means分群法 (K-means clustering), 屬性延伸 (attribute extension)
Keywords (English): Small data set; DBSCAN; K-means; Attribute extension
Usage statistics:
  • Cited: 0
  • Views: 339
  • Rating: (none)
  • Downloads: 0
  • Bookmarked: 1
Abstract (translated from Chinese): In recent years, small data set problems have been discussed continually, including rare diseases and new-product testing, and how to extract additional information under small-sample conditions has become an important research topic. The difficulty of small-sample problems is that general statistical theory cannot be applied for estimation, making both classification and prediction considerably harder. Because the few samples obtained are so valuable, extra information must be extracted from them. This study proposes extending attributes through clustering, which can reveal the structure hidden within the data. The method consists of two steps. First, the original data are partitioned into groups using a clustering technique. Second, a mega-trend-diffusion function is built for each resulting group to serve as its membership function; the membership grades of the original data under these functions then become new attributes. Combining the old attributes with the newly generated ones yields a new data set. Finally, common prediction models (linear regression, support vector regression, and backpropagation neural networks) are used for validation, comparing the predictive ability of the original data with that of the attribute-extended data. This study adopts density-based spatial clustering (DBSCAN) and K-means for clustering, and verifies the approach with four cases. The results show that, under DBSCAN clustering, the proposed method effectively reduces both the prediction error and its standard deviation, and thus improves the predictive ability of small data sets.
Abstract (English): Small data set problems have been widely considered in many fields, where increasing the prediction ability is the most important goal. This study considers the data structure to identify new data points in a more precise manner, and is thus able to achieve improved prediction capability. The proposed method consists of two steps. The first step uses clustering techniques to separate the data set into clusters. The second step builds the data attribute extension function, in which the new attributes are computed using fuzzy membership functions, obtained as the corresponding membership grades in each cluster. This study applies density-based spatial clustering of applications with noise (DBSCAN) and K-means as clustering techniques. Four real cases are selected to compare the proposed forecasting model with the linear regression (LR), backpropagation neural network (BPNN), and support vector machine for regression (SVR) methods. The results show that the proposed method with DBSCAN clustering has better performance than using the raw data with regard to the error improvement rate, mean square error (MSE), and standard deviation (STD).
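The two-step procedure described in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the thesis code: a minimal K-means stands in for the clustering step (the thesis also uses DBSCAN), and a simple triangular membership function stands in for the mega-trend-diffusion (MTD) functions developed in Chapter 3. All function names and parameter choices here are assumptions for illustration.

```python
# Sketch of attribute extension via clustering + fuzzy membership grades.
# Assumptions: K-means instead of DBSCAN, a triangular membership function
# instead of the thesis's mega-trend-diffusion (MTD) function.
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Minimal Lloyd's K-means: returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # distance of every sample to every centroid -> nearest assignment
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # guard against empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids

def triangular_membership(x, lo, peak, hi):
    """Triangular fuzzy membership grade on [lo, hi], peaking at `peak`."""
    x = np.asarray(x, dtype=float)
    left = np.where(peak > lo, (x - lo) / (peak - lo + 1e-12), 1.0)
    right = np.where(hi > peak, (hi - x) / (hi - peak + 1e-12), 1.0)
    return np.clip(np.minimum(left, right), 0.0, 1.0)

def extend_attributes(X, k=2):
    """Step 1: cluster the data. Step 2: per cluster, build a membership
    function from that cluster's data range and evaluate every sample on
    it; the grades become new columns appended to the original ones."""
    labels, centroids = kmeans(X, k)
    new_cols = []
    for j in range(k):
        member = X[labels == j]
        # one grade per sample and attribute, averaged across attributes
        grades = np.mean(
            [triangular_membership(X[:, a], member[:, a].min(),
                                   centroids[j, a], member[:, a].max())
             for a in range(X.shape[1])], axis=0)
        new_cols.append(grades)
    return np.column_stack([X] + new_cols)

# Tiny two-cluster toy set: 2 original attributes + k=2 membership columns.
X = np.array([[1.0, 2.0], [1.2, 1.9], [0.9, 2.1],
              [8.0, 8.5], [8.2, 8.4], [7.9, 8.6]])
X_ext = extend_attributes(X, k=2)
print(X_ext.shape)  # -> (6, 4)
```

The extended matrix `X_ext` would then be fed to the downstream predictors (LR, SVR, BPNN) in place of the raw attributes, which is the comparison the four experimental cases carry out.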

ABSTRACT I
中文摘要 (Chinese Abstract) II
誌謝 (Acknowledgements) III
TABLE OF CONTENTS IV
LIST OF FIGURES V
LIST OF TABLES VI
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objective 3
1.3 Research Organization 6
Chapter 2 Literature Reviews 7
2.1 Small Data Sets 7
2.2 Clustering Techniques 9
2.3 Forecasting Models 13
Chapter 3 Methodology 17
3.1 Clustering and MTD function 17
3.2 Attribute extension 19
3.3 Steps 21
3.4 Examples 24
Chapter 4 Experiments 29
4.1 Resampling case 30
4.2 Cross validation case 35
Chapter 5 Conclusions 41
REFERENCES 43
APPENDIX 50



