National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

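Graduate student: 陳承輝
Graduate student (English name): Cheng-Hui Chen
Thesis title: 基於多型態資料下的小樣本生成機制
Thesis title (English): Designing a Generative Mechanism under Multi-Type Data with Limited Samples
Advisors: 喻石生; 詹永寬
Advisors (English names): Shyr-Shen Yu; Yung-Kuan Chan
Oral defense committee: 王圳木; 林春宏; 呂慈純
Oral defense date: 2023-12-28
Degree: Doctoral
Institution: National Chung Hsing University (國立中興大學)
Department: Department of Computer Science and Engineering (資訊工程學系所)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2024
Academic year of graduation: 112 (ROC calendar)
Language: English
Pages: 57
Keywords (Chinese): 不平衡資料生成機制; 多型態小樣本資料; 對抗式生成網路; 合成少數過採樣技術
Keywords (English): Imbalanced Data Generation Mechanism; Multi-type Limited Sample Data; Generative Adversarial Network (GAN); Synthetic Minority Over-sampling Technique (SMOTE)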
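Abstract (translated from Chinese): Over the past few years, the challenge of handling limited samples has become a focus of scientific discussion. This obstacle is especially pronounced in a range of critical applications. In predictive diagnosis of equipment failure, for example, failure data usually make up less than 1% of all operating data, which skews the ratio of normal to failure records so heavily that the failure-prediction task fails. In addition, failure data are collected from several types of sources, including vibration sensors, current sensors, temperature sensors, and production process recipes, so a limited-sample method must adapt to more kinds of data. This dissertation therefore proposes a generative mechanism under multi-type data with limited samples, to resolve the bottleneck in generating today's common multi-type data, such as mixed-type and time-series data. In the experimental design, an equipment failure dataset serves as validation data: the limited samples are augmented by each generative method, the augmented data are fed into the same model, and the predictions are compared to assess the different generative mechanisms. The results indicate that, for mixed-type data, when equipment failure data make up less than 1% of the total, the proposed method raises the prediction model's accuracy by 6.45% compared with other similar generative methods, which shows that it improves prediction on mixed-type limited-sample data. For time-series data, it raises the prediction model's F1 score by 2.21% compared with similar generative methods and reaches a recall of 95.51%. Taken together, these figures show that the proposed method improves results on both mixed-type and time-series imbalanced data.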
Abstract (English): In recent years, the scientific community has placed increased emphasis on the challenge of working with limited samples. The challenge is especially pronounced in essential applications such as predictive diagnosis of equipment failure. Malfunction data often represent less than 1% of the overall operational data, and the resulting imbalance between normal and malfunction records can cause the failure-prediction task to fail. Because malfunction data come from diverse sources, including vibration sensors, current sensors, temperature sensors, and production process recipes, limited-sample methods must accommodate various data types. To address this need, this dissertation presents a generative mechanism under multi-type data with limited samples, which seeks to overcome the common bottlenecks in generating mixed-type and time-series data. In the experiments, an equipment malfunction dataset is augmented by each generative method and fed into the same prediction model, so that the methods can be compared by their predictive results. For mixed-type data in which malfunction data account for less than 1% of the total, the proposed approach improves the prediction model's accuracy by 6.45% over comparable generative methods. For time-series data, it outperforms comparable techniques with a 2.21% higher F1 score and reaches a recall of 95.51%. Together, these findings indicate that the presented method is effective for imbalanced data of both the mixed and time-series types.
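The abstract reports accuracy, F1 score, and recall for a setting where malfunction records are under 1% of the data. The sketch below is only an illustration of why recall and F1 matter in that setting; it is not the dissertation's code, and the label vectors, the counts, and the helper name `binary_metrics` are hypothetical. It assumes 1,000 records with 10 failures (label 1) and compares a model that always predicts "normal" with one that catches 8 of the 10 failures at the cost of 5 false alarms.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1, where label 1 means failure."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # failures caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # failures missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal, correctly passed
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical data: 10 failures followed by 990 normal records (1% failures).
y_true = [1] * 10 + [0] * 990

# Model A (hypothetical): always predicts "normal".
always_normal = [0] * 1000

# Model B (hypothetical): catches 8 failures, misses 2, raises 5 false alarms.
detector = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 985

baseline = binary_metrics(y_true, always_normal)
improved = binary_metrics(y_true, detector)

for name, result in (("always normal", baseline), ("detector", improved)):
    print(name, {k: round(v, 4) for k, v in result.items()})
```

In this toy case the always-normal model scores high accuracy while detecting no failures at all, which is why accuracy alone says little at this level of imbalance.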
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Related Works
2.1. Generative Methods Based on Generative Adversarial Networks
2.1.1. Conditional Tabular Generative Adversarial Network (CTGAN)
2.1.2. Wasserstein Generative Adversarial Network (WGAN)
2.2. Generative Methods Based on K-Nearest Neighbors (KNN)
2.2.1. SMOTE-NC (Synthetic Minority Over-Sampling Technique for Nominal and Continuous)
2.2.2. Borderline-SMOTE (Borderline-Synthetic Minority Oversampling Technique)
2.2.3. Borderline-SMOTE (Borderline-Synthetic Minority Oversampling Technique)
Chapter 3. Generative Mechanism under Mixed-Type Data with Limited Samples
3.1. Synthetic Minority Oversampling Technique-Nominal Continuous (SmoteNC)
3.2. Conditional Tabular Generative Adversarial Network (CTGAN)
3.3. SmoteNC-CTGAN
Chapter 4. Generative Mechanism under Time Series Data with Limited Samples
4.1. Extract Time-Series Feature
4.2. Borderline-SMOTE
4.3. Wasserstein GAN
4.4. Borderline SMOTE-WGAN
Chapter 5. Experimental Setup
5.1. Validation of Generated Data
5.2. Evaluation Metrics
Chapter 6. Results and Discussion
6.1. Generative Mechanism under Mixed-Type Data with Limited Samples
6.1.1. Mixed-Type Dataset Description
6.1.2. Experiment Setting
6.1.3. Parameter Setting
6.1.4. Experiment Results (Mixed-Type Data)
6.1.5. Discussion
6.2. Generative Mechanism under Time Series Data with Limited Samples
6.2.1. Time Series Dataset Description
6.2.2. Parameter Setting
6.2.3. Evaluation of the Generated Data
6.2.4. Experiment Results (Time Series Data)
6.2.5. Cause Analysis
Chapter 7. Conclusions
References
Electronic full text: open to the public online from 2029-01-03.