Graduate Student: Yi-Hsiung Su
Thesis Title: Using Text Mining to Create an Invalid Defect Classification Model for Server Development
Advisor: Pin Luarn
Oral Defense Committee: Cheng-Kang Chen, Chien-Lung Chan, Wen-Chih Liao, Peter KY Lo, Pin Luarn
Keywords: Invalid defect, Classification, Text mining, Data mining, Server development, Project management, BIOS
Invalid defects are often overlooked, yet they reduce development productivity and efficiency. This study combined an exploratory study, expert meetings, and text mining to answer four research questions in three research stages. In the first stage, we collected 3,347 defects from the defect tracking system of three x86 server projects. This stage determined the defect distribution of server products and found that it supports the Pareto principle. In the second stage, we filtered 231 invalid BIOS (basic input/output system) defects from the 3,347 defects. These defects came from numerous function areas owned by virtual teams located in Taiwan, China, and the United States. The results indicated that BIOS firmware had the largest number of both defects and invalid defects, accounting for 43.4% of the defects and 33% of the invalid defects in server development. The results also yielded an invalid defect classification with four types: working as designed (WAD), user error, duplicate, and others. Together, these types are grouped under the term WUDO. Within the WUDO classification, WAD accounts for the largest share of invalid defects, at 45%. In the third stage, this study identified a stable classification algorithm, the C4.5 decision tree, for classifying invalid defect types. The study helps project teams for information technology products classify the different invalid defect types that developers and testers face. The results can improve project team productivity and reduce project management risk.
Abstract
Acknowledgments
Submitted Papers
1.1 Background and Motivation
1.2 Research Objectives
1.3 Research Process
1.4 Organization of Dissertation
2.1 Software Engineering and Project Management
2.2 Defects and Invalid Defects
2.3 Data Mining and Text Mining
2.4 Supervised Machine Learning
2.5 Algorithms
2.5.1 Decision Tree
2.5.2 Naive Bayes
2.5.3 Bayesian Network
2.5.4 Logistic Regression
2.5.5 Neural Network
Chapter 3 METHOD
3.1 Researched Case
3.2 Research Design
3.3 Definition of Invalid Defects
3.4 Data Collection and Extraction
3.5 Definition of GBI and EBI
3.6 Text-Mining Approaches
3.7 Evaluating Performance
4.1 Defect Distribution in Stage 1
4.2 Invalid Defect Classification in Stage 2
4.3 WUDO Classification
4.4 Invalid Defect Classification Model in Stage 3
Chapter 5 DISCUSSIONS
6.1 Conclusions
6.2 Theoretical and Practical Implications
6.3 Limitations and Directions for Future Research
Appendix A Keyword List
