National Digital Library of Theses and Dissertations in Taiwan

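Graduate student: 陳昱丞 (Chen, Yu-Cheng)
Thesis title: Business Intelligence in the Cloud – Model Integration Methods under Different Information Disclosure
Thesis title (Chinese): 雲端上的商業智慧 - 不同資訊揭露程度的預測模型整合模式
Advisors: 魏志平 (Wei, Chih-Ping); 林福仁 (Lin, Fu-Ren)
Oral defense committee: 楊錦生; 陳宏鎮
Oral defense date: 2011-7-28
Degree: Master's
Institution: National Tsing Hua University
Department: Institute of Service Science
Discipline: Business and Management
Subdiscipline: Other Business and Management
Thesis type: Academic thesis
Publication year: 2011
Graduation academic year: 99
Language: English
Pages: 45
Keywords: Heterogeneous Models; Classifier Integration; Machine Learning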
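Abstract (translated from the Chinese): With the rise and widespread adoption of cloud computing in recent years, many different prediction models for the same prediction problem may coexist on the cloud for people to choose from. A need is therefore emerging for a service that integrates the results of these models. However, for the same prediction problem, differences in the perspectives, capabilities, or resources of different research institutions can lead their models to use different attribute sets. In addition, the models may be at different levels of information disclosure; that is, some models may provide additional model information. Although prior studies have proposed some integration approaches, they all require a complete, single data set to build and integrate the relevant models, so they are not suited to model integration on the cloud. An approach that can integrate multiple heterogeneous models is therefore urgently needed. First, for the case in which every model provides the same level of information disclosure, this study proposes a corresponding integration method for each of four levels of information disclosure. Each method uses the information the models provide to adjust their weights and then integrates the models to obtain the final result. Because the level of disclosure may differ from model to model, this study also proposes two methods for that case, to meet different service needs. The experimental results show that all of the proposed integration methods outperform selecting a single model. We also find that the more information model providers disclose about their models, the more effectively the integration service provider can use it to improve prediction accuracy. The generality and applicability of the proposed methods across scenarios are also confirmed. Finally, when the models disclose different information, integrating with only the information that all models share is the better choice.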
Abstract (English): In recent years, with the advent and adoption of cloud computing, different prediction models for the same prediction task have become simultaneously available on the cloud to companies and individuals. A service that provides integrated prediction results from these models is therefore emerging. However, the attributes used by models for the same prediction task may differ considerably because of the different perspectives, capabilities, or resources of the model providers. Moreover, the models may provide different model information; that is, they may operate under different levels of information disclosure. Although some model integration methods have been proposed in prior studies, they assume that the complete data source is available for training. This assumption does not hold in the cloud scenario described here, so novel model integration methods are needed. In response, we first propose four model integration methods for models under a given level of information disclosure, each adopting a corresponding measure to determine the weight of every model involved. We then propose two model integration methods for models under different information disclosure levels. A series of experiments demonstrates that the proposed methods outperform the benchmark, namely the enhanced model selection method. The experimental results suggest that the accuracy of the integrated predictions can be improved if stakeholders ask all model providers to release more model information. The generalizability and applicability of the proposed methods are also demonstrated. Finally, when the models are under different information disclosure levels, the recommended way to integrate them is to use only the model information they have in common.
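The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not state which measures the thesis uses to weight models, so everything below is an assumption for illustration only: each model's disclosed training accuracy is taken as its voting weight, and when some models do not disclose an accuracy, all models fall back to equal weights as one possible reading of "use only the common model information". The function names `weighted_vote` and `common_information_weights`, the class labels, and the numbers are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def weighted_vote(predictions: List[str], weights: List[float]) -> str:
    """Return the class label with the largest total weight."""
    totals: Dict[str, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def common_information_weights(
    disclosed_accuracy: List[Optional[float]],
) -> List[float]:
    """Derive one weight per model from disclosed accuracies.

    Assumption (not from the thesis): accuracy is used as the weight
    only when every model discloses it; otherwise all models get an
    equal weight, since accuracy is not common to all of them.
    """
    if all(acc is not None for acc in disclosed_accuracy):
        return [float(acc) for acc in disclosed_accuracy]
    return [1.0] * len(disclosed_accuracy)


# Hypothetical outputs of three cloud-hosted models for one instance.
predictions = ["default", "no_default", "no_default"]

# Same disclosure level: every model reports a training accuracy.
same_level = common_information_weights([0.95, 0.48, 0.45])

# Mixed disclosure levels: the second model reports nothing.
mixed_level = common_information_weights([0.95, None, 0.45])

print(weighted_vote(predictions, same_level))
print(weighted_vote(predictions, mixed_level))
```

With accuracy weights, the single high-accuracy model can outvote two weaker ones; with equal weights, the result reduces to plain majority voting. The integration service only needs each model's prediction and its disclosed information, not the training data behind it.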
CHAPTER 1 INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Objective
CHAPTER 2 LITERATURE REVIEW
2.1 Classifier Integration
2.1.1 Voting
2.1.2 Bagging
2.1.3 Boosting
2.1.4 Stacking
2.1.5 Cascading
CHAPTER 3 INTEGRATION METHODS FOR MODELS UNDER SAME INFORMATION DISCLOSURE
3.1 Scenario 1
3.2 Scenario 2
3.3 Scenario 3
3.4 Scenario 4
CHAPTER 4 INTEGRATION METHODS FOR MODELS UNDER DIFFERENT INFORMATION DISCLOSURE
4.1 Conservative method
4.2 Aggressive method
4.2.1 Prediction of training accuracy
4.2.2 Prediction of attributes rankings
CHAPTER 5 EMPIRICAL EVALUATION
5.1 Data Collection
5.2 Evaluation Procedure
5.2.1 Training Data Preparation
5.2.2 Testing Data Preparation
5.3 Empirical Evaluation
5.3.1 Evaluation Criteria
5.3.2 Benchmarks
5.3.3 Evaluation Results
5.4 Evaluation Results
5.4.1 Comparison Evaluation Results of Same Level Scenarios
5.4.2 Comparison Evaluation Results of Mixed Scenario
CHAPTER 6 CONCLUSION AND FUTURE WORK
REFERENCES

