
National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

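Student: 廖柏豪
Student (English name): LIAO, PO-HAO
Thesis title: 基於深度學習之大數據分析平台整合與優化
Thesis title (English): Deep Learning Based Integration and Optimization of Big Data Analytics Platforms
Advisor: 張保榮
Advisor (English name): CHANG, BAO-RONG
Oral defense committee: 梁財春, 王隆仁
Oral defense committee (English names): LIANG, TSAIR-CHUN; WANG, LUNG-JEN
Defense date: 2018-07-31
Degree: Master's
Institution: National University of Kaohsiung (國立高雄大學)
Department: Department of Computer Science and Information Engineering, Master's Program
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Publication year: 2018
Graduation academic year: 106 (ROC calendar, i.e. 2017–2018)
Language: English
Pages: 75
Keywords (Chinese): 巨量資料處理; 深度學習
Keywords (English): big data processing; deep learning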
Usage statistics:
  • Cited by: 1
  • Views: 333
  • Downloads: 7
  • Bookmarked: 1
This study focuses on a job-scheduling mechanism for big data analysis. A deep neural network (DNN) predicts how long each big data analysis job will take, and intelligent scheduling uses these predictions to shorten the average waiting time of the overall workload, with the aim of substantially improving the execution efficiency of big data analysis platforms. The work builds on a multi-platform big data processing system, designed for high performance, high availability, and high scalability, that integrates Hadoop and Spark so that the platform supports data analysis driven by R commands. The time complexity, priority, and data size of a job's program affect both overall execution efficiency and the average waiting time until jobs complete. In a big data environment these factors can greatly prolong the average waiting time, which is a serious challenge and calls for an optimized scheduler to improve system efficiency. This study therefore uses a DNN to predict the execution time of each R program, implements intelligent scheduling based on Shortest Job First (SJF), and selects the most suitable execution platform for each program, thereby reducing the average waiting time and optimizing the use of multiple big data platforms.
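The abstract combines two steps: send each R job to the platform with the shortest predicted run time, then order each platform's queue by Shortest Job First (SJF) to cut the average waiting time. The Python sketch below illustrates that idea under stated assumptions, not the thesis's actual implementation. The per-platform times are made-up placeholders standing in for DNN predictions, each platform is assumed to run one job at a time, all jobs are assumed to be queued at once, and the job names and the functions `choose_platform`, `schedule`, and `overall_average_wait` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """A hypothetical analysis job with per-platform predicted run times."""
    name: str
    # Placeholder numbers standing in for DNN-predicted execution times (seconds).
    predicted: dict[str, float]


def choose_platform(job: Job) -> str:
    """Return the platform whose predicted execution time is smallest."""
    return min(job.predicted, key=job.predicted.get)


def schedule(jobs: list[Job], sjf: bool = True) -> dict[str, list[tuple[str, float]]]:
    """Send each job to its fastest platform, then optionally sort each queue SJF."""
    queues: dict[str, list[tuple[str, float]]] = {}
    for job in jobs:  # arrival order is the first-come, first-served (FCFS) order
        platform = choose_platform(job)
        queues.setdefault(platform, []).append((job.name, job.predicted[platform]))
    if sjf:
        for queue in queues.values():
            queue.sort(key=lambda item: item[1])  # shortest predicted time first
    return queues


def overall_average_wait(queues: dict[str, list[tuple[str, float]]]) -> float:
    """Mean waiting time over all jobs, assuming each platform runs one job at a time."""
    total_wait, count = 0.0, 0
    for queue in queues.values():
        elapsed = 0.0
        for _, run_time in queue:
            total_wait += elapsed  # the job waits for every job ahead of it
            elapsed += run_time
            count += 1
    return total_wait / count if count else 0.0


# Hypothetical jobs; "Hadoop" and "Spark" are the two platforms named in the abstract.
jobs = [
    Job("wordcount", {"Hadoop": 40.0, "Spark": 12.0}),
    Job("kmeans", {"Hadoop": 90.0, "Spark": 30.0}),
    Job("summary", {"Hadoop": 8.0, "Spark": 5.0}),
    Job("etl", {"Hadoop": 20.0, "Spark": 25.0}),
]

if __name__ == "__main__":
    print(f"FCFS average wait: {overall_average_wait(schedule(jobs, sjf=False)):.2f} s")  # 13.50 s
    print(f"SJF average wait:  {overall_average_wait(schedule(jobs)):.2f} s")  # 5.50 s
```

Non-preemptive SJF minimizes the average waiting time of a single queue when all jobs are available at once and their run times are known exactly. In a setup like the one described, the gain therefore depends on how accurate the DNN's time predictions are; with poor predictions, the SJF order can drift back toward FCFS behavior.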
Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
Chapter 1. Introduction
Chapter 2. Literature Review
2.1 MapReduce/HDFS
2.2 Spark
2.3 Keras
2.4 TensorFlow
Chapter 3. Research Method
3.1 Array virtualized server environment
3.2 Intelligent scheduling and DNN time prediction model building
3.2.1 DNN time prediction model and intelligent scheduling architecture
3.2.2 Train DNN model
3.2.3 Time complexity predictor
3.3 Integrate multiple big data analysis platforms
3.4 Automatic platform selection program
3.4.1 Obtain memory information from other servers
3.5 Intelligent scheduling optimization
3.6 Data preprocessing
Chapter 4. Experimental Results and Discussion
4.1 Deep network model building
4.2 Experimental environment
4.2.1 Generate test data and design experimental environment
4.2.2 Experimental Results
4.2.2.1 Experimental environment I
4.2.2.2 Experimental environment II
4.2.2.3 Experimental environment III
4.2.3 Generate mixed instruction test data and design experimental environment
4.2.3.1 Experimental Results
4.2.3.2 Experimental environment I
4.2.3.3 Experimental environment II
4.2.3.4 Experimental environment III
4.2.4 Extract real data and design experimental environment
4.2.4.1 Experimental Results
4.3 Results and Discussion
Chapter 5. Conclusion
References
