Thesis title (English): Deploying Human Resource Fuzzy Query Engine Based on Hadoop
Keywords: Cloud computing, Hadoop, MapReduce, Fuzzy Query

Cloud computing broadly refers to users accessing a variety of services over the network, from any device, at any time and place. Its advantages include the ability to process large amounts of data and lower costs for enterprises. By service model, cloud computing is divided into three categories: IaaS (Infrastructure as a Service), PaaS (Platform as a Service), and SaaS (Software as a Service). Hadoop is open-source software for building cloud services; it can store and process large amounts of data and provides reliable, efficient service. With the MapReduce framework that Hadoop provides, developers can build a cloud computing environment conveniently without having to deal with the complex underlying infrastructure.

Traditional human resources (job bank) websites can only return results that fully match the search conditions, yet users' requirements are often imprecise. To solve this problem, the Software Engineering Laboratory at National Changhua University of Education developed a Fuzzy Query Engine, which retrieves all possible results and attaches a degree of match to each one so that users can easily compare them. Continuing this earlier work, this research builds a cloud computing environment for fuzzy queries based on the SaaS concept, so that traditional human resources websites can easily run fuzzy queries over large amounts of data and add the provided fuzzy search service to their own web pages.
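To make the two ideas in the abstract concrete (MapReduce processing and a query result that carries a degree of match), the following minimal Python sketch runs a fuzzy query in MapReduce style inside a single process: the map step scores each record independently, the shuffle groups the scores by key, and the reduce step ranks candidates by degree of match. It is not the thesis's Fuzzy Query Engine and does not use Hadoop's Java API. The record layout `id,category,age,salary`, the triangular membership functions, the example query ("age about 30" and "salary about 40000"), grouping by category, and using the minimum as the fuzzy AND are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or outside [a, c], rising linearly to 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy query (assumption): "age about 30" AND "salary about 40000".
QUERY = {"age": (25, 30, 35), "salary": (30000, 40000, 50000)}


def map_fn(line: str) -> Iterator[Tuple[str, Tuple[str, float]]]:
    """Map phase: parse one record and emit (category, (id, degree)) if it matches at all."""
    # Hypothetical record format (assumption): id,category,age,salary
    rec_id, category, age, salary = line.strip().split(",")
    # Fuzzy AND: the overall degree of match is the minimum over all conditions.
    degree = min(
        triangular(float(age), *QUERY["age"]),
        triangular(float(salary), *QUERY["salary"]),
    )
    if degree > 0.0:
        yield category, (rec_id, degree)


def reduce_fn(key: str, values: Iterable[Tuple[str, float]]) -> Tuple[str, List[Tuple[str, float]]]:
    """Reduce phase: rank all candidates for one key by descending degree of match."""
    return key, sorted(values, key=lambda v: v[1], reverse=True)


def run_job(lines: Iterable[str]) -> Dict[str, List[Tuple[str, float]]]:
    """Single-process stand-in for a Hadoop job: map, shuffle (group by key), reduce."""
    groups: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


if __name__ == "__main__":
    records = [
        "1,engineer,28,42000",
        "2,engineer,40,60000",
        "3,designer,31,38000",
        "4,engineer,33,45000",
    ]
    for category, ranked in run_job(records).items():
        print(category, ranked)
```

With these four sample records, record 2 is dropped because its age lies outside the fuzzy range, the engineer list ranks record 1 (degree 0.6) above record 4 (degree 0.4), and the designer list holds record 3 (degree 0.8). On a Hadoop cluster, the map and reduce logic would instead run as distributed Map and Reduce tasks over HDFS input splits, with the framework performing the shuffle.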

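Chinese Abstract
Abstract
Acknowledgments
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Research Background and Motivation
1.2 Research Objectives
1.3 Thesis Organization
Chapter 2 Background Knowledge
2.1 Cloud Computing
2.2 Hadoop
2.3 Hadoop Distributed File System
2.4 MapReduce
Chapter 3 Review of Related Literature
3.1 Academic Research and Applications of Hadoop
3.2 Hadoop Ecosystem
Chapter 4 Fuzzy Query System for Human Resources Websites
Chapter 5 Research Methods
5.1 Platform Setup
5.2 Integrating Hadoop with the Fuzzy Query Function
Chapter 6 Experimental Results and Discussion
6.1 Effect of the Number of Maps on Performance
6.2 Effect of the Number of Computers on Performance
6.3 Effect of Fuzzy Query Conditions on Performance
6.4 Experimental Analysis
Chapter 7 Conclusions and Future Work
References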

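Figure 1. Overview of cloud computing [55]
Figure 2. The four deployment models [52]
Figure 3. The three service models [52]
Figure 4. Roles in the Hadoop software architecture [55]
Figure 5. A client reading data from HDFS [25]
Figure 6. A client writing data to HDFS [25]
Figure 7. The MapReduce execution process [52]
Figure 8. Architecture of Hive [24]
Figure 9. Pig workflow
Figure 10. Using ZooKeeper to solve the distributed consistency problem
Figure 11. Query page of the human resources website
Figure 12. Query confirmation page of the human resources website
Figure 13. The txt file before computation
Figure 14. The txt file after computation
Figure 15. Result page of the human resources website
Figure 16. Cloud computing architecture for fuzzy queries in a job bank
Figure 17. hdfs-site.xml code snippet
Figure 18. mapred-site.xml code snippet
Figure 19. Files stored in HDFS
Figure 20. The region each Map handles after Hadoop splits the file
Figure 21. A correctly split record
Figure 22. An incorrectly split record
Figure 23. MapReduce workflow
Figure 24. Same number of computers, different numbers of Maps (small data set)
Figure 25. Same number of computers, different numbers of Maps (medium data set)
Figure 26. Same number of computers, different numbers of Maps (large data set)
Figure 27. Same number of Maps, different numbers of computers (small data set)
Figure 28. Same number of Maps, different numbers of computers (medium data set)
Figure 29. Same number of Maps, different numbers of computers (large data set)
Figure 30. Same numbers of computers and Maps, different fuzzy query conditions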

Table 1. Experimental equipment
Table 2. Number of matching records for different data sizes under different condition options
1. S. Aggarwal, S. Phadke, and M. Bhandarkar, Characterization of Hadoop Jobs Using Unsupervised Learning, in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on. 2010. p. 748-753.
2. M. Burrows, The Chubby lock service for loosely-coupled distributed systems, in Proceedings of the 7th symposium on Operating systems design and implementation. 2006, USENIX Association: Seattle, Washington. p. 335-350.
3. M. Cafarella and D. Cutting, Building Nutch: Open Source Search, in Queue. 2004. p. 54-61.
4. T. Condie, N. Conway, P. Alvaro, J.M. Hellerstein, K. Elmeleegy, and R. Sears, MapReduce online, in Proceedings of the 7th USENIX conference on Networked systems design and implementation. 2010, USENIX Association: San Jose, California. p. 21-21.
5. J. Dean and S. Ghemawat, MapReduce: simplified data processing on large clusters, in Commun. ACM. 2008. p. 107-113.
6. J. Ekanayake, S. Pallickara, and G. Fox, MapReduce for Data Intensive Scientific Analyses, in Proceedings of the 2008 Fourth IEEE International Conference on eScience. 2008, IEEE Computer Society. p. 277-284.
7. S. Ghemawat, H. Gobioff, and S.-T. Leung, The Google file system, in SIGOPS Oper. Syst. Rev. 2003. p. 29-43.
8. P. Hunt, M. Konar, F.P. Junqueira, and B. Reed, ZooKeeper: wait-free coordination for internet-scale systems, in Proceedings of the 2010 USENIX conference on USENIX annual technical conference. 2010, USENIX Association: Boston, MA. p. 11-11.
9. J. Xie, S. Yin, X. Ruan, Z. Ding, Y. Tian, J. Majors, A. Manzanares, and X. Qin, Improving MapReduce performance through data placement in heterogeneous Hadoop clusters, in Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE International Symposium on. 2010. p. 1-9.
10. U. Kang, C.E. Tsourakakis, and C. Faloutsos, PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations, in Proceedings of the 2009 Ninth IEEE International Conference on Data Mining. 2009, IEEE Computer Society. p. 229-238.
11. R.T. Kaushik, M. Bhandarkar, and K. Nahrstedt, Evaluation and Analysis of GreenHDFS: A Self-Adaptive, Energy-Conserving Variant of the Hadoop Distributed File System, in Cloud Computing Technology and Science (CloudCom), 2010 IEEE Second International Conference on. 2010. p. 274-287.
12. A. Kimball, S. Michels-Slettvet, and C. Bisciglia, Cluster computing for web-scale data processing, in SIGCSE Bull. 2008. p. 116-120.
13. S.Y. Ko, I. Hoque, B. Cho, and I. Gupta, On availability of intermediate data in cloud computations, in Proceedings of the 12th conference on Hot topics in operating systems. 2009, USENIX Association: Monte Verità, Switzerland. p. 6-6.
14. L.F. Lai, C.C. Wu, L.T. Huang, and J.C. Kuo, A Fuzzy Query Mechanism for Human Resource Websites, in Proceedings of the International Conference on Artificial Intelligence and Computational Intelligence. 2009, Springer-Verlag: Shanghai, China. p. 579-589.
15. L.F. Lai, C.C. Wu, M.Y. Shih, L.T. Huang, and W. Chiou, Parallel Processing for Fuzzy Queries in Human Resources Websites, in Journal of Internet Technology. 2010. p. 943-953.
16. J. Leverich and C. Kozyrakis, On the energy (in)efficiency of Hadoop clusters, in SIGOPS Oper. Syst. Rev. 2010. p. 61-65.
17. P. Mell and T. Grance, The NIST Definition of Cloud Computing (Draft). 2011.
18. B. Nicolae, D. Moise, G. Antoniu, L. Bouge, and M. Dorier, BlobSeer: Bringing high throughput under heavy concurrency to Hadoop Map-Reduce applications, in Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on. 2010. p. 1-11.
19. C. Olston, B. Reed, U. Srivastava, R. Kumar, and A. Tomkins, Pig latin: a not-so-foreign language for data processing, in Proceedings of the 2008 ACM SIGMOD international conference on Management of data. 2008, ACM: Vancouver, Canada. p. 1099-1110.
20. B. Panda, J.S. Herbach, S. Basu, and R.J. Bayardo, PLANET: massively parallel learning of tree ensembles with MapReduce, in Proc. VLDB Endow. 2009. p. 1426-1437.
21. S. Papadimitriou and J. Sun, DisCo: Distributed Co-clustering with Map-Reduce: A Case Study towards Petabyte-Scale End-to-End Mining, in Proceedings of the 2008 Eighth IEEE International Conference on Data Mining. 2008, IEEE Computer Society. p. 512-521.
22. K. Shvachko, H. Kuang, S. Radia, and R. Chansler, The Hadoop Distributed File System, in Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). 2010, IEEE Computer Society. p. 1-10.
23. A. Thusoo, J.S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu, and R. Murthy, Hive - a petabyte scale data warehouse using Hadoop, in Data Engineering (ICDE), 2010 IEEE 26th International Conference on. 2010. p. 996-1005.
24. A. Thusoo, J.S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, and R. Murthy, Hive: a warehousing solution over a map-reduce framework, in Proc. VLDB Endow. 2009. p. 1626-1629.
25. T. White, Hadoop: The Definitive Guide, Second Edition. 2010, O'Reilly Media.
26. W. Xu, L. Huang, A. Fox, D. Patterson, and M. Jordan, Mining console logs for large-scale system problem detection, in Proceedings of the Third conference on Tackling computer systems problems with machine learning techniques. 2008, USENIX Association: San Diego, California. p. 4-4.
27. L. Yang and Z. Shi, An Efficient Data Mining Framework on Hadoop using Java Persistence API, in Computer and Information Technology (CIT), 2010 IEEE 10th International Conference on. 2010. p. 203-209.
28. M. Zaharia, A. Konwinski, A.D. Joseph, R. Katz, and I. Stoica, Improving MapReduce performance in heterogeneous environments, in Proceedings of the 8th USENIX conference on Operating systems design and implementation. 2008, USENIX Association: San Diego, California. p. 29-42.
29. Amazon Web Services. Available from: http://aws.amazon.com/.
30. Apache Avro. Available from: http://avro.apache.org/.
31. Apache Hadoop. Available from: http://hadoop.apache.org/.
32. Apache Nutch. Available from: http://nutch.apache.org/.
33. Apache Pig. Available from: http://pig.apache.org/.
34. Apache ZooKeeper™. Available from: http://zookeeper.apache.org/.
35. CRM & Cloud Computing - salesforce.com. Available from: http://www.salesforce.com.
36. Facebook. Available from: http://www.facebook.com.
37. Google App Engine. Available from: http://code.google.com/intl/en/appengine/.
38. Gridmix3 Emulating Production Workload for Apache Hadoop. Available from: http://developer.yahoo.com/blogs/hadoop/posts/2010/04/gridmix3_emulating_production/.
39. Hadoop at Twitter. 2010; Available from: http://engineering.twitter.com/2010/04/hadoop-at-twitter.html.
40. Hadoop hdfs-default. Available from: http://hadoop.apache.org/common/docs/current/hdfs-default.html.
41. Hadoop mapred-default. Available from: http://hadoop.apache.org/common/docs/current/mapred-default.html.
42. Hadoop Taiwan User Group. Available from: http://www.hadoop.tw/.
43. Hadoop Wiki PoweredBy. Available from: http://wiki.apache.org/hadoop/PoweredBy.
44. Hadoop™ Common. Available from: http://hadoop.apache.org/common/.
45. Hadoop™ Distributed File System. Available from: http://hadoop.apache.org/hdfs/.
46. Hadoop™ MapReduce. Available from: http://hadoop.apache.org/mapreduce/.
47. HBase. Available from: http://hbase.apache.org/.
48. Hive. Available from: http://hive.apache.org/.
49. HowManyMapsAndReduces - Hadoop Wiki. Available from: http://wiki.apache.org/hadoop/HowManyMapsAndReduces.
50. IBM. Available from: http://www.ibm.com/us/en/.
51. Microsoft Corporation. Available from: http://www.microsoft.com/en-us/default.aspx.
52. NCHC Cloud Computing Research Group. Available from: http://trac.nchc.org.tw/cloud.
53. Sort Benchmark Home Page. Available from: http://sortbenchmark.org/.
54. Sqoop. Available from: http://www.cloudera.com/downloads/sqoop/.
55. Wikipedia Hadoop. Available from: http://en.wikipedia.org/wiki/Hadoop.
56. Yahoo! Hadoop Tutorial: Managing a Hadoop Cluster. Available from: http://developer.yahoo.com/hadoop/tutorial/module7.html.
57. A. Anand. Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds. 2009; Available from: http://developer.yahoo.com/blogs/hadoop/posts/2009/05/hadoop_sorts_a_petabyte_in_162/.
58. D. Gottfrid. Self-Service, Prorated Supercomputing Fun! 2007; Available from: http://open.blogs.nytimes.com/2007/11/01/self-service-prorated-super-computing-fun/.
59. D. Tankel. Scalability of the Hadoop Distributed File System. 2010; Available from: http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/.
60. J. Zawodny. Yahoo! Launches World's Largest Hadoop Production Application. 2008; Available from: http://developer.yahoo.com/blogs/hadoop/posts/2008/02/yahoo-worlds-largest-production-hadoop/.
