
National Digital Library of Theses and Dissertations in Taiwan

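Student: Xian-Bo Lin (林賢柏)
Thesis title: 一個以自動機為基礎的大數據哈杜普分類模型之研究 (A Study of an Automata-Based Hadoop Classification Model for Big Data)
Thesis title (English): An Automata Machine Based Hadoop Classifier For Big Data
Advisor: Dong-Her Shih (施東河)
Oral examination committee: Xin Yuan Hong (洪新原), Hsin-ginn Hwang (黃興進), Dong-Her Shih (施東河)
Oral examination date: 2017-07-04
Degree: Master's
Institution: National Yunlin University of Science and Technology
Department: Department of Information Management
Discipline: Computing
Subdiscipline: General Computing
Thesis type: Academic thesis
Year of publication: 2017
Academic year of graduation: 105 (ROC calendar; 2016-2017)
Language: Chinese
Number of pages: 44
Keywords (Chinese): 大數據, 分類模型, 自動機, 哈杜普 (big data, classification model, automata, Hadoop)
Keywords (English): big data, classification, Automata, Hadoop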
With advances in information technology, the era of big data has arrived, and the volume of data in every field has grown rapidly. Many technologies and devices now generate large amounts of data every day, and as smart devices spread and technology keeps improving, the amount of data collected keeps increasing. These massive datasets hold valuable information, but applying machine learning to big data is challenging, because traditional machine learning cannot efficiently process such large volumes of data or support real-time analysis. Maximizing the value of big data calls for a new methodological architecture. The purpose of this study is therefore to build a classification model based on automata and execute it in a Hadoop environment to handle large data volumes. The automata-based classification model not only copes with large data volumes and real-time requirements but also predicts accurately, without being hampered by data size or real-time demands. Experimental results show that the proposed method successfully builds classification models for the datasets, and that its accuracy is relatively high compared with a variety of other classifiers. Although its accuracy is slightly lower than that of a traditional support vector machine (SVM), it runs on the order of a thousand times faster than the traditional SVM.
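The record does not spell out how rules are converted into an automaton (Section 3.3.1) or how the MapReduce job (Section 3.3.2) is organized. As an illustration only, the following Python sketch assumes that classification rules test equal-width-binned features read in a fixed order, merges those rules into a prefix-tree deterministic finite automaton (DFA), and counts correct predictions with a local map/shuffle/reduce stand-in for a Hadoop job. The rule format, the binning, the toy data, and the names `discretize`, `RuleAutomaton`, `mapper`, `reducer`, and `run_job` are hypothetical and are not the thesis's implementation.

```python
"""Hypothetical sketch: classification rules encoded as a deterministic
finite automaton (DFA) and evaluated with a local MapReduce-style job.
The rule format, binning scheme, and all names are illustrative assumptions."""
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple


def discretize(value: float, lo: float, hi: float, n_bins: int) -> str:
    """Map a numeric feature to an equal-width bin symbol such as 'b0'."""
    if hi <= lo:
        return "b0"
    idx = int((value - lo) / (hi - lo) * n_bins)
    return f"b{min(max(idx, 0), n_bins - 1)}"  # clamp to [0, n_bins - 1]


class RuleAutomaton:
    """Acyclic DFA over bin symbols; features are read in a fixed order.

    Each rule fixes the bins of a prefix of that order and names a class.
    Rules sharing a prefix share states, so the DFA is a prefix tree."""

    def __init__(self, default_label: str) -> None:
        self.delta: Dict[Tuple[int, str], int] = {}  # transition function
        self.label: Dict[int, str] = {}  # states where a rule ends
        self.n_states = 1  # state 0 is the start state
        self.default_label = default_label

    def add_rule(self, symbols: Sequence[str], cls: str) -> None:
        """Insert one rule, creating states only for unseen prefixes."""
        state = 0
        for sym in symbols:
            nxt = self.delta.get((state, sym))
            if nxt is None:
                nxt = self.n_states
                self.delta[(state, sym)] = nxt
                self.n_states += 1
            state = nxt
        self.label[state] = cls

    def classify(self, symbols: Sequence[str]) -> str:
        """Return the class of the longest matching rule, else the default."""
        state = 0
        best = self.label.get(0)
        for sym in symbols:
            nxt = self.delta.get((state, sym))
            if nxt is None:  # no rule continues with this symbol
                break
            state = nxt
            best = self.label.get(state, best)
        return best if best is not None else self.default_label


Record = Tuple[List[str], str]  # (binned feature symbols, true class)


def mapper(dfa: RuleAutomaton, record: Record) -> Iterator[Tuple[str, int]]:
    """Emit ('correct', 1) or ('wrong', 1) for one record."""
    features, truth = record
    yield ("correct" if dfa.classify(features) == truth else "wrong", 1)


def reducer(key: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum the counts emitted for one key."""
    return key, sum(counts)


def run_job(dfa: RuleAutomaton, records: Iterable[Record]) -> Dict[str, int]:
    """Local stand-in for a Hadoop job: map, shuffle (group by key), reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for rec in records:
        for key, value in mapper(dfa, rec):
            groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


if __name__ == "__main__":
    dfa = RuleAutomaton(default_label="unpopular")
    dfa.add_rule(["b2"], "popular")  # feature 0 high -> popular
    dfa.add_rule(["b0", "b1"], "popular")  # feature 0 low, feature 1 mid
    rows = [([0.9, 0.1], "popular"), ([0.1, 0.5], "popular"),
            ([0.1, 0.1], "unpopular"), ([0.5, 0.5], "popular")]
    data = [([discretize(v, 0.0, 1.0, 3) for v in x], y) for x, y in rows]
    print(run_job(dfa, data))  # {'correct': 3, 'wrong': 1}
```

In this sketch, classifying a record costs at most one dictionary lookup per feature, so the per-record cost grows with the number of features rather than with the number of rules. On an actual Hadoop cluster, the mapper would run in parallel over input splits and the reducer would sum the per-key counts.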
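Abstract (Chinese)
Abstract (English)
Table of Contents
List of Tables
List of Figures
1. Introduction
2. Literature Review
2.1 Big Data
2.2 Data Mining
2.2.1 Support Vector Machines
2.2.2 Nearest Neighbors
2.2.3 Decision Trees
2.2.4 Bayesian Classifiers
2.3 Big Data Classification Models
2.4 Automata
2.5 Hadoop
3. Research Methodology
3.1 Research Framework
3.2 Research Procedure
3.3 Conversion Method
3.3.1 Converting Rules into Automata
3.3.2 MapReduce
4. Experimental Procedure
4.1 Data Description
4.1.1 Online News Popularity Dataset (Online News)
4.1.2 Skin Segmentation Dataset (Skin Seg)
4.2 Performance Metrics
4.3 Experimental Process
5. Experimental Results and Discussion
5.1 Experimental Results
5.2 Discussion
6. Conclusion
6.1 Research Limitations
6.2 Future Directions
References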