National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)


Detailed Record

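Author: 盧本翔
Author (English): Ben-Shen Lou
Thesis title: 應用R語言套件於空氣汙染之分散式時空資料分析
Thesis title (English): An R Package for Distributed Spatial-Temporal Analysis of Air Pollution
Advisors: 劉榮春, 楊朝棟
Advisors (English): Jung-Chun Liu, Chun-Yao Chung
Oral defense committee: 張玉山, 時文中, 姜自強, 劉榮春, 楊朝棟
Oral defense date: 2017-06-30
Degree: Master's
Institution: Tunghai University (東海大學)
Department: Department of Computer Science (資訊工程學系)
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2017
Graduation academic year: 105 (ROC calendar)
Language: English
Pages: 65
Keywords (Chinese): 空汙; RHadoop; SparkR; R套件; 距離反比權重法
Keywords (English): Air pollution; RHadoop; SparkR; R package; IDW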
R has recently become a popular tool for big data analysis thanks to its mature packages for data analysis and visualization, including packages for the analysis of air pollution. Air pollution is of increasing global concern because it greatly affects the environment and human health. With the rapid development of the IoT and the increasing accuracy of the geographical information collected by sensors, a huge amount of air pollution data has been generated. It is therefore difficult to analyze air pollution data effectively and reliably on a single machine, because of the inherent characteristics of its memory design. In this work, we construct a distributed computing environment based on both RHadoop and SparkR so that air pollution can be analyzed and visualized with R more reliably and effectively. We first use sensors called EdiGreen AirBox to collect air pollution data in Taichung. We then adopt the Inverse Distance Weighting (IDW) method to transform the sensor data into a density map. Finally, the experimental results show the accuracy of short-term PM2.5 predictions made with the ARIMA model. The prediction accuracy is also verified with the mean absolute percentage error (MAPE).
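The two numerical steps named in the abstract, IDW interpolation and MAPE verification, can be illustrated with a minimal sketch. The thesis works in R; the Python below is illustrative only and is not the author's implementation. The function names `idw` and `mape`, the distance exponent of 2, the planar Euclidean distance, and the sample readings are all assumptions made for this example.

```python
import math


def idw(stations, x, y, power=2.0):
    """Estimate a value at (x, y) from (sx, sy, value) sensor readings.

    Each reading is weighted by 1 / distance**power, so nearby sensors
    dominate the estimate. Distances are planar Euclidean (an assumption;
    latitude/longitude data would first need a projection).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        dist = math.hypot(x - sx, y - sy)
        if dist == 0.0:
            # The query point coincides with a sensor: use its reading.
            return value
        weight = 1.0 / dist ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical PM2.5 readings at two sensor locations: (x, y, value).
readings = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
estimate_midway = idw(readings, 1.0, 0.0)        # equidistant from both sensors
error_percent = mape([100.0, 200.0], [90.0, 220.0])
```

Evaluating `idw` over a regular grid of points would yield the kind of density map the abstract describes, and `mape` compares held-out sensor readings (or forecasts) against the observed values.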
Abstract (in Chinese)
Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.1.1 R and Big Data
1.1.2 R in the Advantages of Distributed Computing
1.1.3 The Importance of Big Data in Air Pollution
1.2 Thesis Goal and Contributions
1.3 Thesis Organization
2 Background Review and Related Work
2.1 Background Review
2.1.1 RHadoop
2.1.2 SparkR
2.1.3 Visualizations R Using the Shiny Web
2.1.4 Inverse Distance Weighting Analysis (IDW)
2.1.5 ARIMA Time Series
2.2 Related Work
3 System Design and Implementation
3.1 System Design Architecture
3.2 RStudio and Shiny Server
3.3 SparkR architecture
3.4 Data Transformation for IDW
4 Experimental Results
4.1 Experimental Environment
4.2 Performance comparison of R for MySQL
4.3 Performance comparison of R for HDFS
4.4 R language evaluates Data transformation performance of IDW
4.5 Verify the IDW accuracy with MAPE
4.6 IDW repair ARIMA time series is broken
5 Conclusions and Future Work
5.1 Concluding Remark
5.2 Future Works
References
Appendix
A Hadoop Installation
B Scala Installation
C Spark Installation
D HBase Installation
E R Installation
F RHadoop Installation