National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

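Author: Shen, Sih-Kai (沈思鎧)
Title: Design and Implementation of Software System-Level Redundancy for Embedded Systems
Title (Chinese): 設計與實作支援嵌入式系統之軟體的系統層級重複技術
Advisor: Peng-Sheng Chen (陳鵬升)
Committee members: Chang, Rong-Guey (張榮貴); Tu, Chia-Heng (涂嘉恆)
Oral defense date: 2019-07-05
Degree: Master's
Institution: National Chung Cheng University (國立中正大學)
Department: Graduate Institute of Computer Science and Information Engineering (資訊工程研究所)
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2019
Graduation academic year: 107 (ROC calendar)
Language: English
Number of pages: 44
Keywords (Chinese): 錯誤容忍; 可靠性; 系統冗餘
Keywords (English): System-level redundancy; Reliability; Fault tolerance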
Computer systems increasingly permeate everyday life (e.g., smartphones, smart home devices, medical systems, industrial control systems, drones, and robots). Some of them, such as self-driving vehicles and robots, require high reliability and stability to operate normally, and their failures can lead to serious and even life-threatening consequences. Developing techniques that enhance system reliability is therefore an important issue.
System-level redundancy uses multiple redundant systems to tolerate failures. This thesis designs and implements it for embedded systems. The architecture consists of a primary system connected to a redundant system through network sockets, and a heartbeat mechanism checks whether the primary system is still alive. GlusterFS, an open source distributed file system, aggregates disk storage resources to provide dependable storage. In addition, the checkpointing tool DMTCP (Distributed MultiThreaded CheckPointing) saves the execution state of the application so that it can resume after a failure. Experimental results on benchmarks and test scenarios show that the proposed approach improves fault tolerance and allows operation to resume after a failure.
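The heartbeat check described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation: the class name `HeartbeatMonitor`, the function `watch_primary`, the message contents, the timeout values, and the `on_failure` callback are all hypothetical, and the sketch assumes a connected stream socket between the primary and the redundant system. In a setup like the one described, `on_failure` would be the place to resume the application from its latest checkpoint.

```python
import socket
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports whether the primary has timed out."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s      # assumed value, not taken from the thesis
        self._clock = clock             # injectable clock, useful for testing
        self._last_seen = clock()

    def beat(self):
        """Record that a heartbeat just arrived."""
        self._last_seen = self._clock()

    def primary_alive(self):
        """True while the last heartbeat is no older than the timeout."""
        return (self._clock() - self._last_seen) <= self.timeout_s


def watch_primary(sock, monitor, on_failure, poll_s=0.1):
    """Loop on the redundant system: read heartbeats until the primary is declared dead."""
    sock.settimeout(poll_s)
    while True:
        try:
            data = sock.recv(16)
            if data:
                monitor.beat()          # any received bytes count as a heartbeat
            else:
                on_failure()            # empty read: the peer closed the connection
                return
        except socket.timeout:
            pass                        # nothing arrived in this polling interval
        if not monitor.primary_alive():
            on_failure()                # heartbeats stopped for longer than the timeout
            return
```

The monitor treats two events as a failure of the primary: a closed connection and a silence longer than `timeout_s`. Separating the timeout logic from the socket loop keeps the failure-detection rule easy to test without a network.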
1 Introduction
1.1 Related works
1.2 Contribution
1.3 Organization
2 Background
2.1 Checkpointing tool DMTCP
2.2 Distributed file system GlusterFS
3 Design and Implementation
3.1 Communication mechanism
3.1.1 Primary system
3.1.2 Redundant system
3.2 Checkpointing
3.3 Dependable storage
3.4 Assistant tool
4 Experiment
4.1 Environment configuration
4.2 Experiment results
4.2.1 The actual operation screenshots
4.2.2 Experiment data
5 Conclusion


Electronic full text (public Internet release date: 2024-08-14)