National Digital Library of Theses and Dissertations in Taiwan (臺灣博碩士論文加值系統)

Detailed Record

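Graduate student: 鄒易達
Graduate student (romanized): I-ta Tzou
Thesis title (Chinese): 容錯及錯誤回復叢集系統在科學計算之實作
Thesis title (English): Implementation of a Fault Tolerant Cluster with Error Recovery for Scientific Computation
Advisor: 郭斯彥
Advisor (romanized): Sy-yen Kuo
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Electrical Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Thesis type: Academic thesis
Year of publication: 2007
Graduation academic year: 95 (ROC calendar)
Language: English
Number of pages: 46
Keywords (Chinese): 容錯機制; 叢集系統; 檢查點; 訊息記錄; 獨立磁碟備援陣列; 網路檔案伺服器
Keywords (English): fault tolerant; cluster; checkpoint; message log; redundant array of inexpensive disks; network file server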
Parallel computing has become one of the main techniques for improving computer performance. High-performance computers are applied in many fields, including commerce, national defense, and science. Numerical simulation in particular is an important method in modern science. A simulation fails if it is interrupted partway through, so fault tolerance is an important issue.

Fault-tolerant techniques fall into two main categories: (1) automatic and (2) non-automatic. This thesis discusses the basic automatic fault-tolerant techniques applied to clusters: coordinated and uncoordinated checkpointing, and pessimistic and optimistic message logging.
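
To make the coordinated-checkpoint idea concrete, the following is a minimal single-process sketch in Python, not the thesis's implementation. It assumes three simulated workers that advance in lockstep; the names `Worker`, `coordinated_checkpoint`, `rollback`, and `run` are hypothetical. A real cluster would coordinate the snapshot across MPI processes, for example with a barrier or marker messages, which this sketch abstracts away.

```python
import copy


class Worker:
    """A simulated compute process (hypothetical, for illustration only)."""

    def __init__(self, rank):
        self.rank = rank
        self.step = 0
        self.value = 0.0

    def compute(self):
        # One unit of simulated numerical work.
        self.step += 1
        self.value += self.rank + 1


def coordinated_checkpoint(workers):
    # All workers are paused at the same step, so the saved states
    # together form one globally consistent snapshot.
    return [copy.deepcopy(w.__dict__) for w in workers]


def rollback(workers, snapshot):
    # Coordinated recovery: every worker returns to the last snapshot,
    # not only the one that failed.
    for w, saved in zip(workers, snapshot):
        w.__dict__.update(copy.deepcopy(saved))


def run(total_steps, interval, fail_at=None):
    workers = [Worker(r) for r in range(3)]
    snapshot = coordinated_checkpoint(workers)
    failed = False
    while workers[0].step < total_steps:
        for w in workers:
            w.compute()
        if fail_at is not None and not failed and workers[0].step == fail_at:
            # Simulate a single failure: discard progress since the snapshot.
            failed = True
            rollback(workers, snapshot)
            continue
        if workers[0].step % interval == 0:
            snapshot = coordinated_checkpoint(workers)
    return [w.value for w in workers]


if __name__ == "__main__":
    # The run with an injected failure ends in the same state as the clean run.
    print(run(10, 4) == run(10, 4, fail_at=7))
```

The cost of the failure is only the work done since the last snapshot, which is why the checkpoint interval trades checkpointing overhead against recomputation time.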

An automatic fault-tolerant cluster for a scientific computing environment is implemented using coordinated checkpointing. A storage backup strategy is also implemented, using a network file server built on a level-5 redundant array of inexpensive disks (RAID 5).
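
The redundancy behind RAID level 5 rests on XOR parity: any one lost block in a stripe can be rebuilt from the surviving blocks. The short Python sketch below illustrates that principle only. The helper names (`xor_blocks`, `make_stripe`, `rebuild`) and the sample data are assumptions for illustration, and the parity block is kept in a fixed last position, whereas an actual RAID 5 array rotates parity across disks.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    # A stripe is the data blocks plus one parity block (stored last here).
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe, lost_index):
    # XOR of all surviving blocks reproduces the missing one,
    # whether it was a data block or the parity block.
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


if __name__ == "__main__":
    stripe = make_stripe([b"node", b"ckpt", b"data"])
    # Pretend the disk holding block 1 failed and rebuild its contents.
    print(rebuild(stripe, 1))
```

Because only one parity block is stored per stripe, this scheme tolerates a single disk failure at a time.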
ORAL DEFENSE COMMITTEE CERTIFICATION I
ACKNOWLEDGEMENTS II
CHINESE ABSTRACT III
ABSTRACT IV
LIST OF FIGURES 3
LIST OF TABLES 4

CHAPTER 1 INTRODUCTION 5

CHAPTER 2 BACKGROUND 7
2.1 PARALLEL HARDWARE ARCHITECTURES 7
2.1.1 Interconnection Networks 8
2.1.2 SIMD Systems 9
2.1.3 Shared Memory MIMD 10
2.1.4 Distributed-Memory MIMD 10
2.1.5 Strengths and Weakness 11
2.2 PARALLEL SOFTWARE ARCHITECTURE 11
2.2.1 Message Passing Interface 12
2.2.2 MPICH 13

CHAPTER 3 FAULT TOLERANT ARCHITECTURES 15
3.1 BASIC FAULT TOLERANT METHODS 15
3.1.1 Checkpointing 16
3.1.2 Message Logging 19
3.2 FAULT TOLERANT MPI 21
3.2.1 MPICH – V1 21
3.2.2 MPICH – V2 25
3.2.3 MPICH – CL 29
3.3 FAULT TOLERANT STORAGE 32

CHAPTER 4 IMPLEMENTATION 36
4.1 HARDWARE ARCHITECTURE 36
4.2 SOFTWARE COMPONENTS 37
4.3 BACKUP STRATEGY 38
4.4 EVALUATION 38

CHAPTER 5 CONCLUSION 42

REFERENCE 44
[1]S. Hariri, M. Parashar, “Tools and Environments for Parallel and Distributed Computing”, Wiley, 2004.

[2]I. Campbell, “Reliable Linux: assuring high availability”, John Wiley & Sons, New York, 2002.

[3]Evan Marcus, Hal Stern, “Blueprints for High Availability”, Wiley & Sons, Canada, 2000.

[4]Jie Wu, “Distributed Systems Design”, CRC Press, Florida, 1999.

[5]Rajkumar Buyya, “High Performance Cluster Computing”, Prentice Hall, New Jersey, 1999.

[6]B. Wilkinson, M. Allen, “Parallel Programming – Techniques and applications using networked workstations and parallel computers”, Prentice Hall, New Jersey, 1999.

[7]Peter S. Pacheco, “Parallel Programming with MPI”, Morgan Kaufmann Publishers, San Francisco, 1997.

[8]Message Passing Interface Forum, “MPI: A Message-Passing Interface Standard”, 1995.

[9]H. Stephen Morse, “Practical Parallel Computing”, Academic Press, United Kingdom, 1994.

[10]Aminian M, Akbari M.K., Javadi B., “Coordinated checkpoint from message payload in pessimistic sender-based message logging”, Parallel and Distributed Processing Symposium, 2006.

[11]A. Bouteiller, T. Herault, G. Krawezik, P. Lemarinier, F. Cappello, “MPICH-V: a Multiprotocol Fault Tolerant MPI”, International Journal of High Performance Computing and Applications, 2005.

[12]Janakiraman G.J., Santos J.R., Subhraveti D., Turner Y., “Cruz: Application-Transparent Distributed Checkpoint-Restart on Standard Operating Systems”, Dependable Systems and Networks, 2005.

[13]A. Bouteiller, P. Lemarinier, T. Hérault, G. Krawezik, F. Cappello, “Improved Message Logging versus Improved Coordinated Checkpointing for Fault Tolerant MPI”, In proceedings of The 2004 IEEE International Conference on Cluster Computing, San Diego USA, September 2004.

[14]A. Bouteiller, P. Lemarinier, G. Krawezik, F. Cappello, “Coordinated checkpoint versus message log for fault tolerant MPI”, In proceedings of The 2003 IEEE International Conference on Cluster Computing, Hong Kong China, December 2003.

[15]A. Bouteiller, F. Cappello, T. Hérault, G. Krawezik, P. Lemarinier, F. Magniette, “MPICH-V2: a Fault Tolerant MPI for Volatile Nodes based on the Pessimistic Sender Based Message Logging”, In proceedings of IEEE/ACM Conference on Supercomputing, Phoenix USA, November 2003.

[16]G. Bosilca, A. Bouteiller, F. Cappello, S. Djilali, G. Fédak, C. Germain, T. Hérault, P. Lemarinier, O. Lodygensky, F. Magniette, V. Néri, A. Selikhov, “MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes”, In proceedings of IEEE/ACM Conference on Supercomputing, Baltimore USA, November 2002.

[17]Yuqun Chen, Kai Li, Plank J.S., “CLIP: A Checkpointing Tool for Message Passing Parallel Programs”, In proceedings of IEEE/ACM Conference on Supercomputing, November 1997.

[18]K. M. Chandy and L. Lamport, “Distributed snapshots: Determining global states of distributed systems”, in Transactions on Computer Systems, vol. 3, ACM, February 1985, pp. 63-75.

[19]Website: http://www-unix.mcs.anl.gov/mpi, Message Passing Interface, 10 January 2007.

[20]Website: http://www.lri.fr/~bouteill/MPICH-V/, MPICH-V Introduction, 10 January 2007.

[21]Website: http://lcic.org/documentation.html, Computational Cluster Documentation, 10 January 2007.

[22]Website: http://www.gridmpi.org/related.jsp, Grid MPI, 10 January 2007.

[23]Website: http://www.cs.wisc.edu/condor/, Condor Project Homepage, 10 January 2007.

[24]Website: http://www.netlib.org/benchmark/hpl/, HPL – A Portable Implementation of the High Performance Linpack Benchmark for Distributed Memory, 10 January 2007.

[25]Website: http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml, Checkpoint Restart, 10 January 2007.

[26]Website: http://www.top500.org, Top500 Supercomputing Sites, 10 January 2007.

[27]Website: http://www.gnu.org/software/gengetopt/gengetopt.html#Installation, GNU Gengetopt 2.19, 10 January 2007.

[28]Website: http://linux.vbird.org/, 鳥哥的 Linux 私房菜, 10 January 2007.

[29]Website: http://www.rocksclusters.org/wordpress/, Rocks Clusters, 10 January 2007.

[30]Website: http://www.iozone.org/, Iozone Filesystem Benchmark, 10 January 2007.

[31]Website: http://dast.nlanr.net/Projects/Iperf/, NLANR/DAST: Iperf 1.7.0 – The TCP/IP Bandwidth Measurement Tool, 10 January 2007.