
National Digital Library of Theses and Dissertations in Taiwan


Detailed Record

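Graduate student: 吳軒衡
Graduate student (English): Hsuan-Heng Wu
Thesis title: 群組虛擬機容錯系統實作與優化
Thesis title (English): Implementation and Optimization of Group Virtual Machine Fault Tolerance
Advisor: 徐慰中
Advisor (English): Wei-Chung Hsu
Oral defense date: 2017-07-31
Degree: Master's
Institution: National Taiwan University
Department: Graduate Institute of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Publication year: 2017
Graduation academic year: 105 (ROC calendar)
Language: English
Pages: 42
Keywords (translated from Chinese): Virtualization; Fault-Tolerant System; Distributed System
Keywords (English): Virtualization; Fault-Tolerance; Distributed System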
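With the rise of cloud computing systems, it has become a trend to decompose a single service into many communicating microservices in order to improve development and maintenance efficiency. Most of these services communicate with one another through software libraries such as the Message Passing Interface.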
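Existing checkpoint-based fault-tolerance systems use output buffering to achieve seamless failover: even when a fault occurs, users of the application do not notice that the server providing the service has changed. However, output buffering lowers network transmission efficiency, so network-intensive applications suffer a severe performance penalty when running under such a fault-tolerance system.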
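In this thesis, we propose the concept of a group virtual machine fault-tolerance system, which aims to improve the performance of distributed services under fault tolerance by removing output buffering for traffic inside the group, and we provide evaluation data on the performance impact of this approach. The original checkpoint and failover procedures must also be modified to accommodate the removal of output buffering, so that the memory states of the virtual machines in the group remain consistent. In addition, this thesis proposes a method for reducing the system downtime caused by the protocol that starts and restarts fault-tolerance protection: by avoiding memory transfer for some of the virtual machines in the group, the downtime of the whole group is reduced.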
With the rise of cloud computing, it has become possible to break up a single service into multiple components that communicate with each other through a message-passing library such as MPI, which improves software development and testing.
Existing checkpoint-based fault-tolerance systems use output buffering to achieve seamless service failover, that is, to ensure that application end users are not aware of the failover when a hardware fault occurs. However, applications with a large amount of inter-process communication experience non-negligible communication overhead due to output buffering.
In this thesis, we propose the concept of Group Virtual Machine Fault-Tolerance, which enables fault-tolerance protection for a distributed service without buffering communication within the group. The checkpoint and failover procedures have been modified to keep the memory state consistent within the group. An evaluation of this approach is given. An optimization that reduces the system downtime caused by the initialization and re-initialization of the Group Fault-Tolerance protocol is also introduced and evaluated in this work.
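As a rough illustration of the idea in the abstract, the sketch below models an output buffer that holds packets bound for external clients until the current checkpoint is committed, while releasing packets addressed to members of the same group immediately. It is a minimal sketch under stated assumptions, not the thesis implementation: the names `Packet`, `OutputBuffer`, `commit_checkpoint`, and `rollback`, and the use of host-name strings to identify group members, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    dst: str        # destination host name (hypothetical identifier)
    payload: bytes


@dataclass
class OutputBuffer:
    """Toy model of checkpoint-based output buffering with a group bypass."""
    group: frozenset                              # VMs protected together as one group
    pending: list = field(default_factory=list)   # held until the checkpoint commits
    wire: list = field(default_factory=list)      # stands in for the real network

    def send(self, pkt: Packet) -> None:
        if pkt.dst in self.group:
            # Assumption: group members checkpoint and fail over together,
            # so traffic inside the group can be released without buffering.
            self.wire.append(pkt)
        else:
            # External clients must not observe state that may be rolled back.
            self.pending.append(pkt)

    def commit_checkpoint(self) -> None:
        # The backup now holds the state that produced these packets.
        self.wire.extend(self.pending)
        self.pending.clear()

    def rollback(self) -> None:
        # On failover to the last checkpoint, unreleased output is discarded.
        self.pending.clear()
```

For example, with `group=frozenset({"vm-a", "vm-b"})`, a packet sent to `"vm-b"` appears on `wire` at once, whereas a packet sent to an outside client stays in `pending` until `commit_checkpoint()` is called.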
Abstract (in Chinese) iii
Abstract v
1 Introduction 1
1.1 Fault Tolerance Solutions 1
1.2 Fault Tolerance in Cloud 2
2 Background 5
2.1 Traditional Fault-Tolerance Solutions 5
2.1.1 Hardware-based Fault-Tolerance 5
2.1.2 Fault-Tolerance in Application Software 6
2.2 Hypervisor-based Fault-Tolerance 7
2.3 Asynchronous Memory Replication-based Fault-tolerance 8
2.4 Kemari 8
2.5 Cuju-FT 9
2.5.1 TCP packet integration 9
2.5.2 Kernel Dirty Page Transfer 9
2.5.3 Light Weight Virtual Machine Execution State Transition 10
3 Design and Implementation 11
3.1 Group Virtual Machine Fault-Tolerance (GFT) 11
3.1.1 Rise of Concept 11
3.1.2 Definition of Virtual Machine Group 13
3.1.3 Correctness of Group Virtual Machine Fault-Tolerance 13
3.1.3.1 Synchronous Snapshot Requirement 13
3.1.3.2 Synchronous Failover Requirement 14
3.2 Implementation of Fault-Tolerance and Group Virtual Machine Fault-Tolerance 14
3.2.1 Basic Hardware Configuration 14
3.2.2 Group Establish Protocol 15
3.2.3 Event Handling in Qemu 16
3.2.4 Inter Group Communication 16
3.2.5 Group Checkpoint Synchronization 16
3.2.6 Failure Detection and Trigger of Failover 18
3.3 Rebuilding Group Fault Tolerance Protection 19
3.3.1 Implementation of GFT Web Stub 20
3.3.2 Procedure to Re-establish Group Fault-Tolerance Protocol 21
4 Downtime Optimization 23
4.1 System Downtime in Fault-Tolerance System 23
4.2 Motivation for Downtime Optimization 24
4.3 Implementation of Group Fault Tolerance Downtime Optimization 25
4.3.1 Roll-back Memory State 25
4.3.2 Theoretical Upper bound of Downtime Optimization 25
4.3.3 Modification to Existing Protocol 26
5 Evaluation 29
5.1 Experiment Setup 29
5.1.1 Hardware Setup 29
5.1.2 Software Setup 29
5.1.3 Virtual Machine Setup 30
5.2 Evaluation of Communication Overhead 30
5.2.1 Sysbench DBMS Testing 30
5.2.2 SCP Performance Evaluation 31
5.3 Evaluation of Group Fault Tolerance System Overhead 32
5.3.1 UnixBench 32
5.3.2 Kernel Compilation 34
5.4 Evaluation of Downtime Optimization 34
5.4.1 Pairwise Comparison Between Initial Time and System Downtime 34
5.4.2 Evaluation of Downtime Optimization with 4 members 35
6 Conclusion 39
7 Future Work 40
Bibliography 41
[1] 2012. Fault Tolerance & High Availability. https://media.amazonwebservices.com/architecturecenter/AWS_ac_ra_ftha_04.pdf. (2012). Accessed: 2017-08-02.

[2] Joel Bartlett, Jim Gray, and Bob Horst. 1987. Fault Tolerance in Tandem Computer Systems. Springer Vienna, Vienna, 55–76. https://doi.org/10.1007/978-3-7091-8871-2_3

[3] T. C. Bressoud and F. B. Schneider. 1995. Hypervisor-based Fault Tolerance. In Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles (SOSP ’95). ACM, New York, NY, USA, 1–11. https://doi.org/10.1145/224056.224058

[4] M Burman. Aspects of a High-Volume Production Online Banking System. IEEE Spring Compcon, 244–248.

[5] Brendan Cully, Geoffrey Lefebvre, Dutch Meyer, Mike Feeley, Norm Hutchinson, and Andrew Warfield. 2008. Remus: High Availability via Asynchronous Virtual Machine Replication. In Proceedings of the 5th USENIX Symposium on Networked Systems Design and Implementation (NSDI’08). USENIX Association, Berkeley, CA, USA, 161–174. http://dl.acm.org/citation.cfm?id=1387589.1387601

[6] Patrick Hunt, Mahadev Konar, Flavio P. Junqueira, and Benjamin Reed. 2010. ZooKeeper: Wait-free Coordination for Internet-scale Systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference (USENIXATC’10). USENIX Association, Berkeley, CA, USA, 11–11. http://dl.acm.org/citation.cfm?id=1855840.1855851

[7] C. Mohan, D. Haderle, B. Lindsay, H. Pirahesh, and P. Schwarz. 1992. ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems (TODS) 17, 1 (1992), 94–162. http://scholar.google.com/scholar.bib?q=info:XCSRwcMyvR0J:scholar.google.com/&output=citation&hl=en&as_sdt=0,5&ct=citation&cd=0

[8] Samiha Mourad and Dorothy Andrews. 1987. On the Reliability of the IBM MVS/XA Operating System. IEEE Trans. Softw. Eng. 13, 10 (Oct. 1987), 1135–1139. https://doi.org/10.1109/TSE.1987.232855

[9] Yoshi Tamura. 2008. Kemari: Virtual machine synchronization for fault tolerance using DomT. Xen Summit.