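Author: Jiann-Wen Wang (王建文)
Thesis title: An Adaptive Continuous Checkpointing Fault-Tolerant Virtual Machine System based on QEMU-KVM with libvirt
Thesis title (Chinese): 基於libvirt與QEMU-KVM虛擬機器之記憶體層級同步容錯系統
Advisors: Deron Liang (梁德容), Wei-Jen Wang (王尉任)
Degree: Master's
Institution: National Central University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical Engineering and Computer Science
Year of publication: 2020
Academic year of graduation: 108 (ROC calendar, i.e. 2019–2020)
Language: English
Pages: 57
Keywords: QEMU-KVM; Libvirt; Virtual Machine; Fault Tolerance; Continuous Checkpointing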
Cloud computing and virtualization have been widely adopted across the IT industry, making resource management more efficient and elastic. However, as more servers are consolidated onto one physical machine, a single host's hardware failure can bring down many services at once. A virtualization-based fault-tolerant system can protect mission-critical virtual machines running soft real-time applications from such hardware failures, thus improving service availability.
Based on QEMU 3.0.0, libvirt 5.7.0, and continuous checkpointing, this study implements a virtualization-based fault-tolerant system that can be controlled through an external management interface. Continuous checkpointing repeatedly replicates the internal state of the VM on the primary host to the backup host and buffers outputs so that externally visible output stays consistent, which meets the basic requirements of fault tolerance. This study also designs and implements two methods to reduce the performance degradation that the system imposes on guest applications: adjusting the checkpointing parameter automatically according to the VM's workload, and using compression tools on demand to reduce the bandwidth required for, and so speed up, dirty-page transfer. As a result, system administrators can set up the system without searching for a suitable parameter for each application and have more flexibility in deploying it.
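The output-buffering rule described in the abstract can be illustrated with a small, self-contained Python sketch. It is not the thesis implementation: the `Primary` and `Backup` classes, the payload layout, and the method names are hypothetical, and `zlib` from the standard library stands in for whichever compression tool the system actually uses. The sketch only shows the general idea that output produced during an epoch is held until the backup acknowledges the checkpoint for that epoch, and that dirty pages may optionally be compressed before transfer.

```python
import zlib
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed page size for this illustration


@dataclass
class Backup:
    """Hypothetical backup host holding the last acknowledged memory image."""
    pages: dict = field(default_factory=dict)

    def apply(self, payload: bytes, compressed: bool) -> bool:
        raw = zlib.decompress(payload) if compressed else payload
        # Assumed layout: repeated (4-byte page number + PAGE_SIZE bytes of data).
        for off in range(0, len(raw), 4 + PAGE_SIZE):
            page_no = int.from_bytes(raw[off:off + 4], "big")
            self.pages[page_no] = raw[off + 4:off + 4 + PAGE_SIZE]
        return True  # acknowledgement back to the primary


@dataclass
class Primary:
    """Hypothetical primary host running the protected VM."""
    backup: Backup
    compress: bool = False
    dirty: dict = field(default_factory=dict)        # page_no -> page bytes
    held_output: list = field(default_factory=list)  # not yet visible outside
    released: list = field(default_factory=list)     # visible to clients

    def write_page(self, page_no: int, data: bytes) -> None:
        # Pad or truncate to exactly one page and mark it dirty.
        self.dirty[page_no] = data.ljust(PAGE_SIZE, b"\0")[:PAGE_SIZE]

    def send_packet(self, packet: bytes) -> None:
        # Output is buffered until the state that produced it is replicated.
        self.held_output.append(packet)

    def checkpoint(self) -> int:
        """End the epoch: ship dirty pages, then release buffered output."""
        raw = b"".join(
            n.to_bytes(4, "big") + d for n, d in sorted(self.dirty.items())
        )
        payload = zlib.compress(raw) if self.compress else raw
        if self.backup.apply(payload, self.compress):
            self.released.extend(self.held_output)
            self.held_output.clear()
            self.dirty.clear()
        return len(payload)  # bytes sent over the replication link
```

In this sketch a packet sent by the guest stays in `held_output` until `checkpoint()` returns, so a client never observes output from a state the backup could not reproduce after a failover. Setting `compress=True` trades CPU time on both hosts for a smaller payload per epoch.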
Abstract (Chinese)
Abstract
Contents
List of Figures
List of Tables
I. Introduction
1.1 Research Background
1.2 Motivation and Contributions
1.3 Outline
II. Background Knowledge
2.1 QEMU and Kernel-based Virtual Machine
2.2 Libvirt
2.3 Types of VM Fault Tolerance Systems
2.3.1 Lock-Stepping
2.3.2 Continuous Checkpointing
2.3.3 Hybrid
2.4 Live Migration with Compression Techniques
III. System Design
3.1 Overall Architecture
3.1.1 Checkpointing and Messaging
3.1.2 Watchdog
3.1.3 Export
3.1.4 Autopilot
3.2 System Initialization
3.3 Checkpointing Process and Network Output Correctness
3.4 Fault Model and Fault Handling
3.4.1 Fault Model Overview
3.4.2 Correctness
3.5 Libvirt Integration
3.6 Additional Modification to QEMU
IV. Performance Improvements
4.1 Experiment Environment
4.1.1 Environment Overview and Configuration
4.1.2 Applications for Performance Evaluation
4.2 Adjusting Epoch Time Adaptively
4.2.1 Finding Optimal Epoch Time with Manual Experiments
4.2.2 Probing the Moving Average Online
4.3 Utilizing Compression Techniques
4.3.1 Implementation of Compressing Checkpoints
4.3.2 Performance Evaluation on Compressing Checkpoints
V. Evaluation
5.1 Experiment Environment
5.2 Experiment Results
5.2.1 TPC-C OLTP Database Benchmark
5.2.2 Acme Air in NodeJS
5.2.3 Kernel Compilation
5.2.4 Network Latency of Idle Guest
5.2.5 Network Throughput
VI. Related Work
6.1 Virtual Machine Fault Tolerance
6.1.1 Continuous Checkpointing Implementations
6.2 Live Migration with Lossless Compression Algorithms
6.2.1 XOR-Based Zero Run Length Encoding (XBZRLE)
6.2.2 LZ4 Lossless Compression
VII. Conclusion and Future Work
References
Electronic full text: publicly available online from 2023-07-31