|
[1] A. Acharya and B. R. Badrinath, "Checkpointing Distributed Applications on Mobile Computers," Proc. Third Int'l Conf. Parallel and Distributed Information Systems, pp. 73-80, September 1994.
[2] J. M. Adamo, "ARCH, An Object-Oriented Library for Asynchronous and Loosely Synchronous System Programming," Tech. Rep. ncstrl.cornell.tc/95-228, Cornell University, Theory Center, 1995.
[3] S. Alagar and S. Venkatesan, "Causal Ordering in Distributed Mobile Systems," IEEE Transactions on Computers, Vol. 46, No. 3, pp. 353-361, March 1997.
[4] NASA Ames Research Center, NAS Parallel Benchmarks, http://science.nas.nasa.gov/Software/NPB, 1997.
[5] R. Baldoni, J. M. Helary, A. Mostefaoui, and M. Raynal, "On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems," Tech. Rep. RR-2564, IRISA, July 1995.
[6] G. Barigazzi and L. Strigini, "Application-Transparent Setting of Recovery Point," Proc. 13th Fault-Tolerant Computing Symposium (FTCS-13), pp. 48-55, 1983.
[7] R. Baldoni, J. M. Helary, A. Mostefaoui, and M. Raynal, "On Modeling Consistent Checkpoints and the Domino Effect in Distributed Systems," Tech. Rep. RR-2564, IRISA, July 1995.
[8] A. Beguelin, E. Seligman, and P. Stephan, "Application Level Fault Tolerance in Heterogeneous Networks of Workstations," Tech. Rep. CMU-CS-96-157, School of Computer Science, Carnegie Mellon University, Pittsburgh, 1996.
[9] G. Burns, R. Daoud, and J. Vaigl, "LAM: An Open Cluster Environment for MPI," Ohio Supercomputer Center, 1994.
[10] P. E. Chung, Y. Huang, and D. Liang, "Winckp: A Transparent Checkpointing and Rollback Recovery Tool for Windows NT Applications," Proceedings of the 29th IEEE Fault-Tolerant Computing Symposium (FTCS-29), 1999.
[11] A. Clematis, G. Deconinck, and V. Gianuzzi, "A Flexible State-saving Library for Message-passing Systems," Proceedings of the 28th IEEE Fault-Tolerant Computing Symposium (FTCS-28), Germany, June 1998.
[12] C. R. Dow, J. S. Chen, J. C. Chen, and M. C. Hsieh, "A Transparent Checkpointing System for MPI," Proceedings of National Computer Symposium, pp. C-289-296, 1999.
[13] C. R. Dow and Y. G. Gou, "A Parallel/Distributed Debugger for MPI," Proceedings of 1997 Workshop on Distributed System Techniques and Applications, Tainan, Taiwan, May 1997.
[14] W. R. Dieter and J. E. Lumpp, "A User-level Checkpointing Library for POSIX Threads Programs," Proceedings of the 29th IEEE Fault-Tolerant Computing Symposium (FTCS-29), 1999.
[15] E. N. Elnozahy, D. B. Johnson, and Y. M. Wang, "A Survey of Rollback-Recovery Protocols in Message-Passing Systems," Tech. Rep. CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, October 1996.
[16] J. Fowler and W. Zwaenepoel, "Causal Distributed Breakpoints," Proc. 10th International Conference on Distributed Computing Systems, pp. 134-141, 1990.
[17] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam, "PVM 3 User's Guide and Reference Manual," Oak Ridge National Laboratory, May 1993.
[18] W. Gropp and E. Lusk, "Creating a New MPICH Device Using the Channel Interface," Tech. Rep. ANL/MCS-TM-213, Argonne National Laboratory, 1995.
[19] W. Gropp and E. Lusk, "MPICH: A Case Study in the Dissemination of a Portable Environment for Parallel Scientific Computing," IJSA, Vol. 11, No. 2, pp. 103-114, Summer 1997.
[20] V. Herrarte and E. Lusk, "Studying Parallel Program Behavior with Upshot," Tech. Rep. ANL-91/15, Argonne National Laboratory, Argonne, IL 60439, 1991.
[21] R. T. Hood, "The p2d2 Project: Building a Portable Debugger," Proceedings of SPDT'96: SIGMETRICS Symposium on Parallel and Distributed Tools, May 1996.
[22] R. Koo and S. Toueg, "Checkpointing and Rollback Recovery for Distributed Systems," IEEE Transactions on Software Engineering, Vol. SE-13, No. 1, pp. 23-31, January 1987.
[23] L. Lamport, "Time, Clocks, and the Ordering of Events in a Distributed System," Comm. ACM, Vol. 21, No. 7, pp. 558-565, July 1978.
[24] D. Libes, "X Wrapper for Non-Graphic Interactive Programs," Proceedings of Xhibition 94, San Jose, California, June 1994.
[25] C. M. Lin and C. R. Dow, "Efficient Independent Checkpointing Techniques for Message-Passing Programs," Tech. Rep., Department of Information Engineering and Computer Science, Feng-Chia University, January 1998.
[26] J. Long, W. K. Fuchs, and K. A. Abraham, "Compiler-Assisted Static Checkpoint Insertion," Proc. 22nd Fault-Tolerant Computing Symposium (FTCS-22), pp. 58-65, 1992.
[27] D. Manivannan and M. Singhal, "A Low-Overhead Recovery Technique Using Quasi-Synchronous Checkpointing," Proc. IEEE Int. Conf. Distributed Computing Systems, pp. 100-107, 1996.
[28] D. Manivannan, R. Netzer, and M. Singhal, "Finding Consistent Global Checkpoints in a Distributed Computation," IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 6, pp. 623-627, June 1997.
[29] P. McGrath and B. Tangney, "Scrabble - A Distributed Application with an Emphasis on Continuity," Software Engineering Journal, pp. 160-164, May 1990.
[30] R. H. B. Netzer and J. Xu, "Necessary and Sufficient Conditions for Consistent Global Snapshots," IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 2, pp. 165-169, February 1995.
[31] N. Neves and W. K. Fuchs, "RENEW: A Tool for Fast and Efficient Implementation of Checkpoint Protocols," Proceedings of the 28th IEEE Fault-Tolerant Computing Symposium (FTCS-28), Germany, June 1998.
[32] J. S. Plank, Y. Chen, and K. Li, "CLIP: A Checkpointing Tool for Message-Passing Parallel Programs," Tech. Rep., Department of Computer Science, Princeton University, 1996.
[33] J. S. Plank, M. Beck, and G. Kingsley, "Libckpt: Transparent Checkpointing Under Unix," USENIX Winter 1995 Technical Conference, New Orleans, Louisiana, January 16-20, 1995.
[34] J. S. Plank and K. Li, "Performance Results of Ickp -- A Consistent Checkpointer on the iPSC/860," Scalable High Performance Computing Conference, pp. 686-693, Knoxville, TN, May 1994.
[35] J. S. Plank, "Efficient Checkpointing on MIMD Architectures," PhD thesis, Princeton University, January 1993.
[36] R. Prakash and M. Singhal, "Low-Cost Checkpointing and Failure Recovery in Mobile Computing Systems," IEEE Transactions on Parallel and Distributed Systems, Vol. 7, No. 10, pp. 1035-1048, October 1996.
[37] G. Stellner, "CoCheck: Checkpointing and Process Migration for MPI," 10th International Parallel Processing Symposium, April 1996.
[38] N. H. Vaidya, "Staggered Consistent Checkpointing," IEEE Transactions on Parallel and Distributed Systems, Vol. 10, No. 7, July 1999.
[39] Y. M. Wang, "Consistent Global Checkpoints that Contain a Given Set of Local Checkpoints," IEEE Transactions on Computers, Vol. 46, No. 4, pp. 456-468, April 1997.
[40] Message Passing Interface Forum, "MPI: A Message-Passing Interface Standard," Tech. Rep. CS-94-230, Department of Computer Science, University of Tennessee, Knoxville, TN, 1994.
[41] Message Passing Interface Forum, "MPI2: Extension to the Message-Passing Interface," November 1995.
[42] "Para++: C++ Bindings for Message Passing Libraries," INRIA RT-0174, June 1995.
|