-
Sriram Sankaran,
Jeffrey M. Squyres,
Brian Barrett,
Andrew Lumsdaine,
Jason Duell,
Paul Hargrove,
and Eric Roman.
The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing.
International Journal of High Performance Computing Applications,
19(4):479–493,
Winter 2005.
Keywords:
MPI,
checkpoint/restart,
rollback-recovery.
-
Joshua Hursey,
Timothy I. Mattox,
and Andrew Lumsdaine.
Interconnect Agnostic Checkpoint/Restart in Open MPI.
In Proceedings of the Eighteenth International Symposium on High Performance Distributed Computing (HPDC 2009),
June 2009.
ACM.
Keywords:
Open MPI,
high performance computing,
rollback-recovery,
MPI,
fault tolerance,
checkpoint/restart,
High Speed Interconnect,
InfiniBand,
Shared Memory,
Myrinet,
Checkpoint Coordination Protocol.
-
Joshua Hursey,
Jeffrey M. Squyres,
Timothy I. Mattox,
and Andrew Lumsdaine.
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI.
In Proceedings of the 21st IEEE International Parallel and Distributed Processing Symposium (IPDPS),
March 2007.
IEEE Computer Society.
Keywords:
Open MPI,
high performance computing,
rollback-recovery,
MPI,
fault tolerance,
checkpoint/restart.
Disclaimer:
This material is presented to ensure timely dissemination of
scholarly and technical work. Copyright and all rights therein
are retained by authors or by other copyright holders.
All persons copying this information are expected to adhere to
the terms and constraints invoked by each author's copyright.
In most cases, these works may not be reposted
without the explicit permission of the copyright holder.
Last modified: Thu Nov 5 01:40:44 2009
Author: dikim.
This document was translated from BibTeX by bibtex2html.