- problem: existing tools too specialized (e.g., on fault model, platform)
- approach: common control mechanism for multiple …
  - … fault models (e.g., bit flips in registers and memory, communication, IO)
  - … fault triggers (e.g., path-based, time-based, event-based)
  - … fault targets (e.g., hardware communication interfaces, MPI applications)
  - … reporting methods (e.g., dump memory, but: detail vs. intrusiveness)
- two case studies
  - injection in physical layer of Myrinet LAN
  - debugger-based injection in space imaging application
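
The idea of one control mechanism driving interchangeable fault models and triggers can be sketched in a few lines. This is a minimal illustration only, not NFTAPE's actual interface: the names `Campaign`, `at_step`, `bit_flip_at`, and `flip_bit` are hypothetical, the "target" is a plain Python list standing in for memory, and the "time-based trigger" is simply a step counter.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def flip_bit(value: int, bit: int) -> int:
    """Fault model: flip a single bit of an integer word."""
    return value ^ (1 << bit)


def at_step(n: int) -> Callable[[int], bool]:
    """Hypothetical time-based trigger: fires exactly once, at step n."""
    return lambda step: step == n


def bit_flip_at(address: int, bit: int) -> Callable[[List[int]], str]:
    """Hypothetical lightweight injector: flips one bit of one memory word."""
    def inject(memory: List[int]) -> str:
        memory[address] = flip_bit(memory[address], bit)
        return f"flipped bit {bit} at address {address}"
    return inject


@dataclass
class Campaign:
    """Hypothetical common controller: couples any trigger with any injector
    and records what was injected (the reporting side)."""
    trigger: Callable[[int], bool]
    injector: Callable[[List[int]], str]
    log: List[str] = field(default_factory=list)

    def run(self, memory: List[int], steps: int) -> List[int]:
        for step in range(steps):
            if self.trigger(step):
                self.log.append(f"step {step}: {self.injector(memory)}")
        return memory


# Usage: flip bit 0 of word 1 at step 3 of a 5-step run.
memory = [0b0000, 0b1010]
campaign = Campaign(trigger=at_step(3), injector=bit_flip_at(address=1, bit=0))
campaign.run(memory, steps=5)
print(memory, campaign.log)
```

Swapping in a different trigger or injector leaves `Campaign` untouched, which is the point of separating control from the fault-specific parts; the log is deliberately terse, reflecting the detail vs. intrusiveness trade-off noted above.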
D. T. Stott, B. Floering, D. Burke, Z. Kalbarczyk, and R. K. Iyer,
“NFTAPE: a framework for assessing dependability in distributed systems
with lightweight fault injectors,” in Proceedings IEEE International
Computer Performance and Dependability Symposium. IPDS 2000, 2000,
pp. 91–100.