Using failure injection mechanisms to experiment and evaluate a hierarchical failure detector

Computing grids are large-scale, highly distributed hardware architectures, often built hierarchically as federations of clusters. At such scales, failures are no longer exceptions but part of normal behavior. When designing software for grids, developers have to take failures into account in order to provide a stable service. The fault-tolerance mechanisms they rely on need to be validated and evaluated. It is therefore crucial to run experiments at a large scale, under various volatility conditions, in order to measure the impact of failures on the whole system. This paper presents an experimental tool that lets the user control the volatility conditions during a practical evaluation of fault-tolerant systems. The tool is based on failure-injection mechanisms. We illustrate its usefulness through an evaluation of a failure detector specifically designed for hierarchical grids.

Additional Info

Field Value
Source https://inria.hal.science/inria-00070213
Author Monnet, Sébastien; Bertier, Marin
Maintainer CCSD
Last Updated May 16, 2026, 18:02 (UTC)
Created May 16, 2026, 18:02 (UTC)
Identifier Report N°: RR-5811
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Programming distributed parallel systems for large scale numerical simulation (PARIS) ; Institut de Recherche en Informatique et Systèmes Aléatoires (IRISA) ; Université de Rennes (UR) ; Institut National des Sciences Appliquées - Rennes (INSA Rennes) ; Institut National des Sciences Appliquées (INSA) ; Institut National de Recherche en Informatique et en Automatique (Inria) ; Centre National de la Recherche Scientifique (CNRS) ; École normale supérieure - Cachan (ENS Cachan) ; Centre Inria de l'Université de Rennes
creator Monnet, Sébastien
date 2006-05-16T00:00:00
harvest_object_id 58a384ea-8cb0-46a7-afcc-e270660063c2
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2025-06-06T00:00:00
set_spec type:REPORT