Failure detectors for large-scale distributed systems

Naohiro Hayashibara, Adel Cherif, Takuya Katayama

Research output: Contribution to journal › Article › peer-review

71 Citations (Scopus)

Abstract

This paper discusses the problem of implementing a scalable failure detection service for Grid systems. Traditional implementations of failure detectors are often tuned for local networks and fail to address several important problems found in wide-area distributed systems such as Grids. We identify the most important of these problems in the context of Grids, and then survey recent proposals that can help solve some of them.
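The abstract's point that traditional failure detectors are tuned for local networks can be illustrated with a minimal sketch. It assumes the simplest such design: a monitored process sends a heartbeat every `period` seconds, and the detector suspects it once no heartbeat has arrived for `timeout` seconds. The class `TimeoutFailureDetector`, the function `count_false_suspicions`, and the delay values are hypothetical and are not taken from the paper. They only show how a timeout chosen for LAN-scale message delays produces false suspicions when delays vary as they do on a wide-area network.

```python
from dataclasses import dataclass, field


@dataclass
class TimeoutFailureDetector:
    """Hypothetical fixed-timeout heartbeat detector (illustrative only)."""
    timeout: float  # seconds without a heartbeat before a process is suspected
    last_heartbeat: dict[str, float] = field(default_factory=dict)

    def heartbeat(self, process: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `process`.
        self.last_heartbeat[process] = now

    def suspects(self, process: str, now: float) -> bool:
        # Suspect a process that has never sent a heartbeat, or whose
        # last heartbeat is older than `timeout` seconds.
        last = self.last_heartbeat.get(process)
        return last is None or now - last > self.timeout


def count_false_suspicions(delays, period=1.0, timeout=1.1):
    """Replay heartbeats from a process that never crashes.

    Heartbeat i is sent at i * period and arrives after delays[i] seconds.
    Returns how many times the detector would have suspected the (correct)
    process before its next heartbeat arrived.
    """
    fd = TimeoutFailureDetector(timeout)
    arrivals = sorted(i * period + d for i, d in enumerate(delays))
    fd.heartbeat("p", arrivals[0])
    mistakes = 0
    for t in arrivals[1:]:
        # If the gap since the previous heartbeat exceeds the timeout,
        # the process was wrongly suspected at some point in that gap.
        if fd.suspects("p", t):
            mistakes += 1
        fd.heartbeat("p", t)
    return mistakes


lan_delays = [0.001, 0.002, 0.001, 0.003, 0.002]  # hypothetical LAN delays (s)
wan_delays = [0.05, 0.40, 0.08, 0.60, 0.10]       # hypothetical WAN delays (s)
print(count_false_suspicions(lan_delays))                # 0
print(count_false_suspicions(wan_delays))                # 2
print(count_false_suspicions(wan_delays, timeout=2.0))   # 0
```

With a timeout tuned just above the heartbeat period, the LAN replay produces no mistakes while the WAN replay wrongly suspects a live process twice. Raising the timeout removes those mistakes, but it also delays detection of a real crash by the same margin. This is the kind of tradeoff that a single fixed timeout cannot resolve across heterogeneous wide-area links.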

Original language: English
Article number: 51
Pages (from-to): 404-409
Number of pages: 6
Journal: Proceedings of the IEEE Symposium on Reliable Distributed Systems
Publication status: Published - 2002
Externally published: Yes

Keywords

  • Distributed systems
  • Failure detection
  • Fault tolerance
  • Grid computing
  • Grid system
