Novel replication technique for detecting and masking failures for parallel software: active parallel replication

Adel Cherif*, Masato Suzuki, Takuya Katayama

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

1 Citation (Scopus)

Abstract

We present a novel replication technique for parallel applications in which instances of the replicated application are active on different groups of processors, called replicas. The technique is based on the FTAG (Fault Tolerant Attribute Grammar) computation model, a functional, attribute-based model. It implements "active parallel replication": all replicas are active, and each concurrently computes a different piece of the application's parallel code. In our model, replicas cooperate not only to detect and mask failures but also to perform the parallel computation. The replication mechanisms are supported by the FTAG run-time system and are fully application-transparent. Novel mechanisms for checkpointing and recovery are developed. During rollback recovery, only the part of the computation that was detected as faulty is discarded. The replication technique takes full advantage of parallel computing to reduce overall computation time.
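The abstract does not spell out the FTAG mechanisms, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes fail-stop failures that surface as exceptions and side-effect-free (functional) pieces of work, so a failed piece can be re-evaluated in isolation. All names (`make_replica`, `active_parallel_run`, `ReplicaFailure`) are hypothetical, and threads stand in for groups of processors.

```python
from concurrent.futures import ThreadPoolExecutor


class ReplicaFailure(Exception):
    """Signals a simulated fail-stop failure of one replica (assumption)."""


def make_replica(name, faulty_pieces=()):
    """Build a hypothetical replica: a callable that evaluates one piece.

    `faulty_pieces` lists piece indices on which this replica fails once.
    """
    pending_faults = set(faulty_pieces)

    def run(index, func, arg):
        if index in pending_faults:
            pending_faults.discard(index)  # the injected fault is one-shot
            raise ReplicaFailure(f"{name} failed on piece {index}")
        return func(arg)

    return run


def active_parallel_run(pieces, replicas):
    """Evaluate pure pieces across replicas; redo only the pieces that failed."""
    results = {}
    recomputed = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        # Every replica is active and works on different pieces (round-robin).
        futures = {
            i: pool.submit(replicas[i % len(replicas)], i, func, arg)
            for i, (func, arg) in enumerate(pieces)
        }
        for i, future in futures.items():
            try:
                results[i] = future.result()
            except ReplicaFailure:
                # Mask the failure: discard only this piece and re-evaluate
                # it on the next replica. Results of other pieces are kept.
                func, arg = pieces[i]
                backup = replicas[(i + 1) % len(replicas)]
                results[i] = backup(i, func, arg)
                recomputed.append(i)
    return [results[i] for i in range(len(pieces))], recomputed


def square(x):
    return x * x


pieces = [(square, n) for n in range(6)]
replicas = [
    make_replica("r0"),
    make_replica("r1", faulty_pieces=[4]),  # r1 fails once, on piece 4
    make_replica("r2"),
]
values, recomputed = active_parallel_run(pieces, replicas)
total = sum(values)
```

In this run, piece 4 is assigned to `r1`, fails there, and is re-evaluated on `r2`, while the five other results are kept. The sketch does not handle a failure of the backup replica; a fuller version would keep trying the remaining replicas.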

Original language: English
Pages (from-to): 886-892
Number of pages: 7
Journal: IEICE Transactions on Information and Systems
Volume: E80-D
Issue number: 9
Publication status: Published - Sept 1997
Externally published: Yes
