Haskell distributed parallel Haskell Reliable Scheduling (HdpH-RS) is a Haskell DSL for fault tolerant parallel computation on distributed-memory architectures. HdpH-RS is a derivation of HdpH and is a research tool for scalable and reliable scheduling on HPC platforms. HdpH-RS is implemented entirely in Haskell but does make use of a few GHC extensions, most notably TemplateHaskell.
The design, validation and implementation of HdpH-RS is described in my PhD thesis Reliable Massively Parallel Symbolic Computing Fault Tolerance for a Distributed Haskell detailed here. The contributions of HdpH-RS include:
- An extension to the HdpH language operational semantics for fault tolerance in HdpH-RS has been formulated (Chapter 3).
- The design of a fault tolerant work stealing scheduling protocol has been abstracted in Promela (here) and verified with the SPIN model checker (Chapter 4), and implemented in Haskell (Chapter 5).
- The scalability of HdpH-RS has been measured up to 1400 cores on HECToR, a UK computing resource. HdpH-RS fault tolerance has been tested with Chaos Monkey unit tests on a Beowulf commodity cluster (Chapter 6).
This release is considered alpha stage. Moreover, this repository is unlikely to receive further commits. Go to the HdpH GitHub page to download the upstream version of HdpH. For details on how to execute HdpH-RS programs on distributed-memory architectures, see Section 6.2 of my thesis.