
Resilience Scenarios

xzfc edited this page Dec 27, 2020 · 3 revisions

Several unwanted scenarios can occur during a test, interrupting the test flow and/or invalidating its results:

  1. The connection between the client and the server is lost due to a network failure.
  2. The client crashed.
  3. The workload crashed.
  4. The server crashed.
  5. An analyzer or PTDaemon issue occurred:
    • The USB/network connection between the server and the analyzer is lost.
    • The connection between the server and PTDaemon failed.
    • The analyzer was switched off, unplugged, or crashed.

At a bare minimum, we should detect each of these scenarios, interrupt the test, and tell the user that the test was interrupted unexpectedly. The message "Test completed successfully" must never appear after a failure, so that users can trust it.
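
The minimum requirement above can be sketched as a small runner. This is only an illustration, not the project's actual code: the `Failure` enum, the `UnexpectedInterruption` exception, and `run_stages` are hypothetical names, and each stage is assumed to raise the exception when it detects one of the listed scenarios.

```python
import sys
from enum import Enum


class Failure(Enum):
    """The failure scenarios listed above (hypothetical names)."""
    NETWORK = "connection between client and server lost"
    CLIENT = "client crashed"
    WORKLOAD = "workload crashed"
    SERVER = "server crashed"
    ANALYZER = "analyzer or PTDaemon issue"


class UnexpectedInterruption(Exception):
    """Raised by a stage when it detects one of the failure scenarios."""

    def __init__(self, failure, stage):
        super().__init__(f"{failure.value} during {stage!r}")
        self.failure = failure
        self.stage = stage


def run_stages(stages):
    """Run (name, callable) pairs in order and return a process exit code."""
    for name, stage in stages:
        try:
            stage()
        except UnexpectedInterruption as exc:
            print(f"Test interrupted unexpectedly: {exc}", file=sys.stderr)
            return 1
        except Exception as exc:
            # Anything unexpected is also a failure, never a silent success.
            print(f"Test interrupted unexpectedly: error during {name!r}: {exc}",
                  file=sys.stderr)
            return 1
    # Reached only if every stage finished without raising.
    print("Test completed successfully")
    return 0
```

The success message is printed in exactly one place, after the loop, so no failure path can reach it. Returning a non-zero exit code lets scripts that wrap the test detect the interruption as well.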

Going a step further, we may attempt to recover from these situations where possible. The table below lists the applicable recovery method for each failure scenario at each stage of the test.

| Failure scenario | handshake | ranging | prepare logs | send logs | testing | prepare logs | send logs |
|---|---|---|---|---|---|---|---|
| Network failure | - | reconnect | reconnect | reconnect | reconnect | reconnect | reconnect |
| Client crashed | - | restart | reconnect [1] | reconnect [1] | restart [1] | reconnect [1] | reconnect [1] |
| Workload crashed | - | restart | - | - | restart | - | - |
| Server crashed | - | restart | reconnect [2] | reconnect [2] | restart [2] | reconnect [2] | reconnect [2] |
| Analyzer issue | ptd | restart | ptd | ptd | restart | ptd | ptd |

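
One possible way to make the table above machine-readable is a plain lookup structure. The sketch below is an assumption-laden illustration: the stage and scenario keys are hypothetical names, and it assumes that each "prepare logs" / "send logs" pair belongs to the phase column (ranging or testing) to its left. The restart, reconnect, and ptd actions and the footnote markers 1 and 2 are explained in the legend that follows.

```python
from typing import NamedTuple, Optional


class Recovery(NamedTuple):
    action: str                      # "restart", "reconnect", or "ptd"
    persisted: Optional[str] = None  # footnote 1 -> "client", footnote 2 -> "server"


# Hypothetical stage keys, in the table's column order.
STAGES = (
    "handshake",
    "ranging", "ranging_prepare_logs", "ranging_send_logs",
    "testing", "testing_prepare_logs", "testing_send_logs",
)

RESTART = Recovery("restart")
RECONNECT = Recovery("reconnect")
PTD = Recovery("ptd")
RECONNECT_1 = Recovery("reconnect", "client")
RECONNECT_2 = Recovery("reconnect", "server")
RESTART_1 = Recovery("restart", "client")
RESTART_2 = Recovery("restart", "server")

# One row per failure scenario; None stands for "-" in the table.
RECOVERY_TABLE = {
    "network_failure": (None, RECONNECT, RECONNECT, RECONNECT,
                        RECONNECT, RECONNECT, RECONNECT),
    "client_crashed": (None, RESTART, RECONNECT_1, RECONNECT_1,
                       RESTART_1, RECONNECT_1, RECONNECT_1),
    "workload_crashed": (None, RESTART, None, None, RESTART, None, None),
    "server_crashed": (None, RESTART, RECONNECT_2, RECONNECT_2,
                       RESTART_2, RECONNECT_2, RECONNECT_2),
    "analyzer_issue": (PTD, RESTART, PTD, PTD, RESTART, PTD, PTD),
}


def recovery_for(failure: str, stage: str) -> Optional[Recovery]:
    """Return the recovery listed for a failure at a stage, or None for '-'."""
    return RECOVERY_TABLE[failure][STAGES.index(stage)]
```

For example, `recovery_for("workload_crashed", "ranging")` yields the plain restart action, while `recovery_for("network_failure", "handshake")` yields `None` because the table lists no recovery there.
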
  • restart: Restart the workload and start power measurement again. Logs from the current phase are invalidated because they are incomplete. This may be done either fully automatically (e.g. --max-tries 5) or manually (e.g. --continue).
  • reconnect: The client should reconnect to the server and resend any messages that were lost.
    • [1] The client should be able to store its state on disk rather than in memory. This would require a manual restart of the client.
    • [2] The server should be able to store its state on disk rather than in memory. This would require a restart of the server.
  • ptd: Restart PTDaemon and reconnect to the analyzer.
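
Notes 1 and 2 above call for state that survives a crash. A minimal sketch of one way to do that is shown below, assuming the state fits in a JSON-serializable dict. The `save_state` and `load_state` helpers and the state layout are hypothetical; they are not taken from the existing client or server code.

```python
import json
import os
import tempfile


def save_state(path, state):
    """Write state as JSON via a temporary file and an atomic rename.

    A crash in the middle of the write leaves the previous file intact,
    so a restarted process never reads a half-written state.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic when both paths share a filesystem
    except BaseException:
        # os.replace did not complete, so the temporary file still exists.
        os.unlink(tmp_path)
        raise


def load_state(path):
    """Return the saved state, or None if no state has been saved yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```

A restarted client could call `load_state` at startup and, if it returns a dict such as `{"stage": "testing", "unacked": [...]}`, reconnect and resend the unacknowledged messages instead of starting from the handshake. Saving after every state change keeps the on-disk copy current at the cost of one small write per change.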