3: Software Fault Tolerance


1

Detector - Pseudocode

If a predicate (e.g. a safety condition) is not met, exit; this prioritises safety over liveness.
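A minimal Python sketch of a detector, assuming the state is a plain dict and the predicate is an ordinary function. Raising the hypothetical `SafetyViolation` exception stands in for "exit", and `level_ok` is an invented example predicate, not part of the original notes.

```python
class SafetyViolation(Exception):
    """Raised when the guarded predicate does not hold."""


def detect(state: dict, predicate) -> dict:
    """Detector: check the predicate and stop if it is violated.

    Raising an exception stands in for "exit": no further progress is made
    (liveness is given up) so that no unsafe state is acted upon (safety).
    """
    if not predicate(state):
        raise SafetyViolation(f"predicate violated in state {state!r}")
    return state


# Hypothetical safety predicate: the tank level must stay within [0, 100].
def level_ok(state: dict) -> bool:
    return 0 <= state["level"] <= 100
```

With this sketch, `detect({"level": 50}, level_ok)` returns the state unchanged, while `detect({"level": 120}, level_ok)` raises `SafetyViolation`.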

2

Corrector - Pseudocode

If a predicate (e.g. safety condition) is not met, then it is enforced.

  • May prioritise liveness over safety.

  • Uses enforcement techniques to ensure the invariant is satisfied again.
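A corrector can be sketched the same way. This assumes the same dict-based state; `level_ok` and the clamping step `clamp_level` are hypothetical examples of an invariant and an enforcement technique. Instead of stopping, the corrector repairs the state so execution can continue.

```python
def level_ok(state: dict) -> bool:
    """Hypothetical invariant: the tank level stays within [0, 100]."""
    return 0 <= state["level"] <= 100


def clamp_level(state: dict) -> dict:
    """Hypothetical enforcement technique: clamp the level back into range."""
    return {**state, "level": min(max(state["level"], 0), 100)}


def correct(state: dict, predicate, enforce) -> dict:
    """Corrector: if the predicate fails, enforce it and keep running."""
    if not predicate(state):
        state = enforce(state)
    # The enforcement step must re-establish the invariant.
    if not predicate(state):
        raise RuntimeError("enforcement did not restore the invariant")
    return state
```

Here `correct({"level": 120}, level_ok, clamp_level)` yields `{"level": 100}`: execution continues (liveness), at the cost of acting on a value that was forced rather than computed.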

3

Fault Tolerance Techniques - List

  • Runtime checks

  • Exception handlers

  • Forward Error Recovery

  • Backward Error Recovery

4

Runtime Checks

Used for detecting errors.

5

Runtime Checks - Examples

  • Replication checks - compare outputs of replicated modules

  • Timing checks - use of timers to check timing constraints

  • Reversal checks - reverse output and check against inputs

  • Coding checks - parity, Hamming codes, etc.

  • Reasonableness checks - semantic properties of data

  • Structural checks - redundancy in data structures

  • Validity checks - divide by 0, array bounds, overflow
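A few of the checks above, sketched in Python for illustration. The function names and the temperature bounds are hypothetical choices; the parity helper assumes non-negative integers.

```python
import math


def reversal_check_sqrt(x: float, tol: float = 1e-9) -> float:
    """Reversal check: compute sqrt(x), then square it and compare with the input."""
    y = math.sqrt(x)
    if not math.isclose(y * y, x, rel_tol=tol, abs_tol=tol):
        raise ArithmeticError("reversal check failed")
    return y


def parity_bit(data: int) -> int:
    """Coding check helper: even-parity bit for a non-negative int (1 if odd number of set bits)."""
    return bin(data).count("1") % 2


def parity_ok(data: int, stored_parity: int) -> bool:
    """Coding check: recompute the parity bit and compare with the stored one."""
    return parity_bit(data) == stored_parity


def reasonable_temperature(celsius: float) -> bool:
    """Reasonableness check: value must be physically plausible (bounds are illustrative)."""
    return -90.0 <= celsius <= 60.0


def safe_divide(a: float, b: float) -> float:
    """Validity check: reject division by zero before it happens."""
    if b == 0:
        raise ZeroDivisionError("validity check: divisor is zero")
    return a / b
```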

6

Exception Handlers

Detectors raise exceptions (signals that interrupt normal execution), which are then dealt with by exception handlers.

7

Exceptions - Examples

  • Interface exception

  • Local/internal exception

  • Failure exception

8

Interface Exception

Invalid service request detected by interface detectors and corrected by the service requester.

9

Local/internal Exception

Problems with internal operations detected by local detectors, and corrected by local correctors.

10

Failure Exception

Internal errors propagate to the interface and are detected by its detectors; global correction may be needed.
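The three exception kinds can be illustrated with a small hypothetical service that averages sensor readings given as strings. All class and function names are invented for this sketch; it shows where each kind is raised and who is expected to correct it.

```python
class InterfaceException(Exception):
    """Invalid service request; the requester is expected to correct it."""


class LocalException(Exception):
    """Problem in an internal operation; handled by a local corrector."""


class FailureException(Exception):
    """Internal error that reached the interface; may need global correction."""


def _parse_reading(raw: str) -> float:
    try:
        return float(raw)
    except ValueError as exc:
        # Local detector: an internal operation failed.
        raise LocalException(f"unparseable reading {raw!r}") from exc


def average_reading(readings: list[str]) -> float:
    """Hypothetical service: average a list of sensor readings given as strings."""
    if not readings:
        # Interface detector: the request itself is invalid.
        raise InterfaceException("at least one reading is required")
    values = []
    for raw in readings:
        try:
            values.append(_parse_reading(raw))
        except LocalException:
            # Local corrector: skip the bad reading and carry on.
            continue
    if not values:
        # Local correction could not recover; the error propagates to the interface.
        raise FailureException("no valid readings")
    return sum(values) / len(values)
```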

11

Forward Error Recovery

Upon error detection, the program attempts to continue from a new, non-erroneous state rather than rolling back.
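A minimal forward-recovery sketch, assuming a stream of numeric sensor readings where non-finite values (NaN, infinity) count as errors. The strategy of substituting the last valid reading, or a default, is one illustrative choice among many; no earlier state is restored.

```python
import math


def forward_recover(readings: list[float], default: float = 0.0) -> list[float]:
    """Replace each invalid reading with the last valid one (or `default`)."""
    cleaned = []
    last_good = default
    for r in readings:
        if not math.isfinite(r):       # error detected
            cleaned.append(last_good)  # forward recovery: move on to a plausible state
        else:
            last_good = r
            cleaned.append(r)
    return cleaned
```

For example, `forward_recover([1.0, float("nan"), 3.0, float("inf")])` gives `[1.0, 1.0, 3.0, 3.0]`.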

12

Backward Error Recovery

Upon error detection, the program rolls back to a previously recorded good state (a checkpoint) and restarts execution from there.
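A single-process sketch of backward error recovery, assuming the state is a dict that `copy.deepcopy` can snapshot. The retry count and the `debit_10` step are hypothetical; the point is that a half-finished step is undone by restoring the checkpoint before retrying.

```python
import copy
from typing import Callable


def run_with_rollback(state: dict, step: Callable[[dict], None], retries: int = 3) -> dict:
    """Backward error recovery: restore a recorded good state on error, then retry."""
    checkpoint = copy.deepcopy(state)  # record a known-good point
    for _ in range(retries):
        try:
            step(state)  # may partially modify state before failing
            return state
        except Exception:
            state.clear()
            state.update(copy.deepcopy(checkpoint))  # roll back to the checkpoint
    raise RuntimeError(f"step still failing after {retries} attempts")


# Usage: a hypothetical step that fails transiently on its first call,
# after it has already modified the state.
calls = {"n": 0}


def debit_10(state: dict) -> None:
    calls["n"] += 1
    state["balance"] -= 10
    if calls["n"] == 1:
        raise OSError("transient fault")
    state["log"].append("debit 10")


result = run_with_rollback({"balance": 100, "log": []}, debit_10)
# result == {"balance": 90, "log": ["debit 10"]}: the first, half-done debit was undone.
```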

13

Checkpoints

A snapshot of a program’s state at a given point in time, from which execution can restart if a failure occurs.

14

Recovery Line

A set of checkpoints, one per process, to which all processes can be rolled back together in the event of a failure.

15

Checkpoints - Best Practices

  • OK to checkpoint after a message send

  • Not OK to checkpoint before a message receive

16

Recovery Lines - Consistency Requirement

There are no messages that originate after the line and terminate before it (there are no receives without corresponding sends).
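The consistency requirement can be checked mechanically. This sketch assumes a simple, hypothetical model: each process numbers its local events from 0, `line[p]` is how many of process p's events happen before its checkpoint, and each message records the event indices of its send and its receive. A message sent after the line but received before it (an orphan) makes the line inconsistent.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    send_event: int  # index of the send in the sender's local event sequence
    receiver: str
    recv_event: int  # index of the receive in the receiver's local event sequence


def is_consistent(line: dict[str, int], messages: list[Message]) -> bool:
    """line[p] = number of p's events that happen before p's checkpoint."""
    for m in messages:
        sent_after = m.send_event >= line[m.sender]
        received_before = m.recv_event < line[m.receiver]
        if sent_after and received_before:  # orphan: a receive without its send
            return False
    return True
```

For a message `Message("P", 2, "Q", 1)`, the line `{"P": 2, "Q": 2}` is inconsistent (the receive is recorded but the send is not), whereas `{"P": 3, "Q": 2}` is consistent.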

17

Recovery Blocks

A mechanism that runs a series of alternative implementations of an algorithm, one after another, checking each result with an acceptance test.

  • 1 hardware channel, N software channels

18

Recovery Blocks - Process

  • The primary algorithm is run first.

    • If its execution is error-free, the result is checked with the acceptance test.

    • If evaluating the acceptance test is error-free and yields true, the test is satisfied and the result is accepted.

  • If the execution or the evaluation was not error-free, or the test yields false, the program restores the state saved prior to execution.

  • An alternative algorithm is then attempted, repeating the process until one passes the acceptance test or all alternatives are exhausted.
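A sequential recovery-block sketch, assuming dict state that can be deep-copied. The alternates `primary_sort` and `secondary_sort` and the acceptance test `is_sorted_same_length` are hypothetical; the primary is deliberately faulty so the rollback path is exercised. Timeouts, one of the failure reasons listed in these cards, are not handled here.

```python
import copy
from typing import Any, Callable, Sequence


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails."""


def recovery_block(state: dict,
                   alternates: Sequence[Callable[[dict], Any]],
                   acceptance_test: Callable[[dict, Any], bool]) -> Any:
    checkpoint = copy.deepcopy(state)  # state prior to execution
    for alternate in alternates:       # primary first, then the alternates
        try:
            result = alternate(state)
            if acceptance_test(state, result):
                return result          # acceptance test satisfied
        except Exception:
            pass  # an error in execution or in the test counts as a failure
        # Restore the saved state before trying the next alternate.
        state.clear()
        state.update(copy.deepcopy(checkpoint))
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


# Hypothetical alternates for sorting state["data"].
def primary_sort(state: dict) -> list:
    state["scratch"] = "partial work"  # side effect that must be undone
    return state["data"][::-1]         # deliberately faulty: reverses instead of sorting


def secondary_sort(state: dict) -> list:
    return sorted(state["data"])


def is_sorted_same_length(state: dict, out: list) -> bool:
    return len(out) == len(state["data"]) and all(a <= b for a, b in zip(out, out[1:]))
```

With `state = {"data": [3, 1, 2]}`, calling `recovery_block(state, [primary_sort, secondary_sort], is_sorted_same_length)` rejects the primary's output, rolls back (removing `"scratch"`), and returns `[1, 2, 3]` from the secondary.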

19

Recovery Blocks - Possible Failure Reasons

  • Error in the module operation

  • Timeout has expired

  • Raised exception

20

N-Version Programming

N independent developers build different implementations of the same problem without sharing their approaches; all N versions are run at the same time, and their outputs are compared to decide on the correct one.

  • Lack of agreement indicates a problem occurred.

  • N hardware channels, N software channels

21

N-Version Programming - Axiom

  • Will not work if all versions fail on the same inputs.

  • Will not work if they fail in similar ways.

22

Fault Injection Analysis

Subjects a system to abnormal conditions, such as deliberately introduced faults and errors, to assess its behaviour.
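A small software fault-injection experiment, sketched under assumptions: faults are injected by wrapping a function so it raises `OSError` with a chosen probability, and the mechanism under test is a hypothetical retry wrapper. A seeded `random.Random` keeps the experiment repeatable.

```python
import random
from typing import Any, Callable


def inject_faults(func: Callable[..., Any], rate: float, rng: random.Random) -> Callable[..., Any]:
    """Wrap func so each call raises OSError with probability `rate` (an injected fault)."""
    def faulty(*args: Any, **kwargs: Any) -> Any:
        if rng.random() < rate:
            raise OSError("injected fault")
        return func(*args, **kwargs)
    return faulty


def with_retry(func: Callable[..., Any], attempts: int = 3) -> Callable[..., Any]:
    """Hypothetical fault-tolerance mechanism under test: retry on OSError."""
    def retrying(*args: Any, **kwargs: Any) -> Any:
        for attempt in range(attempts):
            try:
                return func(*args, **kwargs)
            except OSError:
                if attempt == attempts - 1:
                    raise
    return retrying


def success_rate(rate: float, trials: int = 1000, seed: int = 0) -> float:
    """Run the system under injected faults and measure how often it still succeeds."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    robust_add = with_retry(inject_faults(lambda x, y: x + y, rate, rng))
    successes = 0
    for _ in range(trials):
        try:
            if robust_add(1, 2) == 3:
                successes += 1
        except OSError:
            pass
    return successes / trials
```

With a 50% fault rate and three attempts, the expected success rate is 1 - 0.5³ = 0.875, so a measured value near that suggests the retry mechanism behaves as intended.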

23

N-Version Programming - Selection Process

Analyses the outputs of all versions and selects the “best” one; this is done by a diagnostic program.
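Majority voting is one common way to select the "best" output; this sketch uses it as an assumed selection rule. The versions run sequentially here rather than concurrently, outputs must be hashable, and the three `abs_v*` versions are hypothetical (one deliberately faulty).

```python
from collections import Counter
from typing import Any, Callable, Sequence


class NoMajority(Exception):
    """Raised when the versions disagree and no strict majority exists."""


def majority_vote(versions: Sequence[Callable[..., Any]], *args: Any) -> Any:
    outputs = []
    for version in versions:
        try:
            outputs.append(version(*args))
        except Exception:
            pass  # a crashed version casts no vote
    if outputs:
        value, count = Counter(outputs).most_common(1)[0]
        if count > len(versions) // 2:  # strict majority of all N versions
            return value
    raise NoMajority(f"no majority among outputs {outputs!r}")


# Hypothetical independently written versions of absolute value.
def abs_v1(x: int) -> int:
    return x if x >= 0 else -x


def abs_v2(x: int) -> int:
    return max(x, -x)


def abs_v3(x: int) -> int:
    return x  # faulty version: wrong for negative inputs
```

`majority_vote([abs_v1, abs_v2, abs_v3], -4)` returns `4`: the faulty version is outvoted 2 to 1.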

24

N-Version Programming - Problem Handling

  • Can restart or retry

    • Suitable when neither liveness nor safety is critical

  • Transition to a predefined state followed by later retries

    • Not recommended when liveness is critical

  • Designating a version as being more reliable

    • Liveness is preserved
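The last option, designating one version as more reliable, can be sketched as a voter that falls back to that version on disagreement instead of halting or retrying. The `preferred` index and the function name are hypothetical; crash handling is omitted to keep the policy visible.

```python
from collections import Counter
from typing import Any, Callable, Sequence, Tuple


def vote_or_prefer(versions: Sequence[Callable[..., Any]], preferred: int,
                   *args: Any) -> Tuple[Any, str]:
    """Majority vote; on disagreement, use the designated 'more reliable' version."""
    outputs = [version(*args) for version in versions]
    value, count = Counter(outputs).most_common(1)[0]
    if count > len(outputs) // 2:
        return value, "majority"
    # No majority: rather than halting or retrying, trust the designated version
    # so that a result is still produced (liveness is preserved).
    return outputs[preferred], "preferred"
```

The trade-off is that when the designated version is itself wrong and the others disagree, its wrong answer is accepted, so safety is not guaranteed.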