3: Software Fault Tolerance

24 Terms

1

Detector - Pseudocode

Checks a predicate (e.g. a safety condition). If the predicate is not met, the program exits, since safety is prioritised over liveness.
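
A minimal sketch of a detector, assuming a hypothetical invariant that an account balance never goes negative. The names `SafetyViolation`, `detector` and `withdraw` are illustrative only.

```python
class SafetyViolation(Exception):
    """Raised when the monitored predicate does not hold."""


def detector(state, predicate):
    # Stop rather than continue in an unsafe state (safety over liveness).
    if not predicate(state):
        raise SafetyViolation(f"predicate violated in state {state!r}")
    return state


def withdraw(balance, amount):
    # Hypothetical operation guarded by the detector: balance must stay >= 0.
    return detector(balance - amount, lambda b: b >= 0)
```

Here `withdraw(10, 3)` returns the new balance, while `withdraw(1, 5)` raises `SafetyViolation` instead of producing a negative balance.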

2

Corrector - Pseudocode

If a predicate (e.g. safety condition) is not met, then it is enforced.

  • May prioritise liveness over safety.

  • Uses enforcement techniques to ensure the invariant is satisfied again.
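
A minimal sketch of a corrector, assuming a hypothetical invariant that a speed value stays within `0..limit`. The names `corrector` and `clamp_speed` are illustrative only.

```python
def corrector(state, predicate, enforce):
    # Keep running (liveness) by repairing the state instead of exiting.
    if not predicate(state):
        state = enforce(state)
    return state


def clamp_speed(speed, limit=120):
    # Hypothetical invariant: 0 <= speed <= limit, enforced by clamping.
    return corrector(
        speed,
        predicate=lambda s: 0 <= s <= limit,
        enforce=lambda s: min(max(s, 0), limit),
    )
```

A valid value passes through unchanged; an invalid one is forced back into the allowed range so that execution can continue.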

3

Fault Tolerance Techniques - List

  • Runtime checks

  • Exception handlers

  • Forward Error Recovery

  • Backward Error Recovery

4

Runtime Checks

Used for detecting errors.

5

Runtime Checks - Examples

  • Replication checks - compare outputs of matching modules

  • Timing checks - use of timers to check timing constraints

  • Reversal checks - compute the inputs back from the output and compare them with the actual inputs

  • Coding checks - parity, Hamming codes, etc.

  • Reasonableness checks - semantic properties of data

  • Structural checks - redundancy in data structures

  • Validity checks - divide by 0, array bounds, overflow
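
A few of the checks above, sketched as small functions. The square-root example, the even-parity scheme and the temperature range are assumptions chosen for illustration.

```python
import math


def replication_check(outputs):
    # Replication check: outputs of all matching modules must agree.
    return all(o == outputs[0] for o in outputs)


def reversal_check(x, y, tol=1e-9):
    # Reversal check for y = sqrt(x): squaring y must give back x.
    return math.isclose(y * y, x, rel_tol=tol, abs_tol=tol)


def parity_bit(bits):
    # Even parity: the bit that makes the total number of 1s even.
    return sum(bits) % 2


def coding_check(bits, parity):
    # Coding check: the stored parity must match the recomputed parity.
    return parity_bit(bits) == parity


def reasonableness_check(temp_c):
    # Reasonableness check: assumed plausible range for an outdoor sensor.
    return -90.0 <= temp_c <= 60.0
```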

6

Exception Handlers

Detectors raise exceptions (interrupt signals)

7

Exceptions - Examples

  • Interface exception

  • Local/internal exception

  • Failure exception

8

Interface Exception

Invalid service request detected by interface detectors and corrected by the service requester.

9

Local/internal Exception

Problems with internal operations detected by local detectors, and corrected by local correctors.

10

Failure Exception

Internal errors propagate to the interface and are detected by the detectors there; global correction may be needed.
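
The three exception kinds can be sketched in one hypothetical service. The names `InterfaceException`, `FailureException` and `reciprocal_service`, and the cache used as the internal operation, are assumptions for illustration.

```python
class InterfaceException(Exception):
    """Invalid request; the requester is expected to correct it."""


class FailureException(Exception):
    """Internal error that could not be corrected locally."""


def reciprocal_service(x, cache):
    # Interface detector: reject an invalid service request.
    if not isinstance(x, (int, float)):
        raise InterfaceException("x must be a number")
    try:
        return cache[x]  # internal operation that may fail
    except KeyError:
        pass  # local exception, local corrector: recompute below
    try:
        result = 1 / x
    except ZeroDivisionError as err:
        # The error reaches the interface: signal a failure exception.
        raise FailureException("cannot compute reciprocal") from err
    cache[x] = result
    return result
```

The cache miss is handled entirely inside the service, whereas a non-numeric argument or a division by zero is reported to the caller.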

11

Forward Error Recovery

Upon error detection, the program attempts to move forward into a new non-erroneous state, without rolling back.
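
A minimal sketch of forward error recovery, assuming a hypothetical stream of sensor readings where an invalid reading is replaced by the last good value so that processing continues. The function name and the valid range are illustrative.

```python
def smooth_readings(readings, low=0.0, high=100.0, default=0.0):
    last_good = default
    cleaned = []
    for reading in readings:
        if reading is None or not (low <= reading <= high):
            # Forward recovery: build a valid state and carry on.
            reading = last_good
        last_good = reading
        cleaned.append(reading)
    return cleaned
```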

12

Backward Error Recovery

Upon error detection, the program rolls back to a previously “recorded” good point, then restarts execution from there.

13

Checkpoints

A snapshot of a program’s state at a given point in time; the program restarts from it if a failure occurs.
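
A minimal sketch of checkpointing with rollback for a single process, assuming the state is an ordinary Python object that `copy.deepcopy` can copy. The class `Checkpointed` and the helper `run_step` are illustrative names.

```python
import copy


class Checkpointed:
    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)  # initial checkpoint

    def checkpoint(self):
        # Record a snapshot of the current state.
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        # Restore the last recorded good state.
        self.state = copy.deepcopy(self._saved)


def run_step(proc, step):
    # Run one step; checkpoint on success, roll back on failure.
    try:
        step(proc.state)
        proc.checkpoint()
        return True
    except Exception:
        proc.rollback()
        return False
```

A step that corrupts the state and then fails leaves no trace, because the state is restored from the snapshot.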

14

Recovery Line

A set of checkpoints across all processes to which the programs can be rolled back in the event of a failure.

15

Checkpoints - Best Practices

  • OK to checkpoint after a message send

  • Not OK to checkpoint before a message receive

16

Recovery Lines - Consistency Requirement

There are no messages that originate after the line and terminate before it (there are no receives without corresponding sends).
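
The requirement can be checked mechanically. This is a sketch under an assumed representation: `line` maps each process to the local time of its checkpoint, and each message is a tuple `(sender, send_time, receiver, receive_time)` using local times.

```python
def is_consistent(line, messages):
    for sender, send_time, receiver, receive_time in messages:
        sent_after_line = send_time > line[sender]
        received_before_line = receive_time <= line[receiver]
        if sent_after_line and received_before_line:
            # A receive is recorded without its corresponding send.
            return False
    return True
```

For a message sent by P at time 5 and received by Q at time 7, the line `{"P": 4, "Q": 8}` is inconsistent: Q's checkpoint remembers the receive, but P's checkpoint predates the send.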

17

Recovery Blocks

A mechanism that runs alternative implementations of an algorithm one at a time, checking each result with an acceptance test.

  • 1 hardware channel, N software channels

18

Recovery Blocks - Process

  • The primary algorithm is run first.

    • If the execution is error-free, the result is checked by the acceptance test.

    • If the evaluation of the acceptance test is error-free and yields true, the block succeeds.

  • If the execution or evaluation was not error-free, or the test yields false, the program restores the state it had prior to execution.

  • After that, an alternative algorithm is attempted.
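
The process above can be sketched as follows. This is a minimal illustration, assuming the state can be copied with `copy.deepcopy`; the function name `recovery_block` and the sorting example are hypothetical.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    checkpoint = copy.deepcopy(state)  # state prior to execution
    for algorithm in alternates:
        # Each attempt starts from a fresh copy of the saved state.
        candidate = copy.deepcopy(checkpoint)
        try:
            result = algorithm(candidate)
            if acceptance_test(result):
                return result
        except Exception:
            pass  # error during execution or evaluation: try the next one
    raise RuntimeError("all alternates failed the acceptance test")


def buggy_sort(xs):
    xs.pop()  # hypothetical faulty primary: loses an element
    return xs


def safe_sort(xs):
    return sorted(xs)
```

With an acceptance test that requires a sorted list of the original length, the faulty primary is rejected and the alternate's result is returned.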

19

Recovery Blocks - Possible Failure Reasons

  • Error in the module operation

  • Timeout has expired

  • Raised exception

20

N-Version Programming

N different developers build different implementations to solve the same problem without sharing their approach. All implementations are run at the same time, and a decision is then made on which output is correct.

  • Lack of agreement indicates a problem occurred.

  • N hardware channels, N software channels

21

N-Version Programming - Axiom

  • Will not work if all versions fail on the same inputs.

  • Will not work if they fail in similar ways.

22

Fault Injection Analysis

Subjects a system to abnormal conditions, such as introducing faults and errors to assess behaviour.
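
A minimal sketch of software fault injection, assuming a hypothetical wrapper that makes a function raise `OSError` at a given rate and a simple retry mechanism as the system under test. `inject_faults` and `with_retries` are illustrative names.

```python
import random


def inject_faults(func, rate, rng):
    # Wrap func so that each call fails with probability `rate`.
    def wrapper(*args, **kwargs):
        if rng.random() < rate:
            raise OSError("injected fault")
        return func(*args, **kwargs)
    return wrapper


def with_retries(func, attempts):
    # System under test: retries a failing call a fixed number of times.
    def wrapper(*args, **kwargs):
        for _ in range(attempts):
            try:
                return func(*args, **kwargs)
            except OSError:
                continue
        raise OSError("all attempts failed")
    return wrapper
```

Passing a seeded `random.Random` keeps an experiment repeatable, so the observed behaviour can be compared across runs.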

23

N-Version Programming - Selection Process

A diagnostic program analyses the outputs of all versions and selects the “best” one.
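
A minimal sketch of the selection step, assuming simple majority voting as the selection rule (one common choice, not necessarily the one intended here). The versions are run one after another for simplicity, outputs must be hashable, and the names `n_version` and `NoAgreement` are illustrative.

```python
from collections import Counter


class NoAgreement(Exception):
    """Raised when no output is backed by a majority of versions."""


def n_version(versions, x):
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a crashed version casts no vote
    if outputs:
        value, votes = Counter(outputs).most_common(1)[0]
        # Accept only an output produced by more than half of all versions.
        if votes > len(versions) // 2:
            return value
    raise NoAgreement(f"outputs disagree: {outputs!r}")
```

With three versions of an absolute-value function, one of which is wrong for negative inputs, the two correct versions outvote the faulty one.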

24

N-Version Programming - Problem Handling

  • Can restart or retry

    • Acceptable when neither liveness nor safety is critical

  • Transition to a predefined state followed by later retries

    • Not recommended when liveness is critical

  • Designating a version as being more reliable

    • Liveness is preserved
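
The three options can be sketched as a dispatch on a chosen strategy. This is a hypothetical helper: the strategy names, `rerun`, `safe_value` and `trusted` are assumptions for illustration.

```python
def handle_disagreement(outputs, strategy, rerun=None, safe_value=None, trusted=0):
    if strategy == "retry":
        # Restart or retry the computation.
        return rerun()
    if strategy == "safe_state":
        # Transition to a predefined state; retries can follow later.
        return safe_value
    if strategy == "trusted_version":
        # Use the output of the version designated as more reliable.
        return outputs[trusted]
    raise ValueError(f"unknown strategy: {strategy}")
```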