Lecture 11 — Fault Tolerance


12 Terms

1

Fault tolerance (definition)

Designing a system so that faults do not lead to system failure.

2

Redundancy (definition)

Extra elements that would not be needed if the system were fault-free.

3

Why isn't redundancy alone enough?

Redundancy also needs fault/error detection, and the decision logic that acts on it can itself be a single point of failure.

4

TMR

Triple modular redundancy: 3 modules performing the same function + a majority voter
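A minimal sketch of the voter on this card, assuming the three module outputs are plain comparable (hashable) values and that the voter picks the majority. The function name `tmr_vote` is hypothetical and used only for illustration.

```python
from collections import Counter


def tmr_vote(a, b, c):
    """Return the value produced by at least two of the three modules.

    Raises RuntimeError when all three outputs differ, because no
    majority exists in that case.
    """
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: all three modules disagree")
    return value


# One faulty module (output 7) is outvoted by the two agreeing modules.
print(tmr_vote(42, 42, 7))  # 42
```

Note that the voter itself is not replicated in this sketch, so it remains a single point of failure.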

5

TMR limitation

Masks random faults in a single module, but not systematic faults shared by all modules; doesn't help with 2+ simultaneous module failures.

6

NMR tolerance capacity

tolerates (N−1)/2 module failures
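As a quick worked check of the formula, here is a small sketch that assumes N is odd and that a majority voter is used. The function name `nmr_tolerated_failures` is hypothetical.

```python
def nmr_tolerated_failures(n: int) -> int:
    """Number of module failures an N-modular redundant system can mask.

    Assumes an odd number of modules and majority voting: a majority
    of modules must still be correct, which leaves (N - 1) / 2 to fail.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd, positive N")
    return (n - 1) // 2


for n in (3, 5, 7):
    print(n, "modules tolerate", nmr_tolerated_failures(n), "failure(s)")
```

For N = 3 (TMR) this gives 1, which matches the TMR limitation that two simultaneous failures are not masked.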

7

Dynamic redundancy

main + standby spares + fault detection/switching
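A rough sketch of the main-plus-spares idea, under the assumption that a fault shows up as a raised exception (real systems would use dedicated detection hardware or checks). The names `run_with_spares`, `main_unit` and `spare_unit` are hypothetical.

```python
from typing import Callable, Sequence


def run_with_spares(units: Sequence[Callable[[int], int]], x: int) -> int:
    """Try the main unit first, then each standby spare in order."""
    for unit in units:
        try:
            return unit(x)
        except Exception:
            # Fault detected: switch to the next standby spare.
            continue
    raise RuntimeError("main unit and all spares failed")


def main_unit(x: int) -> int:
    raise IOError("main unit has failed")  # simulated fault


def spare_unit(x: int) -> int:
    return x * 2


print(run_with_spares([main_unit, spare_unit], 21))  # 42
```

Unlike TMR, only one unit produces the output at a time, so the result depends entirely on the detection and switching logic working.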

8

Detect faults vs detect errors

In practice we often detect the errors caused by faults rather than the faults themselves; sometimes that is enough.

9

Watchdog timer idea

The system is reset if the software fails to kick (restart) the watchdog in time; limitations exist.
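The watchdog idea can be simulated in software. This is a sketch only: a real watchdog is normally a hardware timer, and the class `Watchdog`, its tick-based countdown, and the `resets` counter are assumptions made for illustration.

```python
class Watchdog:
    """Countdown timer that records a reset when it is not kicked in time."""

    def __init__(self, timeout_ticks: int):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.resets = 0

    def kick(self) -> None:
        # Healthy software calls this regularly to restart the countdown.
        self.remaining = self.timeout_ticks

    def tick(self) -> None:
        # Called once per clock tick, independently of the software.
        self.remaining -= 1
        if self.remaining <= 0:
            self.resets += 1  # stands in for a system reset
            self.remaining = self.timeout_ticks


wd = Watchdog(timeout_ticks=3)
for step in range(6):
    if step < 3:
        wd.kick()  # software is healthy for the first three steps, then hangs
    wd.tick()

print(wd.resets)  # 1
```

The sketch only catches software that stops kicking; a program that keeps kicking while computing wrong results would go unnoticed.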

10

Recovery blocks

acceptance tests + fallback implementations; need rollback of side effects
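A small sketch of a recovery block, assuming the state is an in-memory dictionary that can be checkpointed with a deep copy. The names `recovery_block`, `primary`, `fallback` and `accept` are hypothetical, and `primary` is deliberately buggy.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Run alternates in order until one passes the acceptance test.

    Each alternate works on a fresh copy of the checkpoint, so a failed
    attempt leaves no trace (rollback of in-memory side effects only).
    """
    checkpoint = copy.deepcopy(state)
    for alternate in alternates:
        candidate = copy.deepcopy(checkpoint)
        try:
            alternate(candidate)
        except Exception:
            continue  # a crash counts as a failed attempt
        if acceptance_test(candidate):
            return candidate
    raise RuntimeError("all alternates failed the acceptance test")


def primary(s):
    s["data"] = sorted(s["data"])[:-1]  # simulated bug: drops an element


def fallback(s):
    s["data"].sort()


def accept(s):
    d = s["data"]
    return len(d) == 3 and all(d[i] <= d[i + 1] for i in range(len(d) - 1))


result = recovery_block({"data": [3, 1, 2]}, [primary, fallback], accept)
print(result["data"])  # [1, 2, 3]
```

Side effects outside the copied state (files, messages, actuators) are not undone here, which is exactly the rollback problem the card points out.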

11

N-version programming

N independently developed versions + voting; used for very critical systems

12

Shuttle architecture highlight

5 computers; 4 in NMR during critical phases; 5th diverse backup