1/11
Looks like no tags are added yet.
Name | Mastery | Learn | Test | Matching | Spaced | Call with Kai |
|---|
No study sessions yet.
Fault tolerance (definition)
design so faults don’t lead to system failure
Redundancy (definition)
extra elements not needed if fault-free
Why redundancy alone isn’t enough?
need fault/error detection + decision logic can be a single point failure.
TMR
3 modules + voter
TMR limitation
random faults yes, systematic faults no; doesn’t help with 2+ simultaneous failures
NMR tolerance capacity
tolerates (N−1)/2(N-1)/2(N−1)/2 module failures
Dynamic redundancy
main + standby spares + fault detection/switching
Detect faults vs detect errors
often detect errors caused by faults; sometimes enough
Watchdog timer idea
reset if software fails to kick watchdog; limitations exist
Recovery blocks
acceptance tests + fallback implementations; need rollback of side effects
N-version programming
N independently developed versions + voting; used for very critical systems
Shuttle architecture highlight
5 computers; 4 in NMR during critical phases; 5th diverse backup