Part 3. Fault Tolerance

38 Terms

1
New cards

Fault-tolerance

is the ability of a system to keep working properly despite failures occurring within it.

2
New cards

systems

are designed so that, when errors or failures occur, the system still does its work properly and gives the correct result.

3
New cards

fault tolerance

When a computer, server, network, or another IT component keeps operating even when a component fails, __________ is responsible.

4
New cards
  • Stay operational

  • Reduce risks

  • Buy time

Create a fault-tolerant design to:

5
New cards

Stay operational

Make sure your system doesn't go down altogether when something breaks.

6
New cards

Reduce risks

Bar disruptions stemming from one critical piece of hardware or software. Overlap functions, so you can share the load in a crisis.

7
New cards

Buy time

Fixing any kind of IT problem requires investigation and savvy. Fault tolerance ensures people can keep working while you hunt down the source.

8
New cards

early fault-tolerance plans

involved alerts. A system notified staff when something was about to fail, and they had to step in and do something immediately.

9
New cards

modern fault-tolerance plans

involve backups and redundancies, so the team can work while the system stays online.

10
New cards

high availability

People sometimes confuse fault tolerance with __________.

11
New cards

high availability

refers to the share of its overall run time that a system stays up. To maintain ____________, a system switches to another system when something fails.
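As a rough illustration of "time up compared to overall run time", the ratio can be computed directly. This is a minimal sketch; the function name and the hour figures are made up for the example and are not from the card.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the overall run time during which the system was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total run time must be positive")
    return uptime_hours / total


# Hypothetical figures: 8,750 hours up and 10 hours down.
print(f"{availability(8750, 10):.4%}")  # prints 99.8858%
```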

12
New cards

backup

often provides reduced capacity and a poor experience. The company stays online, but work can slow.

13
New cards
  • Eliminate. Don’t allow a single point of failure. The system operates without stopping, even if you must make repairs.

  • Isolate. You should remove the defective piece from system operation rather than letting it cause a cascade of problems.

  • Engage. When you complete the repair, the part should come back online with no noticeable disruption.

How can you keep something up and running even while parts and pieces of it are breaking? Your program should:

14
New cards
  • Hardware

  • Software

  • Power

Your fault-tolerance plan might include:

15
New cards

Hardware

Build in backups so one can take over when another breaks. Run them in parallel, so they're always online and ready to go.

16
New cards

Software

Multiple instances can take over for one another if one fails.

17
New cards

Power

Your IT system always has current, even if your power company experiences a catastrophe.

18
New cards
  • Replication

  • Continuation

  • Recovery

There are multiple fault-tolerance techniques, including:

19
New cards

Replication

Everything breaks in time. For example, most computers last about eight years, even with appropriate maintenance. Duplicating hardware and software ensures you always have a secondary source to lean on when you need to.

20
New cards

Continuation

Ensure that your programs keep running even if errors exist.

21
New cards

Recovery

Allow software programs to recover from a failure gracefully.
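One way to picture graceful recovery is a wrapper that retries a failing task and, if it keeps failing, returns a fallback result instead of crashing. This is only a sketch; `run_with_recovery`, `flaky`, and the "cached result" fallback are hypothetical names chosen for the example.

```python
def run_with_recovery(task, fallback, retries=2):
    """Call task up to retries + 1 times; if every attempt raises, return fallback()."""
    for _ in range(retries + 1):
        try:
            return task()
        except Exception:
            continue  # treat the error as transient and try again
    return fallback()


calls = {"n": 0}


def flaky():
    """Hypothetical task that fails on its first two calls, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "fresh result"


print(run_with_recovery(flaky, lambda: "cached result"))
```

If the task never succeeds, the caller still gets the fallback value, so the program keeps running.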

22
New cards
  • Protect

  • Backup

  • Plan ahead

  • Repair

Fault-tolerant data centers must:

23
New cards

Parallel heating/cooling systems

keep equipment from breaking due to environmental factors.

24
New cards

Alternative power sources

ensure that the center can operate even when the grid goes down.

25
New cards

Routine maintenance

ensures that all parts keep working, rather than allowing them to break before you address them.

26
New cards

Backup

Identical or similar systems running in parallel keep operations moving.

27
New cards

True

True or False? Fault tolerance makes uptime possible.

28
New cards

Load balancing

is critical for web applications. Multiple servers handle the load, switching back and forth as needed to serve your customers. That same system could help if you're dealing with a catastrophic server issue that takes down an element.
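A small sketch of the idea: requests rotate across servers, and a server marked as down is skipped so the rest absorb its share. Round-robin is just one assumed policy here, and the class and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # At least one server is healthy, so one full pass must find it.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["a", "b", "c"])
balancer.mark_down("b")  # simulate a failed server
print([balancer.next_server() for _ in range(4)])  # prints ['a', 'c', 'a', 'c']
```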

29
New cards
  • Hardware

  • Software

Any system has two major components:

30
New cards

Built-In Self-Test

BIST stands for

31
New cards

Built-In Self-Test (BIST)

tests itself repeatedly at fixed intervals. When the system detects a fault, it switches out the faulty component and switches in its redundant counterpart. In effect, the system reconfigures itself when a fault occurs.
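The switch-out and switch-in behavior can be mimicked in software. The sketch below is only an analogy for what is normally a hardware mechanism; `Unit`, `SelfTestingSystem`, and the `healthy` flag are hypothetical stand-ins.

```python
class Unit:
    """Hypothetical component whose self-test simply reports a health flag."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def self_test(self):
        return self.healthy


class SelfTestingSystem:
    def __init__(self, active, spares):
        self.active = active
        self.spares = list(spares)

    def run_self_test(self):
        """Run one test cycle; a real system would trigger this on a timer."""
        if self.active.self_test():
            return self.active
        # Fault detected: switch out the active unit, switch in a working spare.
        while self.spares:
            candidate = self.spares.pop(0)
            if candidate.self_test():
                self.active = candidate
                return self.active
        raise RuntimeError("no working spare left")


primary, spare = Unit("primary"), Unit("spare")
system = SelfTestingSystem(primary, [spare])
primary.healthy = False  # inject a fault
print(system.run_self_test().name)  # prints spare
```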

32
New cards

Triple Modular Redundancy

What does TMR stand for?
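As background that this card does not spell out: TMR is usually described as three copies of a module feeding a majority voter, so a single faulty copy is outvoted. A minimal sketch of such a voter, with a hypothetical function name:

```python
def tmr_vote(a, b, c):
    """Return the output that at least two of the three modules agree on."""
    if a == b or a == c:
        return a
    if b == c:
        return b
    raise ValueError("no majority: all three modules disagree")


print(tmr_vote(1, 1, 0))  # prints 1
```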

33
New cards

Software fault-tolerance techniques

are used to keep software reliable when faults and failures occur.

34
New cards
  • N-version Programming

  • Recovery Blocks

  • Checkpoint

There are three techniques used in software fault-tolerance:

35
New cards

N-version Programming

  • is just like TMR in hardware fault tolerance. In _____________, all the redundant copies run concurrently, and the results from each may differ, so they are compared to settle on one answer.

  • basically aims to catch all errors during development.
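A hedged sketch of the idea: several independently written versions of the same function run concurrently, and a majority vote picks the result. The three `sum_*` versions, the deliberately buggy one, and `n_version_run` are all hypothetical examples.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sum_loop(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_formula(n):
    return n * (n + 1) // 2


def sum_buggy(n):
    return n * n  # deliberately wrong version, to show it being outvoted


def n_version_run(versions, arg):
    """Run every version concurrently and return the strict-majority result."""
    with ThreadPoolExecutor(max_workers=len(versions)) as pool:
        results = list(pool.map(lambda version: version(arg), versions))
    value, count = Counter(results).most_common(1)[0]
    if count <= len(versions) // 2:
        raise RuntimeError("versions disagree; no majority result")
    return value


print(n_version_run([sum_loop, sum_formula, sum_buggy], 100))  # prints 5050
```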

36
New cards

N versions of software

are developed by N individuals or groups of developers.

37
New cards

Recovery Blocks Technique

is similar to N-version programming, but here the redundant copies are generated using different algorithms only. The redundant copies are not run concurrently; they are run one by one. The technique can only be used where task deadlines are longer than the task computation time.
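A rough sketch of running the copies one by one: each alternate is tried in turn until one produces an acceptable result. The acceptance check is an assumption borrowed from the usual description of recovery blocks, not something this card states, and all names are hypothetical.

```python
def recovery_block(alternates, acceptance_test, data):
    """Try each alternate in order; return the first result that passes the test."""
    for alternate in alternates:
        try:
            result = alternate(data)
        except Exception:
            continue  # this alternate crashed, move on to the next one
        if acceptance_test(result, data):
            return result
    raise RuntimeError("every alternate failed the acceptance test")


def buggy_sort(items):
    return list(items)  # deliberately wrong: returns the input unsorted


def builtin_sort(items):
    return sorted(items)


def is_sorted(result, items):
    """Acceptance test: same length as the input and in non-decreasing order."""
    in_order = all(result[i] <= result[i + 1] for i in range(len(result) - 1))
    return len(result) == len(items) and in_order


print(recovery_block([buggy_sort, builtin_sort], is_sorted, [3, 1, 2]))  # prints [1, 2, 3]
```

Because the alternates run sequentially, the worst case costs the sum of all their run times, which is why the deadline must leave slack beyond a single computation.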

38
New cards

Check-pointing and Rollback Recovery

This technique differs from the two software fault-tolerance techniques above. Here, the system is tested each time a computation is performed; its state is saved at checkpoints, and if a test fails it rolls back to the last saved state. The technique is mainly useful when there is a processor failure or data corruption.
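A minimal in-memory sketch of the checkpoint and rollback cycle, assuming state that can be deep-copied. Real systems would write checkpoints to stable storage; the class, function, and step names here are hypothetical.

```python
import copy


class CheckpointedState:
    """Hold mutable state plus a saved copy that can be restored."""

    def __init__(self, state):
        self.state = state
        self._saved = copy.deepcopy(state)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._saved)


def apply_steps(store, steps):
    """Run each step; checkpoint after a success, roll back after a failure."""
    for step in steps:
        try:
            step(store.state)
            store.checkpoint()
        except Exception:
            store.rollback()


def deposit(state):
    state["balance"] += 50


def corrupting_step(state):
    state["balance"] = -999  # corrupts the data, then fails
    raise RuntimeError("simulated processor failure")


store = CheckpointedState({"balance": 100})
apply_steps(store, [deposit, corrupting_step])
print(store.state)  # prints {'balance': 150}
```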