Fault-tolerance
is the ability of a system to keep working properly in spite of failures occurring in the system.
systems
are designed in such a way that, when errors and failures occur, the system still does its work properly and gives the correct result.
fault tolerance
When a computer, server, network, or another IT component keeps operating even when a component fails, __________ is responsible.
Stay operational
Reduce risks
Buy time
Create a fault-tolerant design to:
Stay operational
Make sure your system doesn't go down altogether when something breaks.
Reduce risks
Bar disruptions stemming from one critical piece of hardware or software. Overlap functions, so you can share the load in a crisis.
Buy time.
Fixing any kind of IT problem requires investigation and savvy. Fault tolerance ensures people can keep working while you hunt down the source.
early fault-tolerance plans
involved alerts. A system notified staff when something was about to fail, and they had to step in and do something immediately.
modern fault-tolerance plans
involve backups and redundancies, so the team can work while the system stays online.
high availability
People sometimes confuse fault tolerance with __________.
high availability
refers to the share of overall run time during which the system stays up. To maintain ____________, a system switches to another system when something fails.
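The uptime comparison and the switch-over described in this card can be sketched in a few lines. This is only an illustration: the function names `availability` and `pick_active`, and the idea of representing each system as a `(name, healthy)` pair, are assumptions made for the example and do not come from the card.

```python
def availability(uptime_hours: float, total_hours: float) -> float:
    """Fraction of the overall run time during which the system was up."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return uptime_hours / total_hours


def pick_active(systems: list[tuple[str, bool]]) -> str:
    """Return the first healthy system, i.e. switch over when the primary fails.

    `systems` is an ordered list of (name, healthy) pairs, primary first.
    """
    for name, healthy in systems:
        if healthy:
            return name
    raise RuntimeError("no healthy system available")
```

For instance, `availability(99, 100)` describes a system that was up for 99 of 100 hours, and `pick_active([("primary", False), ("standby", True)])` selects the standby once the primary is marked unhealthy.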
backup
often provides reduced capacity and a poor experience. The company stays online, but work can slow.
Eliminate. Don’t allow a single point of failure. The system operates without stopping, even if you must make repairs.
Isolate. You should remove the defective piece from system operation rather than letting it cause a cascade of problems.
Engage. When you complete the repair, the part should come back online with no noticeable disruption.
How can you keep something up and running even while parts and pieces of it are breaking? Your program should:
Hardware
Software
Power
Your fault-tolerance plan might include:
Hardware
Build in backups so one can take over when another breaks. Run them in parallel, so they're always online and ready to go.
Software
Multiple instances can take over for one another if one fails.
Power
Your IT system always has current, even if your power company experiences a catastrophe.
Replication
Continuation
Recovery
There are multiple fault-tolerance techniques, including:
Replication
Everything breaks in time. For example, most computers last about eight years, even with appropriate maintenance. Duplicating hardware and software ensures you always have a secondary source to lean on when you need to.
Continuation
Ensure that your programs keep running even if errors exist.
Recovery
Allow software programs to recover from a failure gracefully.
Protect
Backup
Plan ahead
Repair
Fault-tolerant data centers must:
Parallel heating/cooling systems
keep equipment from breaking due to environmental factors.
Alternative power sources
ensure that the center can operate even when the grid goes down.
Routine maintenance
ensures that all parts keep working, rather than allowing them to break before you address them.
Backup
Identical or similar systems running in parallel keep operations moving.
True
True or False? Fault tolerance makes uptime possible.
Load balancing
is critical for web applications. Multiple servers handle the load, switching back and forth as needed to serve your customers. That same system could help if you're dealing with a catastrophic server issue that takes down an element.
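One simple way to picture this is a round-robin balancer that skips servers marked as down. The sketch below is hypothetical: the class `RoundRobinBalancer` and its methods are invented for illustration, and real load balancers use richer health checks and scheduling policies.

```python
class RoundRobinBalancer:
    """Hand out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._index = 0  # position of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self.servers)):
            server = self.servers[self._index]
            self._index = (self._index + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")
```

With servers `["a", "b", "c"]`, requests rotate through all three. After `mark_down("b")`, the same rotation continues over `"a"` and `"c"` only, which is the catastrophic-server case the card mentions.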
Hardware
Software
Any system has two major components:
Built-In Self-Test
BIST stands for
Built-In Self-Test (BIST)
carries out a test of itself again and again at fixed intervals. When the system detects a fault, it switches out the faulty component and switches in its redundant counterpart. In effect, the system reconfigures itself when a fault occurs.
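The detect-and-swap behaviour can be modelled in software as a rough analogy. Everything here is an assumption made for the example: the `Component` and `SelfTestingSystem` classes are hypothetical, the `faulty` flag stands in for a real hardware test, and the caller is expected to invoke `run_self_test` periodically.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    faulty: bool = False  # stand-in for a real hardware fault

    def self_test(self) -> bool:
        """Return True when the component passes its own test."""
        return not self.faulty


@dataclass
class SelfTestingSystem:
    active: Component
    spares: list[Component] = field(default_factory=list)

    def run_self_test(self) -> bool:
        """Test the active component; swap in a working spare on a fault.

        Returns True if the system reconfigured itself, False otherwise.
        """
        if self.active.self_test():
            return False
        while self.spares:
            candidate = self.spares.pop(0)
            if candidate.self_test():
                self.active = candidate  # switch in the redundant component
                return True
        raise RuntimeError("fault detected and no working spare left")
```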
Triple Modular Redundancy
TMR meaning?
Software fault-tolerance techniques
are used to keep the software reliable when faults and failures occur.
N-version Programming
Recovery Blocks
Checkpoint
There are three techniques used in software fault-tolerance:
N-version Programming
is just like TMR in the hardware fault-tolerance technique. In _____________, all the redundant copies are run concurrently, and each may produce a different result.
is basically to catch all the errors during development itself.
N versions of software
are developed by N individuals or groups of developers.
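Because the independently developed versions may disagree, their outputs are commonly combined by a majority vote, much as in TMR. The sketch below is a simplified illustration: `n_version_vote` is a hypothetical helper, the versions are called one after another rather than concurrently to keep the example short, and results are assumed to be hashable.

```python
from collections import Counter


def n_version_vote(versions, x):
    """Run every version on the same input and return the majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashing version simply contributes no vote.
            continue
    if not results:
        raise RuntimeError("every version failed")
    value, count = Counter(results).most_common(1)[0]
    # Require a strict majority of all N versions, not just of the survivors.
    if count <= len(versions) // 2:
        raise RuntimeError("no majority among the versions")
    return value


# Three hypothetical "independently written" squaring routines; one is buggy.
def square_a(x):
    return x * x


def square_b(x):
    return x ** 2


def square_buggy(x):
    return x + x
```

Calling `n_version_vote([square_a, square_b, square_buggy], 3)` returns the value that two of the three versions agree on, so the buggy version is outvoted.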
Recovery Blocks Technique
is also like N-version programming, but in the recovery blocks technique the redundant copies are generated using different algorithms only. In a recovery block, the redundant copies are not run concurrently; they are run one by one. The recovery block technique can only be used where task deadlines are longer than the task computation time.
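The one-by-one execution can be sketched as a loop over alternates. Note one assumption that goes beyond the card: the technique is usually described with an acceptance test that decides whether an alternate's result is good enough, so the example includes one. The names `recovery_block`, `unsorted_copy`, `builtin_sort`, and `is_sorted_copy` are hypothetical.

```python
def recovery_block(alternates, acceptance_test, x):
    """Try each alternate in order; return the first result that is accepted."""
    for alternate in alternates:
        try:
            result = alternate(x)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def unsorted_copy(items):
    """Buggy primary: forgets to sort."""
    return list(items)


def builtin_sort(items):
    """Alternate written with a different algorithm."""
    return sorted(items)


def is_sorted_copy(original, result):
    """Acceptance test: same length and in non-decreasing order."""
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return len(result) == len(original) and in_order
```

Since the alternates run in sequence, the worst case costs the sum of all their run times, which is why the deadline has to be longer than the computation time.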
Check-pointing and Rollback Recovery
This technique is different from the above two techniques of software fault-tolerance. In this technique, the system is tested each time a computation is performed. This technique is especially useful when there is a processor failure or data corruption.
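A minimal in-memory sketch of the idea follows: after each computation the state is tested, a passing state is saved as the new checkpoint, and a failing one is rolled back. The class `CheckpointedState` and the function `run_step` are hypothetical names, and a real system would write checkpoints to stable storage rather than keep them in memory.

```python
import copy


class CheckpointedState:
    """Hold a mutable state together with its last known-good copy."""

    def __init__(self, state):
        self.state = state
        self._checkpoint = copy.deepcopy(state)

    def checkpoint(self):
        self._checkpoint = copy.deepcopy(self.state)

    def rollback(self):
        self.state = copy.deepcopy(self._checkpoint)


def run_step(cs, step, is_valid):
    """Apply `step` to the state, test it, then checkpoint or roll back.

    Returns True if the step was kept, False if it was rolled back.
    """
    try:
        step(cs.state)
        ok = is_valid(cs.state)
    except Exception:
        ok = False  # a crash mid-step is treated like a failed test
    if ok:
        cs.checkpoint()
    else:
        cs.rollback()
    return ok
```

For example, with a state of `{"balance": 100}` and a validity check that the balance stays non-negative, a withdrawal of 30 is kept, while a later withdrawal of 100 is undone and the state returns to the last checkpoint.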