Safety
A property of a system that reflects the system’s ability to operate, normally or abnormally, without danger of causing human injury or death and without damage to the system’s environment.
It is important to consider software safety as most devices whose failure is critical now incorporate software-based control systems.
Safety vs reliability
Safety and reliability are related but distinct
In general, reliability and availability are necessary but not sufficient conditions for system safety.
Reliability is concerned with conformance to a given specification and delivery of service
Safety is concerned with ensuring that the system cannot cause damage, irrespective of whether or not it conforms to its specification.
System reliability is essential for safety but is not enough
Reliable systems can be unsafe
Case in point: Boeing 737 Max failure/crashes
Unsafe reliable systems
There may be dormant faults in a system that are undetected for many years and only rarely arise.
Specification errors
If the system specification is incorrect then the system can behave as specified but still cause an accident.
Hardware failures generating spurious inputs
Hard to anticipate in the specification.
Context-sensitive commands i.e. issuing the right command at the wrong time
Often the result of operator error.
Safety critical systems
Systems where it is essential that system operation is always safe i.e., the system should never cause damage to people or the system’s environment
Examples:
Control and monitoring systems in aircraft
Process control systems in chemical manufacture
Automobile control systems such as braking and engine management systems
Primary safety-critical systems
Embedded software systems whose failure can cause the associated hardware to fail and directly threaten people. An example is the insulin pump control system.
Secondary safety-critical systems
Systems whose failure results in faults in other (socio-technical) systems, which can then have safety consequences.
• For example, the Mentcare system is safety-critical as failure may lead to inappropriate treatment being prescribed.
• Infrastructure control systems are also secondary safety-critical systems.
Hazards
Situations or events that can lead to, or cause an accident
Stuck valve in reactor control system
Incorrect computation by software in navigation system
Failure to detect possible allergy in medication prescribing system
Hazards do not inevitably result in accidents – accident prevention actions can be taken.
Safety achievement
Hazard avoidance
The system is designed so that some classes of hazard simply cannot arise.
Hazard detection and removal
The system is designed so that hazards are detected and removed before they result in an accident.
Damage limitation
The system includes protection features that minimize the damage that may result from an accident.
Accident (or mishap)
An unplanned event or sequence of events which results in human death or injury, damage to property, or to the environment. An overdose of insulin is an example of an accident.
Damage
A measure of the loss resulting from a mishap. Damage can range from many people being killed as a result of an accident to minor injury or property damage. Damage resulting from an overdose of insulin could be serious injury or the death of the user of the insulin pump.
Hazard severity
An assessment of the worst possible damage that could result from a particular hazard. Hazard severity can range from catastrophic, where many people are killed, to minor, where only minor damage results. When an individual death is a possibility, a reasonable assessment of hazard severity is ‘very high’
Hazard probability
The probability of the events occurring which create a hazard. Probability values tend to be arbitrary but range from ‘probable’ (say 1/100 chance of a hazard occurring) to ‘implausible’ (no conceivable situations are likely in which the hazard could occur). The probability of a sensor failure in the insulin pump that results in an overdose is probably low.
Risk
This is a measure of the probability that the system will cause an accident. The risk is assessed by considering the hazard probability, the hazard severity, and the probability that the hazard will lead to an accident. The risk of an insulin overdose is probably medium to low.
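As a rough illustration of how the three factors named in the Risk card fit together, the sketch below multiplies the hazard probability by the probability that the hazard leads to an accident, and reports the severity alongside the result. The `Hazard` record, the `accident_probability` function and all numbers are assumptions made for this example; they are not values from the notes, and real assessments usually work with relative categories rather than precise figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hazard:
    """Hypothetical record of the three factors used to assess risk."""
    name: str
    hazard_probability: float      # chance that the hazard arises
    accident_given_hazard: float   # chance that the hazard leads to an accident
    severity: str                  # qualitative worst-case damage


def accident_probability(hazard: Hazard) -> float:
    """Probability that the hazard both arises and leads to an accident."""
    return hazard.hazard_probability * hazard.accident_given_hazard


# Illustrative numbers only (assumed, not taken from any real assessment).
overdose = Hazard("insulin overdose", 0.01, 0.1, "very high")
print(f"{overdose.name}: p(accident) = {accident_probability(overdose):.4f}, "
      f"severity = {overdose.severity}")
```

Keeping severity separate from the probability reflects the card's wording: risk is described as a probability, while severity is assessed alongside it.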
Normal accidents
Accidents in complex systems rarely have a single cause as these systems are designed to be resilient to a single point of failure
Designing systems so that a single point of failure does not cause an accident is a fundamental principle of safe systems design.
Almost all accidents are a result of combinations of malfunctions rather than single failures.
It is probably the case that anticipating all problem combinations, especially in software-controlled systems, is impossible, so achieving complete safety is impossible. Accidents are inevitable.
Software safety benefits
Software monitoring and control allows a wider range of conditions to be monitored and controlled than is possible using electro-mechanical safety systems.
Software control allows safety strategies to be adopted that reduce the amount of time people spend in hazardous environments.
Software can detect and correct safety-critical operator errors.
Safety requirements
The goal of safety requirements engineering is to identify protection requirements that ensure that system failures do not cause injury or death or environmental damage.
Safety requirements may be ‘shall not’ requirements i.e., they define situations and events that should never occur.
Functional safety requirements define:
Checking and recovery features that should be included in a system
Features that provide protection against system failures and external attacks
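A ‘shall not’ requirement can often be turned into a check that runs before the system acts, paired with a recovery action. The sketch below assumes a hypothetical requirement of the form "insulin shall not be delivered when the blood sugar reading is below a safe minimum or the sensor has failed". The threshold `SAFE_MIN_SUGAR` and the function names are invented for illustration.

```python
SAFE_MIN_SUGAR = 3.0  # hypothetical threshold in mmol/L, for illustration only


def delivery_permitted(blood_sugar: float, sensor_ok: bool) -> bool:
    """Checking feature: test the 'shall not' conditions before delivery."""
    return sensor_ok and blood_sugar >= SAFE_MIN_SUGAR


def control_step(blood_sugar: float, sensor_ok: bool, requested_dose: int):
    """Return (dose to deliver, status message) for one control cycle."""
    if delivery_permitted(blood_sugar, sensor_ok):
        return requested_dose, "delivered"
    # Recovery feature: fall back to a safe state instead of acting on bad input.
    return 0, "delivery suspended, alarm raised"


print(control_step(6.5, True, 2))
print(control_step(2.1, True, 2))
```

The check and the recovery action correspond to the "checking and recovery features" named in the card. Protection against external attacks is outside the scope of this sketch.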
Hazard-driven analysis
Hazard identification
Hazard assessment
Hazard analysis
Risk reduction: safety requirements identified
Hazard identification
Identify the hazards that may threaten the system.
Hazard identification may be based on different types of hazard:
Physical hazards
Electrical hazards
Biological hazards
Service failure hazards
Etc.
Hazard assessment
The process is concerned with understanding the likelihood that a risk will arise and the potential consequences if an accident or incident should occur.
Risks may be categorised as:
Intolerable. Must never arise or result in an accident
As low as reasonably practical (ALARP). Must minimise the possibility of risk given cost and schedule constraints
Acceptable. The consequences of the risk are acceptable and no extra costs should be incurred to reduce hazard probability
Estimate the risk probability and the risk severity.
It is not normally possible to do this precisely so relative values are used such as ‘unlikely’, ‘rare’, ‘very high’, etc.
The aim must be to exclude risks that are likely to arise or that have high severity.
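One way to picture the three categories is as bands over a combined score of relative probability and relative severity. The following is a minimal sketch under stated assumptions: the two ordered scales, the multiplication rule and the cut-off values are all hypothetical, and a real project would agree on them with domain experts and regulators.

```python
# Ordered relative scales; labels and ordering are assumptions for this sketch.
PROBABILITY = ["implausible", "rare", "unlikely", "probable"]
SEVERITY = ["minor", "medium", "high", "very high", "catastrophic"]


def classify_risk(probability: str, severity: str) -> str:
    """Map relative probability and severity to a risk category.

    Each label is scored by its 1-based position on its scale, and the two
    scores are multiplied. The cut-offs (12 and 4) are hypothetical.
    """
    score = (PROBABILITY.index(probability) + 1) * (SEVERITY.index(severity) + 1)
    if score >= 12:
        return "intolerable"
    if score >= 4:
        return "ALARP"
    return "acceptable"


for p, s in [("probable", "catastrophic"), ("rare", "high"), ("implausible", "minor")]:
    print(f"{p:12} x {s:12} -> {classify_risk(p, s)}")
```

Because the inputs are relative labels rather than measured values, the output is only as reliable as the judgement behind each label.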
Social acceptability of risk
The acceptability of a risk is determined by human, social and political considerations.
In most societies, the boundaries between the regions (intolerable, ALARP, acceptable) are pushed upwards with time, i.e., society becomes less willing to accept risk
For example, the costs of cleaning up pollution may be less than the costs of preventing it but this may not be socially acceptable.
Risk assessment is subjective
Risks are identified as probable, unlikely, etc. This depends on who is making the assessment.
Hazard analysis
Concerned with discovering the root causes of risks in a particular system.
Techniques have been mostly derived from safety-critical systems and can be:
Inductive, bottom-up techniques. Start with a proposed system failure and assess the hazards that could arise from that failure;
Deductive, top-down techniques. Start with a hazard and deduce what the causes of this could be.
There should be clear traceability from identified hazards through their analysis to the actions taken during the process to ensure that these hazards have been covered.
A hazard log may be used to track hazards throughout the process
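A hazard log can be pictured as a list of records that link each hazard to its analysis and to the actions taken. The sketch below is one possible shape for such a record, not a standard format. The field names, the `untraced_hazards` helper and the sample entries are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HazardLogEntry:
    """One hazard and the trail from identification to risk-reduction actions."""
    hazard_id: str
    description: str
    root_causes: list[str] = field(default_factory=list)          # from hazard analysis
    safety_requirements: list[str] = field(default_factory=list)  # from risk reduction


def untraced_hazards(log: list[HazardLogEntry]) -> list[str]:
    """Return ids of hazards with no analysis or no safety requirement recorded."""
    return [entry.hazard_id for entry in log
            if not entry.root_causes or not entry.safety_requirements]


# Hypothetical sample entries.
log = [
    HazardLogEntry("H1", "Insulin overdose",
                   root_causes=["incorrect dose computation"],
                   safety_requirements=["dose shall not exceed a set maximum"]),
    HazardLogEntry("H2", "Power failure"),  # identified but not yet analysed
]
print(untraced_hazards(log))
```

A query like `untraced_hazards` gives reviewers a quick way to see which identified hazards still lack the traceability that the card asks for.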
Fault-tree analysis
A deductive top-down technique.
Put the risk or hazard at the root of the tree and identify the system states that could lead to that hazard.
Where appropriate, link these with ‘and’ or ‘or’ conditions.
A goal should be to minimize the number of single causes of system failure.
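To make the ‘and’/‘or’ structure concrete, here is a minimal sketch of a fault tree as nested gates, with a helper that lists the basic events able to cause the root hazard on their own. The tree contents are hypothetical and the class and function names are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A basic event (leaf) in the fault tree."""
    name: str


@dataclass(frozen=True)
class Gate:
    """An 'and' or 'or' condition over child nodes."""
    kind: str      # "and" or "or"
    inputs: tuple  # child Event or Gate nodes


def occurs(node, active: set[str]) -> bool:
    """True if the node's condition is met given the set of active basic events."""
    if isinstance(node, Event):
        return node.name in active
    results = [occurs(child, active) for child in node.inputs]
    return all(results) if node.kind == "and" else any(results)


def basic_events(node) -> set[str]:
    """Collect the names of all leaf events below a node."""
    if isinstance(node, Event):
        return {node.name}
    return set().union(*(basic_events(child) for child in node.inputs))


def single_causes(root) -> list[str]:
    """Basic events that trigger the root hazard without any other event."""
    return sorted(name for name in basic_events(root) if occurs(root, {name}))


# Hypothetical tree: the root hazard is an insulin overdose.
tree = Gate("or", (
    Gate("and", (Event("sensor failure"), Event("range check missing"))),
    Event("dose computation error"),
))
print(single_causes(tree))
```

In this tree, the sensor failure is only hazardous in combination with a missing range check, while the dose computation error is a single cause that the design would aim to eliminate or guard.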
Risk reduction
The aim of this process is to identify dependability requirements that specify how the risks should be managed and ensure that accidents/incidents do not arise.
Risk reduction strategies
Hazard avoidance;
Hazard detection and removal;
Damage limitation.
Safety Engineering processes
are based on reliability engineering processes
Plan-based approach with reviews and checks at each stage in the process
General goal of fault avoidance and fault detection
Must also include safety reviews and explicit identification and tracking of hazards
Regulation
Regulators may require evidence that safety engineering processes have been used in system development
For example:
The specification of the system that has been developed and records of the checks made on that specification.
Evidence of the verification and validation processes that have been carried out and the results of the system verification and validation.
Evidence that the organizations developing the system have defined and dependable software processes that include safety assurance reviews. There must also be records that show that these processes have been properly enacted.
Agile methods & safety
Agile methods are not usually used for safety-critical systems engineering
Extensive process and product documentation is needed for system regulation. Contradicts the focus in agile methods on the software itself.
A detailed safety analysis of a complete system specification is important. Contradicts the interleaved development of a system specification and program.
Some agile techniques such as test-driven development may be used
Safety assurance processes
Process assurance involves defining a dependable process and ensuring that this process is followed during the system development.
Process assurance focuses on:
Do we have the right processes? Are the processes appropriate for the level of dependability required? These should include requirements management, change management, reviews and inspections, etc.
Are we doing the processes right? Have these processes been followed by the development team?
Process assurance generates documentation
Agile processes therefore are rarely used for critical systems.
Process assurance is important for safety-critical systems development:
Accidents are rare events so testing may not find all problems;
Safety requirements are sometimes ‘shall not’ requirements so cannot be demonstrated through testing.
Safety assurance activities may be included in the software process that record the analyses that have been carried out and the people responsible for these.
Personal responsibility is important as system failures may lead to subsequent legal actions.
Safety related process activities
Creation of a hazard logging and monitoring system.
Appointment of project safety engineers who have explicit responsibility for system safety.
Extensive use of safety reviews.
Creation of a safety certification system where the safety of critical components is formally certified.
Detailed configuration management (see Chapter 25).
Safety reviews
Driven by the hazard register.
For each identified hazard, the review team should assess the system and judge whether or not the system can cope with that hazard in a safe way.
Formal verification
Formal methods can be used when a mathematical specification of the system is produced.
They are the ultimate static verification technique that may be used at different stages in the development process:
A formal specification may be developed and mathematically analyzed for consistency. This helps discover specification errors and omissions.
Formal arguments that a program conforms to its mathematical specification may be developed. This is effective in discovering programming and design errors.
Arguments for formal methods
Producing a mathematical specification requires a detailed analysis of the requirements and this is likely to uncover errors.
Concurrent systems can be analyzed to discover race conditions that might lead to deadlock. Testing for such problems is very difficult.
They can detect implementation errors before testing when the program is analyzed alongside the specification.
Arguments against formal methods
Require specialized notations that cannot be understood by domain experts.
Very expensive to develop a specification and even more expensive to show that a program meets that specification.
Proofs may contain errors.
It may be possible to reach the same level of confidence in a program more cheaply using other V & V techniques.
Formal methods cannot guarantee safety
The specification may not reflect the real requirements of system users. Users rarely understand formal notations so they cannot directly read the formal specification to find errors and omissions.
The proof may contain errors. Program proofs are large and complex, so, like large and complex programs, they usually contain errors.
The proof may make incorrect assumptions about the way that the system is used. If the system is not used as anticipated, then the system’s behavior lies outside the scope of the proof.
Model checking
Involves creating an extended finite state model of a system and, using a specialized system (a model checker), checking that model for errors.
The model checker explores all possible paths through the model and checks that a user-specified property is valid for each path.
Model checking is particularly valuable for verifying concurrent systems, which are hard to test.
Although model checking is computationally very expensive, it is now practical to use it in the verification of small to medium sized critical systems.
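The idea of exploring every path can be shown with a deliberately small explicit-state search. This sketch only checks an invariant (a property that must hold in every reachable state), whereas real model checkers support much richer properties and far larger models. The two-process lock model and all names are assumptions for illustration.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Breadth-first search over all reachable states.

    Returns None if the invariant holds everywhere, otherwise the shortest
    path from the initial state to a violating state (a counterexample).
    """
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None


def make_successors(use_lock: bool):
    """Hypothetical model: two processes that enter and leave a critical section."""
    def successors(state):
        pcs, locked = state
        for i in (0, 1):
            if pcs[i] == "idle" and not (use_lock and locked):
                new = list(pcs)
                new[i] = "critical"
                yield (tuple(new), True)
            elif pcs[i] == "critical":
                new = list(pcs)
                new[i] = "idle"
                yield (tuple(new), False)
    return successors


def mutual_exclusion(state):
    """Safety property: both processes are never critical at the same time."""
    pcs, _ = state
    return pcs != ("critical", "critical")


INITIAL = (("idle", "idle"), False)
print(check_invariant(INITIAL, make_successors(use_lock=True), mutual_exclusion))
print(check_invariant(INITIAL, make_successors(use_lock=False), mutual_exclusion))
```

With the lock in place the search finds no violation. Without it, the search returns a counterexample path that ends with both processes in the critical section, which is the kind of interleaving that testing can easily miss.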
Static program analysis
Static analyzers are software tools for source text processing.
They parse the program text and try to discover potentially erroneous conditions and bring these to the attention of the V & V team.
They are very effective as an aid to inspections - they are a supplement to but not a replacement for inspections.
Levels of static analysis
Characteristic error checking
The static analyzer can check for patterns in the code that are characteristic of errors made by programmers using a particular language.
User-defined error checking
Users of a programming language define error patterns, thus extending the types of error that can be detected. This allows specific rules that apply to a program to be checked.
Assertion checking
Developers include formal assertions in their program that state relationships that must hold. The static analyzer symbolically executes the code and highlights potential problems.
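Characteristic error checking, the first level above, can be sketched in a few lines with Python's standard `ast` module. The example looks for one pattern that often hides faults in Python code, a bare `except:` clause. The choice of pattern, the function name and the sample program are assumptions for illustration, and production analyzers check many more patterns.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of 'except:' clauses that catch everything."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    )


# Hypothetical program text to analyse; it is parsed, never executed.
SAMPLE = '''\
try:
    deliver_dose()
except:
    pass
try:
    read_sensor()
except ValueError:
    log_error()
'''
print(find_bare_excepts(SAMPLE))
```

The analyzer works on the program text alone, which is why it can flag the first handler without the undefined functions in `SAMPLE` ever being called.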
Use of static analysis
Particularly valuable when a language such as C is used which has weak typing and hence many errors are undetected by the compiler.
Particularly valuable for security checking – the static analyzer can discover areas of vulnerability such as buffer overflows or unchecked inputs.
Static analysis is now routinely used in the development of many safety and security critical systems.
Safety and dependability cases
are structured documents that set out detailed arguments and evidence that a required level of safety or dependability has been achieved.
They are normally required by regulators before a system can be certified for operational use. The regulator’s responsibility is to check that a system is as safe or dependable as is practical.
Regulators and developers work together and negotiate what needs to be included in a system safety/dependability case
Safety case
A documented body of evidence that provides a convincing and valid argument that a system is adequately safe for a given application in a given environment.
Arguments in a safety case can be based on formal proof, design rationale, safety proofs, etc. Process factors may also be included.
A software safety case is usually part of a wider system safety case that takes hardware and operational issues into account.
System description
An overview of the system and a description of its critical components.
Safety requirements
The safety requirements abstracted from the system requirements specification. Details of other relevant system requirements may also be included.
Hazard and risk analysis
Documents describing the hazards and risks that have been identified and the measures taken to reduce risk. Hazard analyses and hazard logs.
Design analysis
A set of structured arguments (see Section 15.5.1) that justify why the design is safe.
Verification and validation
A description of the V & V procedures used and, where appropriate, the test plans for the system. Summaries of the test results showing defects that have been detected and corrected. If formal methods have been used, a formal system specification and any analyses of that specification. Records of static analyses of the source code.
Review reports
Records of all design and safety reviews.
Team competences
Evidence of the competence of all of the team involved in safety-related systems development and validation.
Process QA
Records of the quality assurance processes (see Chapter 24) carried out during system development.
Change management processes
Records of all changes proposed, actions taken and, where appropriate, justification of the safety of these changes. Information about configuration management procedures and configuration management logs.
Associated safety cases
References to other safety cases that may impact the safety case.
Structured safety arguments
Structured arguments that demonstrate that a system meets its safety obligations.
It is not necessary to demonstrate that the program works as intended; the aim is simply to demonstrate safety.
Generally based on a claim hierarchy.
You start at the leaves of the hierarchy and demonstrate safety. This implies the higher-level claims are true
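A claim hierarchy can be modelled as a tree in which a higher-level claim holds only if all of its sub-claims hold. The sketch below assumes that simple rule. The `Claim` class and the example claims are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """A node in a claim hierarchy."""
    text: str
    demonstrated: bool = False  # for leaf claims: backed by evidence
    subclaims: list["Claim"] = field(default_factory=list)


def holds(claim: Claim) -> bool:
    """A leaf holds if demonstrated; a parent holds if every sub-claim holds."""
    if not claim.subclaims:
        return claim.demonstrated
    return all(holds(sub) for sub in claim.subclaims)


# Hypothetical hierarchy for an insulin pump.
top = Claim("The pump will not deliver an overdose", subclaims=[
    Claim("A single dose never exceeds the maximum", demonstrated=True),
    Claim("The cumulative daily dose never exceeds the maximum", demonstrated=False),
])
print(holds(top))
```

Here the top-level claim does not yet hold because one leaf is still undemonstrated, which shows why the argument has to start at the leaves.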
Software safety arguments
Safety arguments are intended to show that the system cannot reach an unsafe state.
These are weaker than correctness arguments which must show that the system code conforms to its specification.
They are generally based on proof by contradiction
Assume that an unsafe state can be reached;
Show that this is contradicted by the program code.
A graphical model of the safety argument may be developed.
Construction of a safety argument
Establish the safe exit conditions for a component or a program.
Starting from the END of the code, work backwards until you have identified all paths that lead to the exit of the code.
Assume that the exit condition is false.
Show, for each path leading to the exit, that the assignments made in that path contradict the assumption of an unsafe exit from the component.
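The steps above can be followed on a small dose-limiting function. In this sketch the safe exit condition is `dose <= MAX_DOSE`, and the comments give the contradiction argument for each of the three paths to the exit. The limits and names are hypothetical. The loop at the end is only a sanity check over a small input range, not a substitute for the argument.

```python
MAX_DOSE = 4  # hypothetical limits, for illustration only
MIN_DOSE = 1


def limit_dose(computed_dose: int) -> tuple[int, str]:
    """Return (dose, path label). Safe exit condition: dose <= MAX_DOSE.

    Assume the unsafe exit condition dose > MAX_DOSE holds at the return.
    """
    dose = computed_dose
    if dose < MIN_DOSE:
        dose = 0         # path A: dose = 0, so dose > MAX_DOSE is contradicted
        path = "A"
    elif dose > MAX_DOSE:
        dose = MAX_DOSE  # path B: dose = MAX_DOSE, so dose > MAX_DOSE is contradicted
        path = "B"
    else:
        path = "C"       # path C: the failed guard gives dose <= MAX_DOSE directly
    return dose, path


# Sanity check: every path is exercised and no exit is unsafe.
results = [limit_dose(d) for d in range(-10, 50)]
paths_seen = sorted({path for _, path in results})
unsafe_exits = [dose for dose, _ in results if dose > MAX_DOSE]
print(paths_seen, unsafe_exits)
```

Note that the argument says nothing about whether the dose is the right one for the patient. It only shows that the unsafe state cannot be reached, which is why safety arguments are weaker than correctness arguments.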