Dependable Systems

  • System Dependability

    • Critical property for computer-based systems.

    • Reflects the user's trust that the system will operate as expected and will not fail in normal use.

    • Includes reliability, availability, and security, which are interdependent.

  • Importance of Dependability

    • Failures can impact large populations and lead to economic loss.

    • Systems that are unreliable, unsafe, or insecure may be rejected by their users.

    • Recovering from information loss can be very expensive.

  • Dependability Properties

    • Trustworthiness: The degree of confidence a user has that the system will operate as they expect.

    • Dependability is not expressed numerically; relative terms (e.g., ‘ultra-dependable’) are used instead.

  • Primary Dependability Properties

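    • Availability: Probability that the system is up and able to deliver services when requested.

    • Reliability: Probability that the system delivers services correctly, as specified.

    • Safety: Judgment of how likely it is that the system will cause harm to people or its environment.

    • Security: Judgment of how well the system can resist accidental or deliberate intrusions.

    • Resilience: Judgment of how well the system maintains continuity of critical services during disruptive events (e.g., cyberattacks).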

  • Achieving Dependability

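    • Avoid introducing accidental errors during development, apply effective verification and validation, design for fault tolerance, and protect against external attacks.

    • Include recovery mechanisms so that normal service can be restored after a failure.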

  • Cost of Dependability

    • Costs tend to rise exponentially as the required level of dependability increases.

    • The extra cost comes from more expensive development techniques and from more rigorous testing and validation.

  • Regulation and Compliance

    • Governments regulate to ensure safety in critical systems.

    • Many critical systems must be approved by an external regulator before they go into service.

  • Redundancy and Diversity

    • Redundancy: Keep more than one version of a critical component so that a backup is available if one fails.

    • Diversity: Provide the same functionality in different ways in different components so that they will not fail in the same way.
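
A minimal sketch of how redundancy and diversity might be combined: three independently written versions of the same function (an integer square root, chosen only for illustration) run on the same input, and a majority vote selects the result. The function names, the use of three versions, and the voting rule are assumptions made for this example, not something the notes prescribe.

```python
import math
from collections import Counter


def isqrt_builtin(n: int) -> int:
    """Version A: delegate to the standard library."""
    return math.isqrt(n)


def isqrt_newton(n: int) -> int:
    """Version B: integer Newton iteration."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x


def isqrt_bisect(n: int) -> int:
    """Version C: binary search for the largest r with r * r <= n."""
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid * mid <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


def vote(n: int, versions=(isqrt_builtin, isqrt_newton, isqrt_bisect)) -> int:
    """Run every version and return the answer a strict majority agrees on.

    A version that raises is treated as failed and casts no vote.
    """
    results = []
    for version in versions:
        try:
            results.append(version(n))
        except Exception:
            continue
    if not results:
        raise RuntimeError("all versions failed")
    answer, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return answer
```

Because the three versions use different algorithms, a bug in one is unlikely to appear in the others, so a single faulty version is outvoted. The voter itself is extra code that can also contain faults, which is the complexity cost of this approach.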

  • Problems with Redundancy and Diversity

    • Both increase system complexity.

    • The added complexity increases the chance of errors.

  • Formal Methods

    • Approaches to software development that are based on mathematical representation and analysis of software.

    • Includes:

      • Formal specification

      • Specification analysis and proof

      • Transformational development

      • Program verification

    • They significantly reduce some types of programming errors.

    • Mainly used in dependable systems engineering.

    • They involve investing more effort in the early phases of software development.

  • Benefits of Formal Specification

    • System requirements are analysed in detail, which helps to detect problems, inconsistencies and incompleteness.

    • The specification is expressed in a formal language, so it can be analysed automatically to discover inconsistencies and incompleteness.

    • The formal specification may be systematically transformed into a “correct” program.

    • Testing costs may be reduced if the program is formally verified against its spec.
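
To make the idea of checking a program against its specification more concrete, here is a small sketch in which the specification of a sort routine is written as a precise, machine-checkable predicate and the implementation is checked against it for every input in a small domain. This is bounded checking, not a proof, and the names (`satisfies_sort_spec`, `insertion_sort`, `check_exhaustively`) are hypothetical; real formal methods use dedicated specification languages and proof tools.

```python
from collections import Counter
from itertools import product


def satisfies_sort_spec(inp: list[int], out: list[int]) -> bool:
    """Specification: out is in non-decreasing order and is a permutation of inp."""
    ordered = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    same_elements = Counter(inp) == Counter(out)
    return ordered and same_elements


def insertion_sort(items: list[int]) -> list[int]:
    """Implementation under check."""
    result: list[int] = []
    for item in items:
        pos = len(result)
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


def check_exhaustively(max_len: int = 4, values=range(3)) -> bool:
    """Check the implementation against the spec for every small input."""
    for length in range(max_len + 1):
        for candidate in product(values, repeat=length):
            inp = list(candidate)
            if not satisfies_sort_spec(inp, insertion_sort(inp)):
                return False
    return True
```

Writing the specification as a predicate forces it to be unambiguous: "sorted" alone is not enough, since an implementation that returned an empty list would be ordered but would not be a permutation of its input.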

  • Problems with Formal Methods

    • They have not become mainstream and have had limited impact on practical software development.

    • Problem owners cannot understand a formal specification, so they cannot assess whether it reflects their requirements.

    • The costs are easy to assess but the benefits are hard to assess.

    • Their scope is limited; they are not well suited to specifying user interfaces.

    • They are hard to scale up to large systems.

    • They are not really compatible with agile development methods.

  • Fault Management

    • Fault Avoidance: Use development techniques that avoid introducing faults in the first place.

    • Fault Detection and Removal: Use verification and validation techniques to find and remove faults before deployment.

    • Fault Tolerance: Design the system so that faults in the delivered software do not result in system failure.
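
One way fault tolerance can be sketched in code is a primary routine guarded by an acceptance test, with a simpler backup used when the primary raises or returns an unacceptable result. The helper `run_with_fallback` and the two mean functions below are hypothetical names invented for this illustration.

```python
import math


def run_with_fallback(primary, backup, acceptable, *args):
    """Try primary; if it raises or fails the acceptance test, use backup."""
    try:
        result = primary(*args)
        if acceptable(result):
            return result
    except Exception:
        pass  # fault detected: fall through to the backup
    result = backup(*args)
    if not acceptable(result):
        raise RuntimeError("backup result failed the acceptance test")
    return result


def fast_mean(values):
    """Primary: contains a fault, it divides by zero on an empty list."""
    return sum(values) / len(values)


def safe_mean(values):
    """Backup: defines the mean of an empty list as 0.0."""
    return sum(values) / len(values) if values else 0.0


def is_finite_number(x) -> bool:
    """Acceptance test applied to every result."""
    return isinstance(x, float) and math.isfinite(x)
```

With these definitions, `run_with_fallback(fast_mean, safe_mean, is_finite_number, [])` returns the backup's value instead of propagating the division error, so the fault in `fast_mean` does not become a failure of the caller.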

  • Reliability

    • The probability of failure-free system operation over a specified time in a given environment for a given purpose.

  • Availability

    • The probability that a system, at a point in time, will be operational and able to deliver the requested services.
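
Both definitions are probabilities, and in practice they are usually estimated from observation. The sketch below shows one simple way to do that, assuming uptime and downtime are logged in hours and that each service request counts as one demand; the function names are hypothetical, and the estimates hold only for the environment and usage pattern that was observed.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Estimate availability as the fraction of observed time the system was up."""
    total = uptime_hours + downtime_hours
    if uptime_hours < 0 or downtime_hours < 0 or total <= 0:
        raise ValueError("observation period must be positive")
    return uptime_hours / total


def failure_free_fraction(demands: int, failures: int) -> float:
    """Estimate reliability as the fraction of demands served without failure."""
    if demands <= 0 or not 0 <= failures <= demands:
        raise ValueError("need demands > 0 and 0 <= failures <= demands")
    return 1 - failures / demands
```

For example, a system that was down for 2 hours out of 1000 has an estimated availability of 0.998, and a system that failed on 2 of 1000 demands has an estimated failure-free fraction of 0.998. The two numbers measure different things: a system can be highly available yet unreliable if it is always up but often returns wrong results.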

  • Dependable Programming Good Practice Guidelines

    • Limit the visibility of information in a program

    • Check all inputs for validity

    • Provide a handler for all exceptions

    • Provide restart capabilities

    • Check array bounds

    • Name all constants that represent real-world values

    • Include timeouts when calling external components

    • Minimize the use of error-prone constructs
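
Several of the guidelines above can be sketched together in a small monitoring routine: named constants, input and bounds checks, a timeout on an external call, and handlers for every exception. The sensor, the limit values, and all function names are hypothetical and chosen only for illustration.

```python
import concurrent.futures
import time

MAX_SAFE_TEMP_C = 90.0   # named constant for a real-world limit (assumed value)
SENSOR_TIMEOUT_S = 0.2   # how long to wait for the external component (assumed)


def read_with_timeout(read_sensor, timeout_s: float = SENSOR_TIMEOUT_S):
    """Call an external component, giving up after timeout_s seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        future = executor.submit(read_sensor)
        return future.result(timeout=timeout_s)
    finally:
        # Do not block here on a call that may still be hanging.
        executor.shutdown(wait=False)


def check_temperature(readings, index: int) -> str:
    """Validate the index and the value before using them."""
    # Explicit bounds check: Python would silently accept a negative index.
    if not isinstance(index, int) or not 0 <= index < len(readings):
        raise IndexError("reading index out of bounds")
    value = readings[index]
    if not isinstance(value, (int, float)) or value != value:  # rejects NaN
        raise ValueError("reading is not a valid number")
    return "ALARM" if value > MAX_SAFE_TEMP_C else "OK"


def monitor(read_sensor) -> str:
    """Every exception is handled and mapped to a defined status."""
    try:
        value = read_with_timeout(read_sensor)
        return check_temperature([value], 0)
    except concurrent.futures.TimeoutError:
        return "SENSOR_TIMEOUT"
    except (ValueError, IndexError, TypeError):
        return "BAD_READING"
    except Exception:
        return "SENSOR_FAULT"


def slow_sensor() -> float:
    """Stand-in for an external component that responds too slowly."""
    time.sleep(0.5)
    return 25.0
```

Calling `monitor(slow_sensor)` yields a timeout status rather than blocking for as long as the sensor takes, and a non-numeric reading is rejected instead of being compared against the limit. Note that the thread running the slow call is abandoned, not cancelled; a production system would need a component that supports cancellation or a native timeout.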