Chapter 10 - Slide Deck Updated Part 1 (1)

Contingency Planning

Agenda

  • Contingency Planning
  • Business Impact Analysis
  • Incident Response Planning
  • Disaster Recovery Planning
  • Business Continuity Planning

Fundamentals of Contingency Planning

  • What Is Contingency Planning (CP)?
    • CP is the organizational process of preparing for unexpected adverse events.
    • These events (sometimes called incident candidates) can:
      • Threaten information assets
      • Disrupt business operations
    • CP covers incident response, disaster recovery, business continuity, and business impact analysis (BIA).
    • Goal: Restore operations with minimal disruption and cost.
    • Example: A server room flood threatens systems; CP ensures continuity via cloud failover.

Key Components of CP

  • Business Impact Analysis (BIA):
    • Identifies mission-critical functions and systems.
    • Helps prioritize recovery efforts.
  • Incident Response Plan (IRP):
    • Defines steps to respond to immediate incidents (e.g., malware outbreak).
  • Disaster Recovery Plan (DRP):
    • Focuses on restoring systems at the primary site after a major event.
  • Business Continuity Plan (BCP):
    • Enables continued operations at an alternate site if primary recovery fails.
  • Example Flow: Phishing attack → IRP triggered → Data loss → DRP initiated → Prolonged outage → BCP activated.

Unified vs. Modular Planning Approaches

  • Unified Plan:
    • Common in smaller organizations.
    • Simple, integrated recovery strategies.
  • Modular Plans:
    • Preferred by large, complex organizations.
    • Separate but interlinked IR, DR, BC, and BIA plans.
  • Choice depends on resources, complexity, and philosophy.

CP Planning Prerequisites

  • Planning methodology: Clear, repeatable process.
  • Policy environment: Management support and documented authority.
  • Budget and resources: Financial and technical support.
  • Business Impact Analysis: Identifies what must be protected and recovered.

NIST's 7-Step CP Process (SP 800-34 Rev. 1)

  1. Develop CP Policy:
    • Provides the authority and guidance necessary to develop an effective contingency plan
  2. Conduct BIA:
    • Identify/prioritize critical systems and functions.
  3. Identify Preventive Controls:
    • Measures taken to reduce the effects of system disruptions.
    • These can increase system availability and reduce contingency life cycle costs.
  4. Create Contingency Strategies:
    • Define detailed recovery methods to ensure quick and effective recovery following a disruption.
  5. Develop the CP:
    • Detailed guidance and procedures for restoration unique to each business unit.
  6. Test, Train, Exercise:
    • Ensures readiness and uncovers gaps.
  7. Maintain the Plan:
    • Update regularly to reflect organizational changes.
    • Example: A tested plan reveals a communication gap—training resolves it.

Contingency Planning Life Cycle

The contingency planning life cycle includes the following steps:

  1. Form the CP team.
  2. Develop the CP policy statement.
  3. Develop subordinate planning policies (IR/DR/BC).
  4. Form subordinate planning teams (IR/DR/BC).
  5. Conduct the business impact analysis (BIA).
  6. Integrate the business impact analysis (BIA).
  7. Identify preventive controls.
  8. Determine mission/business processes and recovery criticality.
  9. Identify resource requirements.
  10. Identify recovery priorities for system resources.
  11. Create response strategies (IR/DR/BC).
  12. Develop subordinate plans (IR/DR/BC).
  13. Organize response teams (IR/DR/BC).
  14. Ensure plan testing, training, and exercises.
  15. Ensure plan maintenance.
  16. Review/revise as needed.
  • Continuous improvement

Importance of CP Policy

  • Articulates executive intent and strategic importance.
  • Defines:
    • Scope and purpose
    • Team responsibilities
    • Risk assessment/BIA frequency
    • Testing and maintenance cycles
  • Assigns Roles: COO (CPMT lead), CISO (IR lead), legal, IT, operations.

Specialized Planning Teams

  • IRPT: Designs/manages incident response procedures.
  • DRPT: Handles disaster recovery planning and processes.
  • BCPT: Plans for alternate site operations continuity.
  • CMPT: Develops crisis management strategy.
  • Teams may overlap in small orgs, but ideally are distinct to avoid role conflict.

Staffing Considerations

  • Planning teams ≠ response teams (but include some overlap for continuity).
  • Avoid role conflicts during real incidents.
  • Example: A DR team member can’t also manage BC tasks at a different site simultaneously.

Common Pitfalls

  • Many orgs undervalue CP, leading to:
    • Delayed recovery
    • Permanent data loss
    • Business failure
  • Lack of testing = false sense of preparedness.
  • CP must be a high priority, not an afterthought.

Business Impact Analysis

BIA Planning Considerations

  • Scope –
    • Determine:
      • Which business units to cover
      • Which systems to include
      • The nature of the risk being evaluated.
  • Plan –
    • Make sure the proper data is collected to enable a comprehensive analysis
    • Getting the correct information to address the needs of decision makers is important.
  • Balance between Objective and Subjective Information –
    • You may have collected a huge amount of data; weigh the information available.
    • Some information is objective in nature, and some is subjective or anecdotal.
    • Facts should be weighted properly against opinions.
    • However, the knowledge and experience of key personnel can sometimes be invaluable.
  • Objective – Tailor analysis to decision-maker needs.
  • Follow-Up –
    • Communicate periodically to ensure process owners and decision makers will support the process and the end result of the BIA

Three BIA Phases (Per NIST SP 800-34 Rev. 1)

  1. Determine Mission/Business Processes and Recovery Criticality
  2. Identify Resource Requirements
  3. Prioritize System Resources for Recovery

Step 1 – Determine Business Process Criticality

  • Assess each unit’s function and value to core operations.
  • Evaluate how failure would impact mission success.
  • Use Weighted Table Analysis (WTA) to:
    • Define criteria (e.g., revenue impact, compliance, customer service).
    • Assign weights to each criterion.
    • Score functions and compute importance.
  • Example: Sales platform vs. HR database—restore sales first due to revenue generation.
  • Use of BIA Questionnaires
    • Collects consistent data across departments.
    • Includes:
      • Process descriptions
      • Dependencies
      • Impact assessments
    • Can be filled out by functional managers.
  • Resources for Templates:
    • NIST: SP 800-34 Rev. 1 BIA Template
    • FEMA, Ready.gov, DRJ
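
The weighted table analysis described in this step can be sketched in a few lines of Python. This is only an illustration: the criteria come from the bullet above, but the weights, the 1 to 5 scoring scale, and the scores themselves are hypothetical values chosen for the example.

```python
# Hypothetical criteria weights (assumed to sum to 1.0).
WEIGHTS = {"revenue_impact": 0.5, "compliance": 0.3, "customer_service": 0.2}

# Hypothetical scores on a 1-5 scale for each business function.
SCORES = {
    "Sales platform": {"revenue_impact": 5, "compliance": 3, "customer_service": 5},
    "HR database": {"revenue_impact": 1, "compliance": 4, "customer_service": 2},
}


def weighted_score(scores, weights):
    """Return the sum of score * weight over all criteria."""
    return sum(scores[criterion] * weight for criterion, weight in weights.items())


def rank_functions(all_scores, weights):
    """Return (name, weighted score) pairs, most important first."""
    totals = {name: weighted_score(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_functions(SCORES, WEIGHTS)
for name, total in ranking:
    print(f"{name}: {total:.2f}")
```

With these assumed numbers the sales platform ranks above the HR database, which matches the restore-sales-first example. Changing the weights is the intended way to reflect a different organization's priorities.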

Step 2 – Identify Recovery Objectives & Timeframes

  • Use NIST’s recovery metrics:
    • RTO (Recovery Time Objective):
      • The maximum amount of time that a system resource can remain unavailable before there is an unacceptable impact on other system resources.
    • RPO (Recovery Point Objective):
      • The point in time before a disruption or system outage to which business process data can be recovered after an outage, given the most recent backup copy of the data.
    • MTD (Maximum Tolerable Downtime):
      • The total amount of time the system owner or authorizing official is willing to accept an outage or disruption. The MTD includes all impact considerations.
    • WRT (Work Recovery Time):
      • The amount of time needed to make business functions work again after the technology element is recovered. The technology recovery itself is measured by the RTO.
  • Example:
    • RTO = 2 hrs, RPO = 10 minutes, WRT = 4 hrs, MTD = 6 hrs total to be fully functional again
  • Cost-Benefit Analysis of Recovery Times
    • Shorter RTO = Higher cost
    • Longer downtime = Higher operational loss
  • Graph (Figure 10-5): Balance between:
    • Cost to recover (mirror site, backups)
    • Cost of disruption (lost revenue, reputation)
  • Example: Web retail site may require real-time recovery = costly mirrored site.
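
The arithmetic behind the recovery metrics example (RTO = 2 hrs, WRT = 4 hrs, MTD = 6 hrs, RPO = 10 minutes) can be checked with a short sketch. It assumes the relationship implied by that example, namely that technology recovery plus work recovery must fit within the MTD, and it assumes worst-case data loss equals one full backup interval. The function names are hypothetical.

```python
from datetime import timedelta


def fits_within_mtd(rto, wrt, mtd):
    """True if technology recovery (RTO) plus work recovery (WRT) fits in the MTD."""
    return rto + wrt <= mtd


def backup_meets_rpo(backup_interval, rpo):
    """True if the worst-case data loss (one backup interval) is within the RPO."""
    return backup_interval <= rpo


# Values from the slide example.
rto = timedelta(hours=2)
wrt = timedelta(hours=4)
mtd = timedelta(hours=6)
rpo = timedelta(minutes=10)

print(fits_within_mtd(rto, wrt, mtd))                 # RTO + WRT against MTD
print(backup_meets_rpo(timedelta(minutes=15), rpo))   # 15-minute backups vs. 10-minute RPO
```

In this sketch a 15-minute backup interval fails the 10-minute RPO, which is the kind of gap that pushes an organization toward costlier options such as remote journaling or a mirrored site.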

Step 3 – Identify & Classify Information Assets

  • Classify data into Critical, Very Important, Routine, etc.
  • Helps prioritize what must be restored first.
  • Example: Payment gateway database gets higher priority than training schedules.

Step 4 – Identify Recovery Resource Requirements

  • List supporting assets for each business process.
  • Use a Resource/Component Table to document:
    • Hardware/software needs
    • Network dependencies
    • Staff roles
  • Example Table Entry:
    • Process: Customer Billing
    • Resource: Accounts Receivable Application
    • Details: Linux server + SQL database, ~$8,000

Step 5 – Prioritize System Resource Recovery

  • Use weighted scoring or simple labels (Primary/Secondary).
  • Avoid overcomplicating the process:
    • Large orgs → detailed weighted analysis.
    • Smaller orgs → fast classification scheme.
  • Output: Custom “to-do” list for disaster and continuity planning teams.
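
For the simple labeling approach, the resulting to-do list could be produced roughly as follows. This is a sketch under assumptions: the Primary/Secondary labels come from the slide, while the `SystemResource` class, the resource entries, and their RTO values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemResource:
    name: str
    label: str        # "Primary" or "Secondary"
    rto_hours: float  # hypothetical recovery time objective


LABEL_ORDER = {"Primary": 0, "Secondary": 1}


def recovery_todo(resources):
    """Order resources by label first, then by shortest RTO."""
    return sorted(resources, key=lambda r: (LABEL_ORDER[r.label], r.rto_hours))


resources = [
    SystemResource("Training schedule database", "Secondary", 72),
    SystemResource("Accounts receivable application", "Primary", 8),
    SystemResource("Payment gateway database", "Primary", 2),
]
todo = recovery_todo(resources)
for position, resource in enumerate(todo, start=1):
    print(position, resource.name, resource.label)
```

A larger organization would likely replace the two labels with weighted scores, but the sorting step stays the same.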

Incident Response Plan

  • Incident Response (IR) refers to a planned, coordinated approach to:
    • Detecting
    • Reacting to
    • Recovering from information security incidents.
  • Many organizations already engage in IR (e.g., reacting to a system crash), even if informally.
  • IR is essential to maintain system integrity, business continuity, and data protection.
  • Example: Employee accidentally deletes a key database—IT acts to restore from backup and prevent recurrence.

What is an Incident?

  • Adverse Event: Any unexpected event that could harm information assets.
  • Incident: An adverse event that materializes into a real threat.
  • Not every adverse event becomes an incident—but all incidents begin as adverse events.
  • Example: Unusual login attempts → Adverse event. Confirmed unauthorized access → Incident.

Incident Response vs. Incident Reaction

  • IR refers to the entire planning and coordination process.
  • Reaction refers to what the organization actually does after detecting an incident.
  • This deck uses:
    • IR = broad process (planning + action).
    • Reaction = specific post-detection response.
  • Example: The IR team prepares a phishing response plan → Actual blocking/removal of phishing email = reaction.

Incident Response Planning (IRP)

  • IRP is the formal preparation effort to guide organizational IR activities.
  • Includes development of:
    • IR policies
    • IR plan (IRP document)
    • Formation of IRP Team (IRPT)
  • Requires senior management support and cross-functional coordination.
  • IRPT = specialized team trained to handle incidents as per documented plan.

Who is Involved?

  • IRP Team (IRPT):
    • Often includes members from IT, InfoSec, legal, and communications.
    • Coordinates detection, containment, eradication, and recovery steps.
  • Management: Provides oversight and resources.
  • End users: Often the first line of detection (e.g., reporting suspicious email or activity).

When is the IR Plan Activated?

  • Trigger point: When an incident is detected, no matter how minor.
  • Early detection and response can limit damage and cost.
  • Example: If a USB drive containing sensitive data is reported missing, the IR plan is activated to assess and mitigate data loss.

Getting Started with Incident Response

Early Task: Form the IRPT

  • Initiated by: Contingency Planning Management Team (CPMT).
  • The Incident Response Planning Team (IRPT) is responsible for:
    • Creating and documenting the incident response policy.
    • Defining the scope and structure of incident handling.
    • Outlining the organization’s response approach to different types of incidents.
    • Guiding users on how to help, not hinder, response efforts.
    • Do: Report anomalies promptly.
    • Don’t: Attempt personal troubleshooting that may destroy evidence

IRPT’s Core Responsibilities

  • Establish formal policy for incident response activities.
  • Define incident categories and related protocols.
  • Advise users on how to recognize and report potential incidents.
  • Design training programs and communication protocols.
  • Ensure IR plan aligns with business objectives and legal requirements.

Forming the CSIRT (Computer Security Incident Response Team)

  • Formed by IRPT to implement and execute the IR plan.
  • Comprised of:
    • Technical IT staff (admins, network engineers)
    • Managerial IT personnel (e.g., CIO or IT directors)
    • InfoSec specialists (e.g., security analysts, threat hunters)
  • Some IRPT members may overlap with the CSIRT for continuity and clarity.

CSIRT in Action – A Practical Example

  • Scenario: A phishing campaign targets internal users.
  • Detection: Alert from email filter → triaged by CSIRT.
  • Containment: Block malicious sender and quarantine affected inboxes.
  • Recovery: Scan affected devices, restore clean backups if needed.
  • User Guidance: IRPT trains staff to avoid clicking suspicious links in the future.

What’s Next – The NIST Incident Response Lifecycle

  • The CSIRT operates within a structured lifecycle defined by NIST SP 800-61 Rev. 2:
    • Preparation
    • Detection & Analysis
    • Containment, Eradication, and Recovery
    • Post-Incident Activity
  • This model ensures a systematic and repeatable response process.

Introduction to the NIST Cybersecurity Framework (CSF)

  • Developed by NIST to enhance critical infrastructure cybersecurity.
  • Known as the Framework for Improving Critical Infrastructure Cybersecurity.
  • Designed to complement existing IR standards, including:
    • NIST SP 800-61 Rev. 2 (Incident Handling Guide)
    • NIST SP 800-184 (Cybersecurity Event Recovery)
  • Built on foundational practices outlined in earlier NIST Special Publications.

Mapping CSF to IR and Recovery

  • The CSF includes five core functions, which closely align with the IR lifecycle:
    • Identify → Supports risk management and governance.
    • Protect → Emphasizes controls: policy, training, technology.
    • Detect → Involves recognizing signs of security incidents.
    • Respond → Concerns action taken once an incident is detected.
    • Recover → Focuses on restoring systems and operations.

CSF Function 1 – Identify

  • Supports:
    • Asset management
    • Risk assessments
    • Governance programs
  • Key Objective: Understand what needs protection and why.
  • Example: Inventorying all data centers and defining their importance to operations.

CSF Function 2 – Protect

  • Implementation of preventive measures, such as:
    • Access controls
    • Security awareness training
    • Data protection tools
  • Builds a defense-in-depth strategy.
  • Example: Requiring MFA for system logins to prevent unauthorized access.

CSF Function 3 – Detect

  • Focuses on real-time monitoring and alerting.
  • Enables organizations to identify incidents as they occur.
  • Utilizes:
    • IDS/IPS systems
    • SIEM tools
    • Threat intelligence feeds
  • Example: SIEM detects unusual login patterns indicating a possible breach.

CSF Function 4 – Respond

  • Encompasses:
    • Incident handling
    • Communication plans
    • Legal coordination
    • Forensics and containment
  • Aligned with: NIST SP 800-61 Rev. 2
  • Example: After detecting ransomware, CSIRT isolates infected machines and begins recovery.

CSF Function 5 – Recover

  • Objective: Restore services and reduce long-term impact.
  • Informed by:
    • NIST SP 800-184: Guide for Cybersecurity Event Recovery
  • Includes:
    • System restoration
    • Lessons learned
    • Improvement planning
  • Example: Restoring clean backups and implementing safeguards to prevent repeat incidents.

Introduction to the IR Policy

Key Components of the IR Policy (NIST SP 800-61 Rev. 2)

  • Statement of Management Commitment
    • Confirms executive support and assigns authority to the CSIRT.
    • Ensures organizational alignment with IR goals.
  • Purpose and Objectives
    • Clarifies why the policy exists.
    • Describes what the IR process aims to achieve (e.g., minimize downtime, preserve evidence).
  • Scope
    • Specifies:
      • Who is covered (e.g., employees, contractors).
      • What is covered (systems, data, networks).
      • When it applies (during and after incidents).

Key Components of the IR Policy (Continued)

  • Definitions of InfoSec Incidents and Related Terms
    • Standardizes language for clarity.
    • Example: Clearly distinguish between “incident,” “event,” and “breach.”
  • Organizational Structure and Responsibilities
    • Roles and authority levels (e.g., CSIRT’s ability to disconnect systems).
    • Requirements for:
      • Reporting incidents.
      • Monitoring activity.
      • Interacting with external stakeholders (e.g., law enforcement, partners).
    • Example: Policy may authorize the CSIRT to pull a compromised server offline immediately without waiting for higher approval.

Additional Policy Elements

  • Incident Severity and Prioritization
    • Defines how incidents are ranked by impact (e.g., critical, high, medium, low).
    • Helps triage response efforts and allocate resources appropriately.
  • Performance Measures
    • Metrics to evaluate IR effectiveness (e.g., time to detect, contain, recover).
    • Aligns with broader InfoSec performance measurement frameworks (see Chapter 9).
  • Reporting and Contact Protocols
    • Specifies:
      • How to report an incident.
      • What forms or systems to use.
      • Who should be contacted (internal and external).
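
The performance measures named above (time to detect, contain, recover) can be computed from incident timestamps. The sketch below assumes one possible definition, where each measure is the elapsed time since the previous milestone; the function name and the timeline are hypothetical.

```python
from datetime import datetime


def response_metrics(occurred, detected, contained, recovered):
    """Return elapsed time for each stage as timedelta values."""
    return {
        "time_to_detect": detected - occurred,
        "time_to_contain": contained - detected,
        "time_to_recover": recovered - contained,
    }


# Hypothetical timeline for a single incident.
metrics = response_metrics(
    occurred=datetime(2024, 3, 4, 9, 0),
    detected=datetime(2024, 3, 4, 9, 45),
    contained=datetime(2024, 3, 4, 11, 15),
    recovered=datetime(2024, 3, 4, 17, 15),
)
for name, elapsed in metrics.items():
    print(f"{name}: {elapsed}")
```

Tracking these values across incidents gives the trend data that a performance measurement program needs.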

What Is Incident Response Planning (IRP)?

  • IRP is the structured development of plans, policies, and teams to manage InfoSec incidents.
  • IR is a reactive measure: the plan is prepared in advance but activated only after an incident is detected.
  • Falls under the responsibility of:
    • CIO, CISO, or designated IT manager
    • With support from CPMT, system administrators, and key stakeholders
  • Example: Water damage to an office activates IRP, not DRP, unless broader infrastructure is impacted.

What Qualifies as an InfoSec Incident?

  • An event is classified as an incident if it meets all the following:
    • Targets information assets
    • Has a realistic chance of success
    • Threatens confidentiality, integrity, or availability (CIA)
  • IR focuses on incidents, not on prevention—that's InfoSec’s job.
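
The three-part test above can be expressed as a simple check in which all criteria must hold. This is a minimal sketch; the `AdverseEvent` class and the two sample events are hypothetical, and in practice each answer comes from an analyst's judgment rather than a stored flag.

```python
from dataclasses import dataclass


@dataclass
class AdverseEvent:
    description: str
    targets_information_assets: bool
    realistic_chance_of_success: bool
    threatens_cia: bool


def is_incident(event):
    """An adverse event is an incident only if all three criteria are met."""
    return (
        event.targets_information_assets
        and event.realistic_chance_of_success
        and event.threatens_cia
    )


# Hypothetical events for illustration.
blocked_scan = AdverseEvent("Port scan dropped at the firewall", True, False, False)
intrusion = AdverseEvent("Confirmed unauthorized access to a file server", True, True, True)

print(is_incident(blocked_scan))
print(is_incident(intrusion))
```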

IR Plan Elements (NIST SP 800-61 Rev. 2)

  • A comprehensive IR plan should include:
    • Mission: Purpose of the response effort.
    • Strategies and goals: Desired outcomes.
    • Senior management approval: Legitimacy and authority.
    • Organizational approach: Structure and responsibilities.
    • Communication plans: Internal and external.
    • Performance metrics: KPIs to track effectiveness.
    • Capability roadmap: For continuous improvement.
    • Integration: How it fits within the broader organization.

Three Sets of Incident Response Procedures

  • Before the Incident (Preparation):
    • Backups, training, SLAs, test plans, DR/BC links
    • Example: Weekly data backup plan with off-site storage
  • During the Incident:
    • Real-time actions by technical and managerial teams
    • Assigned by role (admin vs. comms vs. legal)
  • After the Incident:
    • Cleanup, recovery, and lessons learned
    • Restoring systems and documentation of events

Role of the CSIRT in IRP Execution

  • The CSIRT executes the IR plan:
    • Detect, respond to, and recover from incidents
    • May be formal or informal based on org size
  • Acts like a firefighting unit:
    • Each member knows their specific role
    • Coordinates as a unified team
  • Example: One team isolates affected systems while another handles comms and compliance.

IR Phases – Detect, React, Recover

  • Detection:
    • Recognize that an incident is underway
  • Reaction:
    • Contain and mitigate damage
    • Aligns with "Respond" in the NIST CSF
  • Recovery:
    • Return systems to pre-incident condition

Incident Handling Checklist (NIST SP 800-61 Rev. 2)

  • Detection and Analysis
    1. Determine incident occurrence
    2. Analyze precursors/indicators
    3. Correlate and research
    4. Document and gather evidence
    5. Prioritize based on impact
    6. Report internally/externally
  • Containment, Eradication, and Recovery
    1. Preserve/document evidence
    2. Contain and eradicate
    3. Recover systems and monitor
  • Post-Incident Activity
    1. Create report
    2. Conduct lessons learned session (mandatory for major events)

Data Protection Strategies for IR Preparation

  • Traditional backups:
    • On-site/off-site, disk-to-disk-to-tape, RAID
  • Electronic vaulting:
    • Batch data transfers via secure lines
  • Remote journaling:
    • Real-time transaction replication (vs. full backups)
  • Database shadowing:
    • Real-time mirroring to two locations
  • 3-2-1 Rule:
    • 3 copies of data, 2 media types, 1 off-site
    • Example: Daily on-site backups, weekly cloud storage
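
A backup inventory can be checked against the 3-2-1 rule with a short sketch like the one below. It assumes the production copy counts as one of the three copies, which is a common reading of the rule but not something the slide states; the `BackupCopy` class and the sample inventories are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str      # e.g. "disk", "tape", "cloud"
    off_site: bool


def satisfies_3_2_1(copies):
    """Check for at least 3 copies, 2 media types, and 1 off-site copy."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.media for copy in copies}) >= 2
    has_off_site = any(copy.off_site for copy in copies)
    return enough_copies and enough_media and has_off_site


# Hypothetical inventory: production data, daily on-site backup, weekly cloud copy.
compliant = [
    BackupCopy("disk", off_site=False),
    BackupCopy("disk", off_site=False),
    BackupCopy("cloud", off_site=True),
]
# Hypothetical inventory with no off-site copy and a single media type.
non_compliant = [BackupCopy("disk", off_site=False), BackupCopy("disk", off_site=False)]

print(satisfies_3_2_1(compliant))
print(satisfies_3_2_1(non_compliant))
```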

Incident Detection

  • Goal: Distinguish between routine system activity and real incidents.
  • Detection and Analysis is the second phase in the NIST incident response lifecycle, after Preparation.
  • Begins with incident classification—deciding if an adverse event is an actual incident.
  • Sources for detection:
    • End-user reports
    • Intrusion Detection/Prevention Systems (IDPS)
    • Antivirus/antimalware alerts
    • Admin observations
  • Example: A help desk user notices strange pop-ups → report to CSIRT → IR plan initiated.

What Is Incident Classification?

  • Definition: The process of evaluating an adverse event to determine whether it qualifies as an InfoSec incident.
  • Requires training, clear definitions, and consistent procedures.
  • Classification is critical for deciding the response path (IR vs. DR/BC).

Three Types of Incident Indicators

  • Possible Indicators
    • May suggest an incident but need further investigation.
    • Examples:
      • Unfamiliar files found by users or admins
      • Unknown processes running in the background
      • Unusual system crashes or sudden reboots
      • Resource spikes or drops (e.g., CPU, RAM, disk space)
    • Tools: Windows Task Manager, UNIX/Linux resource monitors
  • Probable Indicators
    • Stronger evidence of malicious or abnormal activity.
    • Examples:
      • System activity at odd hours (e.g., midnight traffic spikes)
      • New user accounts with no documentation
      • User-reported attacks
      • Alerts from IDPS (though these may include false positives)
  • Definite Indicators
    • Confirmed signs that an incident is happening or has occurred.
    • Examples:
      • Dormant accounts being used unexpectedly
      • Log file modifications with no authorized changes
      • Presence of hacker or penetration tools in unauthorized locations
      • Notification by trusted partner or external organization
      • Web defacement or extortion message from a hacker

Incident Detection Results That May Indicate an Incident

  • Treat all unusual results as potential incidents—better to overreact than ignore.
  • Possible outcomes of actual or attempted incidents:
    • Loss of availability (system crash or downtime)
    • Loss of integrity (corrupted or altered data)
    • Loss of confidentiality (data leak or unauthorized access)
    • Violation of policy (e.g., unapproved file sharing)
    • Violation of law/regulation (e.g., unauthorized access to protected health info)

Why Early Detection Matters

  • Prevents small issues from becoming large-scale incidents or disasters.
  • Helps the IR team:
    • Activate predefined IR procedures
    • Contain and analyze the situation quickly
  • Reduces:
    • Downtime
    • Reputational harm
    • Financial loss

From Detection to Reaction

  • Once an incident is confirmed and classified, the IR plan moves from Detection to Reaction.
  • NIST SP 800-61 Rev. 2 combines reaction and recovery into a single phase, “Containment, Eradication, and Recovery,” while the NIST CSF separates them as “Respond” and “Recover.”
  • The Response Phase aims to:
    • Stop the incident
    • Minimize its impact
    • Prepare for recovery

Notification of Key Personnel

  • The CSIRT activates the alert roster to notify the appropriate individuals.
  • Two types of rosters:
    • Sequential: One person contacts everyone (accurate but slow).
    • Hierarchical: Each person calls others in a tree structure (faster but risk of miscommunication).
  • Tools: Automated systems (e.g., Preparis Portal) can streamline communication.
  • Example: A ransomware attack triggers automated SMS, email, and voice alerts to CSIRT and IT leadership.
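
The speed difference between the two roster types can be illustrated with a rough model. It assumes every call takes one time slot and that, in the hierarchical roster, everyone already notified places one call per slot; real call trees are assigned in advance and will not follow this exactly. The function names are hypothetical.

```python
def sequential_slots(staff_count):
    """One contact person places every call, one call per time slot."""
    return staff_count


def hierarchical_slots(staff_count):
    """Everyone already notified places one call per time slot."""
    informed = 1  # the person who starts the roster
    slots = 0
    while informed < staff_count + 1:
        informed *= 2  # each informed person reaches one more person
        slots += 1
    return slots


staff_to_notify = 15
print(sequential_slots(staff_to_notify))
print(hierarchical_slots(staff_to_notify))
```

Under these assumptions, notifying 15 people takes 15 slots sequentially and 4 slots hierarchically. The model says nothing about message accuracy, which is where the hierarchical roster carries its risk.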

Alert Messages and Communication

  • Alert messages include just enough information so responders can act without delay.
  • Alert message example: “Ransomware detected on finance server. Disconnect server and contact network admin. Follow IR SOP.”
  • Alert rosters must be:
    • Regularly updated
    • Tested
    • Rehearsed
  • General management, legal, HR, comms, and external partners may also need to be notified depending on the incident.

Documenting the Incident

  • Document:
    • Who did what
    • What happened
    • When actions were taken
    • Where the incident occurred
    • Why/How the event unfolded
  • Purpose:
    • Enables case studies
    • Aids in legal defense
    • Supports training and simulation
  • Example: Documentation helps prove compliance with due care in a breach affecting customer data.

Containment Strategies – Stopping the Attack

  • Identify affected systems quickly, but without full forensics analysis.
  • Common containment methods:
    • Disable compromised user accounts
    • Reconfigure firewalls
    • Shut down affected apps/services (e.g., mail server)
    • Disconnect infected network segments
    • In extreme cases, power down all systems
  • Example: Email phishing attack contained by disabling external email gateway temporarily.

Balancing Containment vs. Operations

  • Not all containment steps are ideal:
    • Disconnecting circuits may stop the attack but also halt business.
  • Use adaptive methods:
    • Apply IP filtering
    • Block specific ports or traffic types
    • Monitor activity while developing longer-term solutions
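
The adaptive methods above, IP filtering and port blocking, amount to a narrow deny rule instead of a full disconnect. The sketch below shows the decision logic only, using Python's standard `ipaddress` module; the blocked network (a reserved documentation range), the blocked ports, and the `is_allowed` function are hypothetical, and a real deployment would apply such rules on a firewall rather than in application code.

```python
import ipaddress

# Hypothetical containment rules for an ongoing incident.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation range
BLOCKED_PORTS = {445, 3389}


def is_allowed(source_ip, dest_port):
    """Deny traffic from blocked networks or to blocked ports; allow the rest."""
    address = ipaddress.ip_address(source_ip)
    if any(address in network for network in BLOCKED_NETWORKS):
        return False
    return dest_port not in BLOCKED_PORTS


print(is_allowed("203.0.113.7", 443))   # blocked source network
print(is_allowed("198.51.100.5", 445))  # blocked destination port
print(is_allowed("198.51.100.5", 443))  # unaffected business traffic
```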

Preparedness Across the Organization

  • Preparedness must go beyond the CISO and CSIRT.
  • Why?
    • Team members may be sick, traveling, or otherwise unavailable.
  • Everyone should:
    • Know basic IR steps
    • Understand their role in an emergency
  • Example: Receptionist reporting suspicious USB drives helps avoid data exfiltration.

Incident Escalation – When IR Isn’t Enough

  • Some incidents escalate beyond the IR plan’s scope:
    • Infrastructure-wide damage
    • Physical destruction
    • Extended outages
  • Criteria for escalation should be:
    • Defined during the BIA
    • Documented in the IR plan
  • Escalation triggers:
    • Major financial or operational impact
    • Need for law enforcement or emergency services
  • Example: A DDoS attack affecting critical customer services may escalate to disaster recovery activation and regulatory reporting.

Transitioning from Response to Recovery

  • Recovery phase begins after containment and regaining system control.
  • The focus shifts from mitigation to restoration.
  • Recovery includes:
    • Restoring systems and data
    • Rebuilding trust
    • Preventing recurrence

Initial Recovery Tasks

  • Notify appropriate personnel:
    • IT operations, system owners, data custodians, and department heads.
  • Begin damage assessment immediately:
    • Evaluate extent of loss to confidentiality, integrity, and availability (CIA).
  • Use:
    • Incident documentation
    • Logs (IDS, system, config)
    • Backup records
  • Example: If an HR server was breached, assess if PII was exfiltrated or altered.

Incident Damage Assessment

  • A critical first step in recovery.
  • Can take days or weeks depending on incident scale.
  • Outputs:
    • Scope of infected/affected systems
    • Type of data loss or corruption
    • Entry vectors and spread pattern
  • Damage documentation must be handled with care—may be used in legal or civil proceedings.

Steps in the Recovery Process (Donald Pipkin Framework)

  1. Identify vulnerabilities that enabled the incident and remediate them.
    • Example: Apply patches, close open ports, disable unused services.
  2. Repair or install safeguards that failed or were missing.
    • Firewalls, endpoint detection, email filtering, MFA, etc.
  3. Evaluate and upgrade monitoring tools.
    • Deploy or enhance SIEM, intrusion detection, and alerting capabilities.

Data and Service Restoration

  • Restore data from backups using recovery processes:
    • Full backup + incremental backups
    • Database journals or remote journaling
  • Example: Use last clean backup from Monday, apply logs from Tues-Wed.
  • Reinstate services and processes:
    • Validate and restart suspended or compromised systems
    • Review service dependencies and integrity checks
  • Monitor systems continuously:
    • Post-recovery surveillance to catch repeat attacks or missed threats
    • Use enhanced logging, alerts, and analytics
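
Choosing which backups to apply, as in the Monday full backup plus Tuesday and Wednesday example, can be sketched as follows. The catalog format, the dates, and the `restore_chain` function are hypothetical; the sketch also assumes incrementals are applied in date order on top of the most recent full backup.

```python
from datetime import date


def restore_chain(catalog, last_clean_day):
    """Return the backups to apply, in order: the latest full backup taken on or
    before last_clean_day, followed by every later incremental up to that day."""
    usable = sorted(b for b in catalog if b[0] <= last_clean_day)
    full_positions = [i for i, b in enumerate(usable) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup on or before the last clean day")
    return usable[full_positions[-1]:]


# Hypothetical catalog of (date taken, kind) entries.
catalog = [
    (date(2024, 1, 1), "full"),         # Monday
    (date(2024, 1, 2), "incremental"),  # Tuesday
    (date(2024, 1, 3), "incremental"),  # Wednesday
    (date(2024, 1, 4), "incremental"),  # Thursday, taken after the compromise
    (date(2023, 12, 25), "full"),       # previous week's full backup
]
chain = restore_chain(catalog, last_clean_day=date(2024, 1, 3))
for taken, kind in chain:
    print(taken, kind)
```

Excluding the Thursday backup matters here: restoring anything taken after the compromise risks reintroducing the problem.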

Restoring Organizational Confidence

  • Communicate with stakeholders:
    • Internal: Staff, management, users
    • External (if necessary): Clients, partners, regulators
  • Tailor the message:
    • Minor event: Quick update, emphasize prevention
    • Major breach: Detailed reassurance, timeline for full restoration
  • Goal: Prevent confusion, panic, or loss of trust

Post-Incident Review – After-Action Review (AAR)

  • After-Action Review (AAR):
    • A structured and non-blaming discussion after an incident.
    • Allows all participants to share perspectives and reflect on:
      • What happened
      • What worked
      • What needs improvement
    • Moderated by a facilitator; results are documented and shared.
  • Purpose:
    • Refine the IR plan
    • Train new team members
    • Preserve institutional knowledge

What Happens in an AAR?

  • Each team member:
    • Reviews their roles and actions
    • Identifies gaps and strengths
  • The team verifies:
    • The accuracy of documentation
    • That incident records are clear and complete
  • Outcome:
    • Updates to IR plans, SOPs, and training content
    • May become a case study for future simulation

Common IR Mistakes (McAfee’s Top 10)

  1. No clear chain of command
  2. No central operations center
  3. Not understanding attacker tactics ("know your enemy")
  4. Missing or weak containment strategies
  5. Not recording IR activities across all stages
  6. Missing real-time documentation and timelines
  7. Confusing containment with remediation
  8. Inadequate network security and monitoring
  9. Weak or nonexistent system logging
  10. Poor or missing antivirus/antimalware coverage

NIST SP 800-61 Rev. 2 – Best Practices Summary

  • Acquire tools/resources ahead of time:
    • Contact lists, forensics tools, network diagrams
  • Prevent incidents with sound security controls and awareness
  • Use layered detection tools:
    • IDS/IPS, antivirus, file integrity tools
  • Let outsiders report incidents (publish contact methods)
  • Establish baseline logging and auditing:
    • More detailed on critical systems

NIST Recommendations (Cont.)

  • Profile network/system behavior to detect anomalies
  • Understand normal vs. abnormal activity
  • Develop