Module 13: Incident Preparation and Investigation

Business continuity is the ability of an organization to maintain operations and services during disruptive events.
Business continuity planning (BCP) is the process of creating a plan of action that keeps the organization operating during and after a disaster.
- The outcome is a business continuity plan (BCP), a strategic document providing alternative operation modes.
A BCP should include:
- High availability
- Scalability
- Diversity
- On-prem and cloud considerations
Continuity of Operation Planning (COOP) is a U.S. federal initiative that encourages organizations to plan for continuing their critical operations under adverse circumstances.
A Business Impact Analysis (BIA) identifies and quantifies the impact of losing business processes and functions.
- It determines the mission-essential function (core purpose of the enterprise).
- It identifies single points of failure: components whose failure would stop the whole system.
A Disaster Recovery Plan (DRP) is a subset of a BCP.
- It details restoring IT resources after significant service disruption.
- It covers the sequence in which systems are restored (restoration order).
- Factors affecting restoration order: dependencies, process importance, alternative practices.
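One way to picture how dependencies and process importance combine into a restoration order is a topological sort with a priority tie-break. The sketch below is only an illustration: the system names, dependency map, and importance scores are hypothetical, and a real DRP would also weigh alternative practices, which this sketch ignores.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical systems, each mapped to the systems it depends on.
dependencies = {
    "dns": set(),
    "database": {"dns"},
    "auth": {"dns", "database"},
    "web_app": {"auth", "database"},
    "email": {"dns", "auth"},
}

# Hypothetical importance scores (higher means restore sooner when there is a choice).
importance = {"dns": 5, "database": 5, "auth": 4, "web_app": 3, "email": 2}


def restoration_order(deps, priority):
    """Return an order in which every system comes after its dependencies.

    Systems that become restorable at the same time are ordered by importance.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda s: (-priority[s], s))
        for system in ready:
            order.append(system)
            sorter.done(system)
    return order


order = restoration_order(dependencies, importance)
print(order)
```

Here `web_app` and `email` become restorable at the same step, and the higher importance score decides which of the two is listed first.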

Steps for incident preparation:
- Create an incident plan
- Perform testing exercises
- Study attack frameworks
An incident response plan is a set of written instructions for reacting to security incidents. It should contain:
- Definitions
- Incident response teams
- Reporting requirements
Test the incident response plan through simulations and adjust it based on the results.
An information security framework defines policies and procedures for security control implementation and management.
- Frameworks such as MITRE ATT&CK and Diamond Model can be studied to understand how attacks occur.

Capacity planning forecasts the need for future resources.
- It calculates future human resources (people capacity planning).
- It predicts the number of devices needed (technology capacity planning).
- It plans the size of the network (infrastructure capacity planning).
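As a rough illustration of a capacity forecast, the sketch below projects a resource count forward under a constant annual growth rate. The compound-growth model and the numbers are assumptions made for the example; the notes do not prescribe a forecasting method.

```python
import math


def forecast_need(current_count, annual_growth_rate, years):
    """Project how many units (people, devices, links) are needed after `years`.

    Assumes a constant compound growth rate, which is a simplification.
    The result is rounded up because a fraction of a unit cannot be provisioned.
    """
    return math.ceil(current_count * (1 + annual_growth_rate) ** years)


# Hypothetical example: 200 devices today, growing 25% per year, planned 2 years out.
devices_needed = forecast_need(200, 0.25, 2)
print(devices_needed)
```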
Platform diversity: Using multiple different devices to host/serve an application or service.
Equipment redundancy provides resilience in case of cyberattacks.
Redundancy involves duplicated servers, drives, networks, power, sites, clouds, and data.

High availability is needed in servers so that they are always accessible.
Design network infrastructure so multiple servers appear as a single resource.
- Clustering combines two or more devices to appear as a single unit.
- A server cluster is the combination of two or more servers interconnected to appear as one.
Two types of server clusters:
- Asymmetric: A standby server takes over upon failure.
- Symmetric: Every server performs useful work; remaining servers take over the failed server's tasks.
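The difference between the two cluster types can be sketched as two failover functions. This is a simplified model with hypothetical server and task names; it redistributes tasks round-robin, which is only one of several possible policies.

```python
def asymmetric_failover(tasks, failed, standby):
    """The idle standby server inherits every task of the failed server."""
    remaining = {s: list(t) for s, t in tasks.items() if s != failed}
    remaining[standby] = list(tasks[failed])
    return remaining


def symmetric_failover(tasks, failed):
    """The surviving servers, all already doing work, share the failed server's tasks."""
    survivors = sorted(s for s in tasks if s != failed)
    if not survivors:
        raise RuntimeError("no surviving servers in the cluster")
    result = {s: list(tasks[s]) for s in survivors}
    for i, task in enumerate(tasks[failed]):
        result[survivors[i % len(survivors)]].append(task)
    return result


# Asymmetric: one active server, one standby that does nothing until failure.
asym = asymmetric_failover({"primary": ["web", "db"]}, failed="primary", standby="standby")

# Symmetric: three working servers; "a" fails and "b" and "c" absorb its tasks.
sym = symmetric_failover({"a": ["t1", "t2"], "b": ["t3"], "c": ["t4"]}, failed="a")
print(asym, sym)
```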
Virtualization has reduced the number of server clusters needed for server redundancy, because a failed virtual server can be restarted on another physical host.

Hard Disk Drives (HDDs): store and retrieve data using spinning platters, actuator arms, and motors.
Solid-State Drives (SSDs): store data on chips with no moving parts, which makes them more resistant to physical failure than HDDs.
Some organizations maintain a stockpile of HDDs as spare parts to replace failures.
Mean Time Between Failures (MTBF): the average operating time before a component fails or needs replacement. A higher MTBF indicates a more reliable component.
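MTBF is commonly estimated by dividing total operating time by the number of failures observed in that time. The sketch below uses that formula with made-up numbers; the function name is illustrative only.

```python
def mtbf_hours(total_operating_hours, failures):
    """Estimate MTBF as total operating time divided by observed failures."""
    if failures <= 0:
        raise ValueError("at least one observed failure is needed for an estimate")
    return total_operating_hours / failures


# Hypothetical fleet: 100 drives each run for 1,000 hours, with 4 failures in total.
estimate = mtbf_hours(100 * 1_000, 4)
print(estimate)
```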

RAID (Redundant Array of Independent Drives) uses multiple hard drives for increased reliability and performance.
- Implemented via software (OS level) or hardware (specialized controller).
- Nested levels combine RAID levels (e.g., RAID 10).
RAID levels:
- Level 0: Striping, which increases performance but provides no redundancy.
- Level 1: Disk mirroring, provides redundancy by duplicating data.
- Level 5: Striping with parity data distributed across all drives, so the array survives the failure of one drive.
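The parity idea behind Level 5 can be shown with XOR: the parity block is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the others. This is a minimal sketch of a single stripe with made-up data; a real array also rotates which drive holds the parity, which is not modeled here.

```python
def xor_blocks(*blocks):
    """XOR equally sized byte blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)


# One stripe spread over three data drives (hypothetical contents).
d0, d1, d2 = b"ABCD", b"EFGH", b"IJKL"
parity = xor_blocks(d0, d1, d2)

# The drive holding d1 fails: rebuild its block from the survivors plus parity.
recovered = xor_blocks(d0, d2, parity)
print(recovered == d1)
```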

A Storage Area Network (SAN) is a dedicated network storage facility for high-speed data access.
- It consolidates storage facilities that appear as a single pool of devices.
Multipath creates multiple physical paths between devices and a SAN.
- If one path fails, traffic is redirected to another path.

A redundant network waits in the background and uses replication to keep its copy of the live network information current.
- A redundant network ensures that network services are always accessible.
Switches and routers can have an active primary port and a standby failover port for physical redundancy.
Load balancers can provide network redundancy by detecting non-functioning servers and sending traffic only to the servers that are still working.
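A minimal sketch of the load balancer behavior: requests rotate across servers, and any server marked as down is skipped. The class, the round-robin policy, and the server names are assumptions for illustration, not a description of a particular product.

```python
class LoadBalancer:
    """Round-robin balancer that never routes to a server marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


lb = LoadBalancer(["s1", "s2", "s3"])
lb.mark_down("s2")  # e.g. a health check failed
routed = [lb.route() for _ in range(4)]
print(routed)
```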

An Uninterruptible Power Supply (UPS) maintains power during interruptions.
- Off-line UPS: Begins supplying power upon interruption and switches back when power is restored.
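- On-line UPS: Always runs equipment off its battery while main power keeps the battery charged, so the equipment receives clean power and sees no switchover when an interruption occurs.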

Hot site: A commercial disaster recovery facility with all equipment installed and running, so computer and network operations can continue almost immediately.
Cold site: Provides office space; customer provides and installs all equipment.
Warm site: Has equipment installed but lacks active internet, telecommunications, and current data backups.

Cloud resilience considerations:
- Location of data.
- Using multiple cloud providers (multicloud systems).
Multicloud advantage: Tolerates critical issues with a single cloud provider.

Data resilience is achieved by copying data.
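Recovery Point Objective (RPO): The maximum length of time the organization can tolerate between copies, which bounds how much recent data can be lost.
Recovery Time Objective (RTO): The maximum length of time the organization can tolerate for restoring data and resuming operations.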
Backup: A scheduled event where data is copied and stored for disaster recovery (on-site or off-site).
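To make RPO and RTO concrete, the sketch below checks a backup schedule and a recovery against both objectives. The function names, timestamps, and objectives are hypothetical.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_times, rpo):
    """True if no gap between consecutive backups exceeds the RPO."""
    ordered = sorted(backup_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return all(gap <= rpo for gap in gaps)


def meets_rto(outage_start, service_restored, rto):
    """True if the recovery finished within the RTO."""
    return service_restored - outage_start <= rto


day = datetime(2024, 5, 1)
backups = [day + timedelta(hours=h) for h in (0, 6, 12, 24)]

# The 12:00 to 24:00 gap is 12 hours, so an 8-hour RPO is violated.
rpo_ok = meets_rpo(backups, rpo=timedelta(hours=8))

# An outage at 09:00 recovered by 10:30 fits within a 2-hour RTO.
rto_ok = meets_rto(day + timedelta(hours=9),
                   day + timedelta(hours=10, minutes=30),
                   rto=timedelta(hours=2))
print(rpo_ok, rto_ok)
```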
Data replication techniques:
- Snapshot: Repeatedly captures the state of data for restoration from a specific point in time.
- Journaling: Copies data whenever a change occurs.
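The two replication techniques can be contrasted with a toy key-value store: a snapshot keeps the full state as of a point in time, while a journal records each change as it happens. The class below is a hypothetical in-memory sketch, not how any particular storage product implements either technique.

```python
import copy


class ReplicatedStore:
    """Toy store that supports both snapshots and a change journal."""

    def __init__(self):
        self.data = {}
        self.snapshots = []  # full copies captured at points in time
        self.journal = []    # one entry per change, in order

    def set(self, key, value):
        self.data[key] = value
        self.journal.append((key, value))  # journaling: copy on every change

    def take_snapshot(self):
        self.snapshots.append(copy.deepcopy(self.data))

    def restore_from_snapshot(self, index=-1):
        """Return the state as of the chosen snapshot (latest by default)."""
        return copy.deepcopy(self.snapshots[index])

    def restore_from_journal(self):
        """Rebuild the current state by replaying every recorded change."""
        state = {}
        for key, value in self.journal:
            state[key] = value
        return state


store = ReplicatedStore()
store.set("a", 1)
store.take_snapshot()
store.set("a", 2)
store.set("b", 3)

from_snapshot = store.restore_from_snapshot()
from_journal = store.restore_from_journal()
print(from_snapshot, from_journal)
```

The snapshot restores the state as of the moment it was taken and misses the two later changes, while the journal replay includes them.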

Root-Cause Analysis (RCA) discovers the origin of a security event.
Incident investigation involves analyzing data sources and performing digital forensics.

A log is a record of events.
Security logs can reveal the type of attack and how it bypassed defenses.
Log management problems:
- Multiple devices generate logs.
- Very large data volume.
- Different log formats.
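The "different log formats" problem is usually handled by normalizing every entry into one common record before analysis. The sketch below handles two formats: a space-separated line and a JSON object. Both formats and all field names (`ts`, `device`, `msg`) are invented for the example.

```python
import json
import re

# Hypothetical plain-text format: "<timestamp> <host> <message>"
PLAIN_LINE = re.compile(r"^(?P<timestamp>\S+) (?P<host>\S+) (?P<message>.+)$")


def normalize(line):
    """Convert a log line in either format into one common dictionary shape."""
    line = line.strip()
    if line.startswith("{"):
        record = json.loads(line)  # hypothetical JSON format with ts/device/msg keys
        return {
            "timestamp": record["ts"],
            "host": record["device"],
            "message": record["msg"],
        }
    match = PLAIN_LINE.match(line)
    if match is None:
        raise ValueError(f"unrecognized log format: {line!r}")
    return match.groupdict()


firewall = normalize("2024-05-01T10:00:00Z fw01 Denied TCP connection")
ids = normalize('{"ts": "2024-05-01T10:00:05Z", "device": "ids02", "msg": "Port scan detected"}')
print(firewall, ids)
```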
Data from vulnerability scans provides useful information.
A SIEM dashboard provides information from its sensors (alerts, trends, sensitivity, correlation data).
Network monitors that analyze Internet Protocol (IP) traffic create automated activity reports.
- sFlow is a packet sampling protocol: it reports on network traffic by capturing a sample of the packets rather than every packet.

Digital forensics steps:
Secure the Scene: Contact a digital forensics incident response team to avoid contamination.
Preserve the Evidence: Ensure proof is not corrupted; use tagged bags with descriptions, identifiers, date, and location.
Document Chain of Custody: Record who held the evidence and when, showing that it was under strict control at all times.
Examine for Evidence: Use specialized tools to gather evidence (acquisition).
- Software tools can capture a system image or snapshot of current settings and data.
- Common Forensic Software Suites: EnCase and FTK Imager.
- Mobile Device Forensics Tools: designed for smartphones and tablets.
- Order of Volatility: Examine the most volatile data first (for example, RAM before hard drives), because it is lost soonest.
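The order of volatility amounts to sorting evidence sources from most to least short-lived before collecting them. The ranking in the sketch below follows a commonly cited ordering and is an assumption, since the notes do not list one; the source names are illustrative.

```python
# Assumed ranking, most volatile first (not taken from the notes).
VOLATILITY_ORDER = [
    "cpu registers and cache",
    "ram",
    "swap and temporary files",
    "hard drive",
    "remote logs",
    "archival media",
]


def collection_order(sources):
    """Sort the evidence sources present on a device, most volatile first."""
    rank = {name: position for position, name in enumerate(VOLATILITY_ORDER)}
    return sorted(sources, key=lambda source: rank[source])


plan = collection_order(["hard drive", "archival media", "ram", "remote logs"])
print(plan)
```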
Generate a Report: Provide a detailed written description of evidence acquisition and analysis (reporting).
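The evidence tag and chain of custody described in the steps above can be modeled as a small record. The sketch also stores a SHA-256 hash of the acquired image so later copies can be checked for changes. Hashing is an added assumption that the notes do not mention, and every field name and value here is hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class EvidenceItem:
    identifier: str
    description: str
    collected_on: str
    location: str
    sha256: str  # hash taken at acquisition time
    custody_log: list = field(default_factory=list)

    def transfer(self, when, from_person, to_person):
        """Record every hand-off so control of the evidence can be demonstrated."""
        self.custody_log.append((when, from_person, to_person))

    def is_unaltered(self, data: bytes) -> bool:
        """Compare the current contents against the hash taken at acquisition."""
        return sha256_of(data) == self.sha256


image = b"raw bytes of a disk image"  # stand-in for a real system image
item = EvidenceItem("EV-001", "Laptop disk image", "2024-05-01", "Office 12",
                    sha256_of(image))
item.transfer("2024-05-01T14:00", "First responder", "Forensic analyst")

print(item.is_unaltered(image), item.is_unaltered(image + b"tampered"))
```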