Module 13: Incident Preparation and Investigation
Business Continuity Planning
- Business continuity is the ability of an organization to maintain operations and services during disruptive events.
- Business continuity planning (BCP) is creating a plan of action for disasters.
- The outcome is a business continuity plan (BCP), a strategic document providing alternative operation modes.
- A BCP should include:
  - High availability
  - Scalability
  - Diversity
  - On-prem and cloud considerations
- Continuity of Operation Planning (COOP) is a U.S. federal initiative that keeps critical operations running during adverse circumstances.
- A Business Impact Analysis (BIA) identifies and quantifies the impact of losing business processes and functions.
  - It determines the mission-essential function (the activity that serves as the core purpose of the enterprise).
  - It identifies single points of failure (components whose failure would stop an entire system).
- A Disaster Recovery Plan (DRP) is a subset of a BCP.
  - It details how to restore IT resources after a significant service disruption.
  - It specifies the sequence in which systems are restored (restoration order).
  - Factors affecting restoration order: dependencies, process importance, alternative practices.
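One way to picture restoration order is as a dependency-aware sort: a system is restored only after everything it depends on, and process importance breaks ties. The sketch below is a minimal illustration, not a prescribed DRP method; the system names, dependency map, and importance ranks are all hypothetical.

```python
import heapq
from graphlib import TopologicalSorter

# Hypothetical systems: each key depends on the systems in its set.
dependencies = {
    "dns": set(),
    "database": {"dns"},
    "auth": {"database"},
    "web_app": {"database", "auth"},
    "email": {"dns"},
}

# Hypothetical importance ranks (lower number = restore sooner when there is a choice).
importance = {"dns": 1, "database": 1, "auth": 2, "web_app": 2, "email": 3}


def restoration_order(deps, rank):
    """Return systems so that every dependency is restored before its dependents."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    ready = []  # heap of (rank, name) for systems whose dependencies are restored
    order = []
    while sorter.is_active():
        for name in sorter.get_ready():
            heapq.heappush(ready, (rank[name], name))
        # Restore the most important system that is currently unblocked.
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)
    return order


order = restoration_order(dependencies, importance)
print(order)
```

Here email is restored last even though it only depends on DNS, because its assumed importance is lowest.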
Incident Response Planning
- Steps for incident preparation:
  - Create an incident plan
  - Perform testing exercises
  - Study attack frameworks
- An incident response plan is written instructions for reacting to security incidents and should contain:
  - Definitions
  - Incident response teams
  - Reporting requirements
- Test the incident response plan through simulations to make adjustments.
- An information security framework defines policies and procedures for security control implementation and management.
- Frameworks such as MITRE ATT&CK and Diamond Model can be studied to understand how attacks occur.
Resilience Through Redundancy
- Capacity planning forecasts the need for future resources.
  - People capacity planning calculates future human resource needs.
  - Technology capacity planning predicts the number of devices needed.
  - Infrastructure capacity planning determines the required size of the network.
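A capacity forecast can be as simple as projecting today's count forward at an assumed growth rate. The sketch below uses compound annual growth, which is only one possible model; the function name, the 10% rate, and the 3-year horizon are illustrative assumptions, not figures from these notes.

```python
import math


def forecast_capacity(current, annual_growth_rate, years):
    """Project a resource count forward with compound annual growth, rounded up."""
    return math.ceil(current * (1 + annual_growth_rate) ** years)


# Hypothetical inputs: 200 devices today, 10% growth per year, 3-year horizon.
devices_needed = forecast_capacity(200, 0.10, 3)
print(devices_needed)
```

The same function could be applied to staff counts or network ports, since each type of capacity planning is a forecast of a count over time.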
- Platform diversity: Using multiple different devices to host/serve an application or service.
- Equipment redundancy provides resilience in case of cyberattacks.
- Redundancy involves duplicated servers, drives, networks, power, sites, clouds, and data.
Servers
- High availability is needed in servers so that they are always accessible.
- Design network infrastructure so multiple servers appear as a single resource.
- Clustering combines two or more devices to appear as a single unit.
- A server cluster is the combination of two or more servers interconnected to appear as one.
- Two types of server clusters:
  - Asymmetric: A standby server takes over upon failure.
  - Symmetric: Every server performs useful work; remaining servers take over the failed server's tasks.
- Virtualization has reduced the number of server clusters needed for server redundancy.
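The difference between the two cluster types shows up in what happens to the failed server's work. The sketch below models a cluster as a mapping from server name to its task list; the server names, task names, and the least-loaded-first rule for the symmetric case are assumptions made for illustration.

```python
def fail_asymmetric(assignments, failed, standby):
    """Asymmetric cluster: the idle standby server inherits all tasks of the failed server."""
    result = {name: list(tasks) for name, tasks in assignments.items() if name != failed}
    result[standby] = result.get(standby, []) + list(assignments[failed])
    return result


def fail_symmetric(assignments, failed):
    """Symmetric cluster: surviving servers share the failed server's tasks, least loaded first."""
    result = {name: list(tasks) for name, tasks in assignments.items() if name != failed}
    for task in assignments[failed]:
        target = min(result, key=lambda name: (len(result[name]), name))
        result[target].append(task)
    return result


# Hypothetical clusters.
asymmetric = {"primary": ["web", "db"], "standby": []}
symmetric = {"a": ["web"], "b": ["db"], "c": ["mail", "dns"]}

after_asymmetric = fail_asymmetric(asymmetric, failed="primary", standby="standby")
after_symmetric = fail_symmetric(symmetric, failed="c")
print(after_asymmetric)
print(after_symmetric)
```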
Drives
- Hard Disk Drives (HDDs): store and retrieve data using spinning platters, actuator arms, and motors.
- Solid-State Drives (SSDs): store data on chips instead of moving parts, which makes them more resistant to failure and more reliable than HDDs.
- Some organizations maintain a stockpile of HDDs as spare parts to replace failures.
- Mean Time Between Failures (MTBF): the average operating time before a component fails or needs replacement; a higher MTBF indicates a more reliable component.
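MTBF is commonly estimated by dividing total operating time by the number of failures observed in that time. The sketch below assumes that convention; the fleet size, hours, and failure count are made-up numbers.

```python
def mtbf_hours(total_operating_hours, failure_count):
    """Estimate MTBF as total operating time divided by observed failures."""
    if failure_count == 0:
        raise ValueError("MTBF cannot be estimated until at least one failure is observed")
    return total_operating_hours / failure_count


# Hypothetical fleet: 10 drives, each run for 1,000 hours, with 2 failures in total.
fleet_hours = 10 * 1000
observed_mtbf = mtbf_hours(fleet_hours, 2)
print(observed_mtbf)  # 5000.0
```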
RAID
- RAID (Redundant Array of Independent Drives) combines multiple drives for increased reliability and performance.
- It can be implemented in software (at the OS level) or in hardware (with a specialized controller).
- Nested levels combine two RAID levels (e.g., RAID 10 combines RAID 1 mirroring with RAID 0 striping).
- RAID levels:
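  - Level 0: Striping; spreads data across drives to increase performance but provides no redundancy.
  - Level 1: Disk mirroring; provides redundancy by duplicating data on a second drive.
  - Level 5: Striping with distributed parity; parity data spread across all drives lets the array rebuild the contents of one failed drive.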
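Parity-based redundancy can be illustrated with XOR: the parity block is the XOR of the data blocks, so any one missing block equals the XOR of everything that survives. The sketch below shows a single stripe only; a real RAID 5 array also rotates which drive holds the parity from stripe to stripe, and the block contents here are placeholders.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: three data blocks (placeholder contents) plus their parity block.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)

# Simulate losing the drive that held data[1], then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```

Losing two blocks from the same stripe cannot be repaired this way, which is why this scheme tolerates only a single drive failure.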
SAN Multipath
- A Storage Area Network (SAN) is a dedicated network storage facility for high-speed data access.
- It consolidates storage facilities that appear as a single pool of devices.
- Multipath creates multiple physical paths between devices and a SAN.
- If one path fails, traffic is redirected to another path.
Networks
- A redundant network waits in the background and uses replication to keep its copy of the live network information current.
- A redundant network ensures that network services are always accessible.
- Switches and routers can have an active primary port and a standby failover port for physical redundancy.
- Load balancers can provide network redundancy by detecting non-functioning servers and directing traffic only to the servers that are still working.
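The load balancer behavior can be sketched as round-robin routing that skips any server marked as down. This is a toy model under stated assumptions: the class, method names, and server names are hypothetical, and a real load balancer would discover failures through health checks rather than manual calls.

```python
class LoadBalancer:
    """Round-robin router that never sends traffic to a server marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def route(self):
        """Return the next healthy server in rotation."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = LoadBalancer(["web1", "web2", "web3"])
balancer.mark_down("web2")  # simulate a failed health check
routed = [balancer.route() for _ in range(4)]
print(routed)
```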
Power
- An Uninterruptible Power Supply (UPS) maintains power during interruptions.
  - Off-line UPS: Begins supplying power from its battery when the primary power is interrupted and switches back when power is restored.
  - On-line UPS: Always runs off its battery while the main power keeps the battery charged, which also cleans the electrical power.
Sites
- Hot site: A fully equipped duplicate of the production site, generally run by a commercial disaster recovery service, that allows computer and network operations to continue.
- Warm site: Has equipment installed but lacks active internet and telecommunications connections and current data backups.
- Cold site: Provides office space only; the customer must provide and install all equipment.
Clouds
- Cloud resilience considerations:
  - Location of data.
  - Using multiple cloud providers (multicloud systems).
    - Multicloud advantage: Tolerates critical issues with a single cloud provider.
Data
- Data resilience is achieved by copying data.
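- Recovery Point Objective (RPO): The maximum length of time the organization can tolerate between copies, which bounds how much recent data can be lost.
- Recovery Time Objective (RTO): The target length of time within which data must be recovered after a disruption.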
- Backup: A scheduled event where data is copied and stored for disaster recovery (on-site or off-site).
- Data replication techniques:
  - Snapshot: Repeatedly captures the state of data for restoration from a specific point in time.
  - Journaling: Copies data whenever a change occurs.
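RPO and RTO become concrete when checked against an actual backup schedule and a measured restore time. The sketch below treats the largest gap between consecutive backups as the worst-case data loss; the timestamps, objectives, and function names are hypothetical.

```python
from datetime import datetime, timedelta


def worst_case_data_loss(backup_times):
    """Largest gap between consecutive backups (requires at least two backups)."""
    ordered = sorted(backup_times)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def meets_rpo(backup_times, rpo):
    """True if no gap between copies exceeds the recovery point objective."""
    return worst_case_data_loss(backup_times) <= rpo


def meets_rto(measured_restore_time, rto):
    """True if a measured restore finished within the recovery time objective."""
    return measured_restore_time <= rto


# Hypothetical backup schedule for one day.
backups = [
    datetime(2024, 1, 1, 0),
    datetime(2024, 1, 1, 6),
    datetime(2024, 1, 1, 12),
    datetime(2024, 1, 1, 20),
]

largest_gap = worst_case_data_loss(backups)
print(largest_gap)                                   # 8:00:00
print(meets_rpo(backups, timedelta(hours=6)))        # False
print(meets_rto(timedelta(minutes=45), timedelta(hours=1)))  # True
```

The 12:00 to 20:00 gap is what breaks a 6-hour RPO here; adding one more backup in that window would satisfy it.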
Incident Investigation
- Root-Cause Analysis (RCA) discovers the origin of a security event.
- Incident investigation involves analyzing data sources and performing digital forensics.
Data Sources
- A log is a record of events.
- Security logs can reveal the type of attack and how it bypassed defenses.
- Log management problems:
  - Multiple devices generate logs.
  - Very large data volume.
  - Different log formats.
- Data from vulnerability scans provides useful information.
- A SIEM dashboard provides information from its sensors (alerts, trends, sensitivity, correlation data).
- Network Internet Protocol (IP) monitors create automated activity reports.
- sFlow is a packet sampling protocol that reports on a statistical sample of packets rather than capturing every packet.
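The "different log formats" problem is usually handled by normalizing every line into one common record before analysis. The sketch below parses two invented formats into the same fields and merges them by time; both formats, the field names, and the sample lines are hypothetical and do not correspond to any particular product.

```python
import re

# Hypothetical format A: "<iso-time> <host> <EVENT>: <detail>"
FORMAT_A = re.compile(r"^(?P<time>\S+) (?P<host>\S+) (?P<event>\w+): (?P<detail>.*)$")


def normalize(line):
    """Convert a log line in either hypothetical format into one common record."""
    match = FORMAT_A.match(line)
    if match:
        return match.groupdict()
    # Hypothetical format B: "<host>|<iso-time>|<EVENT>|<detail>"
    parts = line.split("|", 3)
    if len(parts) == 4:
        host, time, event, detail = parts
        return {"time": time, "host": host, "event": event, "detail": detail}
    raise ValueError(f"unrecognized log format: {line!r}")


lines = [
    "web01|2024-01-12T10:15:40|LOGIN_FAILED|user alice",
    "2024-01-12T10:15:32 fw01 DENY: src=10.0.0.5 dst=10.0.0.9",
]

# ISO 8601 timestamps in the same time zone sort correctly as plain strings.
records = sorted((normalize(line) for line in lines), key=lambda record: record["time"])
for record in records:
    print(record["time"], record["host"], record["event"])
```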
Digital Forensics
- Forensics is the application of science to legal questions; digital forensics applies it to evidence stored on computers and other digital devices.
- It involves retrieving hidden, altered, or deleted data.
- A digital forensics specialist searches for evidence of cybercrime or damage.
- E-discovery is the electronic equivalent of manual document review.
Forensics Procedures
- Secure the Scene: Contact the digital forensics incident response team and keep others away from the affected systems to avoid contaminating evidence.
- Preserve the Evidence: Ensure the evidence is not corrupted; store items in tagged bags labeled with a description, identifier, date, and location.
- Document Chain of Custody: Record every transfer and handler of the evidence to show that it was always under strict control.
- Examine for Evidence: Use specialized tools to gather evidence (acquisition).
  - Software tools can capture a system image or a snapshot of current settings and data.
  - Common forensic software suites: EnCase and FTK Imager.
  - Mobile device forensics tools are designed for smartphones and tablets.
  - Order of volatility: Collect the most volatile (shortest-lived) data first and the least volatile last.
- Generate a Report: Provide a detailed written description of evidence acquisition and analysis (reporting).
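A common way to show that a captured image has not been corrupted is to record a cryptographic hash at acquisition and recompute it later. Hashing is not mentioned in these notes, so treat the sketch below as an assumed supporting practice; the file name and contents are placeholders, and it uses only Python's standard hashlib module.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration with a placeholder file standing in for a captured system image.
with tempfile.TemporaryDirectory() as folder:
    image_path = os.path.join(folder, "evidence.img")
    with open(image_path, "wb") as handle:
        handle.write(b"hypothetical disk image contents")

    acquisition_hash = sha256_of_file(image_path)  # recorded at acquisition time
    unchanged = sha256_of_file(image_path) == acquisition_hash

    with open(image_path, "ab") as handle:
        handle.write(b"!")  # simulate a later modification
    tampering_detected = sha256_of_file(image_path) != acquisition_hash

print(unchanged, tampering_detected)  # True True
```

Recording the acquisition hash alongside the chain-of-custody entries lets a later examiner confirm they are analyzing the same bytes that were collected.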
Order of Volatility
- Registers and CPU cache (most volatile; collect first)
- Routing tables, ARP cache, process table, kernel statistics, RAM
- Temporary file systems
- Hard drive
- Remote logging and monitoring data
- Physical configuration and network topology
- Archival media
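The list above can be turned into a collection plan by ranking each evidence source and sorting, so the shortest-lived data is captured first. The category keys and the example sources below are hypothetical labels chosen for this sketch.

```python
# Rank for each category, following the order of volatility listed above (1 = most volatile).
VOLATILITY_RANK = {
    "cpu": 1,              # registers and CPU cache
    "memory": 2,           # routing tables, ARP cache, process table, kernel statistics, RAM
    "temp_fs": 3,          # temporary file systems
    "disk": 4,             # hard drive
    "remote_logs": 5,      # remote logging and monitoring data
    "physical_config": 6,  # physical configuration and network topology
    "archive": 7,          # archival media
}


def collection_plan(sources):
    """Sort (label, category) pairs so the most volatile evidence is collected first."""
    return sorted(sources, key=lambda source: VOLATILITY_RANK[source[1]])


# Hypothetical evidence sources found at a scene, in the order they were noticed.
found = [
    ("backup tapes", "archive"),
    ("laptop SSD", "disk"),
    ("running process list", "memory"),
    ("/tmp contents", "temp_fs"),
]

plan = collection_plan(found)
for label, category in plan:
    print(label, category)
```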