NatCat Pricing in Retail Notes

What is NatCat Pricing in Retail?

In retail insurance (e.g., home insurance), NatCat pricing refers to how insurers calculate premiums that account for the risk of natural perils, such as floods, storms, hail, wildfires, and earthquakes.
Unlike typical claims that arise from isolated incidents (like burst pipes), NatCat events affect many policies simultaneously, creating significant "accumulation risk".
The pricing process balances historical claim data (Own Experience) and simulated catastrophe models (CAT models) to estimate Expected Ultimate Loss (EUL) and derive a fair premium.

Core Components of the NatCat Pricing Framework

Own Experience (OE) Data
Precise Geolocation and GIS Enrichment
Catastrophe Models (CAT Models)
Statistical Modeling (GLMs)
Level Setting (Blending OE with CAT)
Calibration and Final Premium Derivation

Full Walkthrough with Worked Example

Let’s say we are SafeHome Insurers, a retail insurer pricing windstorm risk for residential buildings in a UK coastal region.

PHASE 1: Build the Data Foundation

Step 1: Data Collection & Geocoding

Data Type	Examples
Policy Data	Address, Sum Insured, Roof Type, Construction Year
Claims Data	Claim Date, Loss Amount, Location, Peril Type

Geocode everything to rooftop level using high-precision geocoders.
Why? For wind/flood perils, 10 meters can mean the difference between high damage and none.

Step 2: Analyze Historical Claims (Own Experience)

Inflate claims to a common base year (e.g., 2024) to account for inflation in repair costs.
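The inflation step can be sketched as below. This is a minimal illustration only: the index values and the names `REPAIR_COST_INDEX` and `inflate_to_base_year` are assumptions made for the example, not real data or an existing tool.

```python
# Hypothetical repair-cost index by year (illustrative values, not real data).
REPAIR_COST_INDEX = {2021: 100.0, 2022: 108.0, 2023: 115.0, 2024: 120.0}

def inflate_to_base_year(amount, claim_year, base_year=2024, index=REPAIR_COST_INDEX):
    """Restate a historical claim amount in base-year money."""
    return amount * index[base_year] / index[claim_year]

# (claim year, paid amount in GBP), all values hypothetical.
claims = [(2021, 10_000.0), (2022, 5_400.0), (2024, 2_000.0)]
inflated = [inflate_to_base_year(amount, year) for year, amount in claims]
print(inflated)
```

Claims from the base year itself are left unchanged, since the index ratio is 1.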
Use CatDash (or internal BI tools) to:
- Visualize claim clusters by year and geography
- Identify unusual claims (e.g., "wind" that might be "hail")
Apply ST-DBSCAN (CatClustering):
- Clusters historical claims into discrete "events" (e.g., Storm Eunice 2022)
- Assign event footprints and characteristics
Example: SafeHome identifies 3 major windstorm clusters over 10 years.

PHASE 2: Add Geo-Spatial Insight & Forward-Looking Risk

Step 3: GIS Enrichment

SafeHome adds external variables using (X,Y) coordinates:

Variable	Type	Description
Distance to coast	Vector	Higher wind risk close to sea
Wind speed return period	Raster	ClimEX/ERA5 estimate
Elevation	Raster	Low elevation = more risk in floods

Pre-select variables based on:
- Correlation (remove redundant ones)
- Consistency across regions (avoid proxies for geography)
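One possible way to apply the correlation criterion is a greedy filter that keeps a variable only if it is not too strongly correlated with any variable already kept. The sketch below assumes a threshold of 0.9 and made-up feature values; both are illustrative choices, not a recommended setting.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_redundant(features, threshold=0.9):
    """Keep each feature only if |correlation| with every kept feature is below threshold."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical GIS variables for five policies.
features = {
    "distance_to_coast_km": [1.0, 2.0, 5.0, 10.0, 20.0],
    "distance_to_coast_m": [1000.0, 2000.0, 5000.0, 10000.0, 20000.0],
    "elevation_m": [12.0, 3.0, 40.0, 8.0, 25.0],
}
selected = drop_redundant(features)
print(selected)
```

Here the second distance variable is a rescaled copy of the first, so it is dropped. The result depends on the order in which features are visited, so put the variables you trust most first.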

Step 4: CAT Model Simulation (Forward-Looking View)

Feed policy-level data into a vendor CAT model (e.g., RMS, AIR).
4 Model Components:
1. Hazard: Simulates thousands of windstorm events and their intensities.
2. Exposure: Matches these events to policy data.
3. Vulnerability: Calculates physical damage based on building type and wind speed.
4. Financial: Applies deductibles, limits, and terms to estimate financial loss.
Output: An Aggregate Exceedance Probability (AEP) curve, giving the probability that total portfolio losses in a year exceed a given amount.
Example (simplified AEP output):

Return Period (years)	Loss (£M)
2	2.5
10	7.0
50	20.0
100	35.0
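To read a loss at a return period that is not tabulated, one simple option is to interpolate between the points in the table above. The sketch below interpolates linearly in log(return period); that choice, and the function name `loss_at_return_period`, are assumptions for illustration rather than a vendor method.

```python
import math

# Simplified AEP points from the table above: (return period in years, loss in £M).
AEP_POINTS = [(2, 2.5), (10, 7.0), (50, 20.0), (100, 35.0)]

def loss_at_return_period(rp, points=AEP_POINTS):
    """Interpolate loss linearly in log(return period) between tabulated points."""
    if not points[0][0] <= rp <= points[-1][0]:
        raise ValueError("return period outside tabulated range")
    for (rp_lo, loss_lo), (rp_hi, loss_hi) in zip(points, points[1:]):
        if rp_lo <= rp <= rp_hi:
            w = (math.log(rp) - math.log(rp_lo)) / (math.log(rp_hi) - math.log(rp_lo))
            return loss_lo + w * (loss_hi - loss_lo)

# Exceedance probability for each tabulated return period: EP = 1 / RP.
exceedance_probs = [1 / rp for rp, _ in AEP_POINTS]
print(loss_at_return_period(25))
```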

PHASE 3: Pricing the Risk

Step 5: Level Setting (Blending OE & CAT Model)

Problem: Historical OE underrepresents rare tail events (like 1-in-100-year storms).
Solution: Blending in LSapp
- OE captures frequent, small events well
- CAT captures rare, extreme events
Create a blended loss distribution using both.
Blending Output: If OE gives an average annual loss of £6M and the CAT model's AEP curve implies £9M, blending might yield an EUL of £7.5M.
Calibration Factor (assuming the GLM-predicted EUL equals the £6M OE level):
$\text{Calibration Factor} = \frac{\text{Blended EUL}}{\text{GLM-predicted EUL}} = \frac{7.5}{6} = 1.25$

Step 6: Generalized Linear Modeling (GLM)

GLMs estimate how each feature (e.g., roof type, distance to coast) affects risk.
Build models for:
- Frequency (how often claims happen)
- Severity (how large claims are)
Inputs:
- Cleaned, enriched data
- Pre-selected GIS variables
- Flags for extreme events (optional exclusion)
Output: $RP_{GLM} = £50 \text{ per policy} \quad (\text{average output before scaling})$
Final EUL Formula (Per Policy):
$EUL = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$
Say:
- RP_GLM = £50
- Calibration Factor = 1.25
- Inflation = 1.03 (3% projected inflation)
- ULAE = 1.02 (2% admin load)
- NatCat Infl = 1.00 (no expected increase)
Then:
$EUL = 50 \times 1.25 \times 1.03 \times 1.02 \times 1.00 \approx £65.66$
This becomes the risk premium (before profit margin, commissions, taxes).
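The per-policy formula is a plain product of factors, so it is easy to check in code. A minimal sketch using the example inputs; the function name `eul_per_policy` is an assumption for illustration.

```python
def eul_per_policy(rp_glm, calibration, inflation, ulae, natcat_infl):
    """Per-policy EUL as the product of the GLM risk premium and its loadings."""
    return rp_glm * calibration * inflation * ulae * natcat_infl

# Calibration factor: blended EUL / GLM-predicted EUL, both in £M.
calibration_factor = 7.5 / 6.0

eul = eul_per_policy(
    rp_glm=50.0,
    calibration=calibration_factor,
    inflation=1.03,
    ulae=1.02,
    natcat_infl=1.00,
)
print(round(eul, 2))  # 65.66
```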

PHASE 4: Deployment & Monitoring

Deploy premium in rating engine
Monitor performance vs. actuals
Revisit GIS enrichments, claims data, and CAT model assumptions annually

Summary Table of NatCat Pricing Pipeline

Phase	Step	Tool/Technique	Output
1	Data Prep	Geocoding, Inflation	Cleaned policy & claims dataset
1	Historical Loss Analysis	CatDash, ST-DBSCAN	Clustered historical events
2	Geo Enrichment	GIS tools, raster/vector data	Spatial rating factors
2	CAT Modeling	RMS/AIR model	AEP Curve
3	Blending	LSapp	Blended EUL, Calibration Factor
3	GLM Modeling	Emblem / R / Python	Base RP per policy
3	Final Pricing	EUL formula	Final technical price
4	Monitoring	BI dashboards, claims review	Feedback loop

Key Takeaways

Precision geocoding underpins everything. Garbage in, garbage out.
CAT models are essential for rare event tail risks.
Historical data alone is not enough — it misses tail risk.
Blending (Level Setting) combines both worlds into a realistic premium.
GIS enrichment adds predictive power to the pricing model.
Calibration aligns technical model outputs with long-term expectations.

Final Notes: Blending Historical Claims with CAT Model Outputs in NatCat Pricing

Context: Why Blending Is Needed

In Natural Catastrophe (NatCat) pricing, insurers need to estimate the Expected Ultimate Loss (EUL) — the average annual loss from a peril like windstorm, flood, or hail.
To do this accurately, we combine two fundamentally different sources:

Source	Captures	Weakness
Historical Claims (Own Experience)	Frequent, low-severity events (attritional)	Doesn’t capture rare catastrophic events
CAT Model Output	Rare, extreme events (tail risk)	Often unreliable for small, frequent losses

The solution is to blend both data sources into a single exceedance probability (EP) curve, then calculate the EUL as the area under that curve.

Step-by-Step Process

1. Prepare Historical Claims Data (Own Experience)

a. Clean and Enrich Data
- Ensure claims and exposures are geocoded.
- Inflate claims and Sum Insured (SI) to a common base year.
b. Aggregate Annually
- Group claims by accident year.
- Sum claims and SI per year.
- Calculate claim rate:
  $\text{Claim Rate}_{\text{year}} = \frac{\text{Total Claims}_{\text{year}}}{\text{Total SI}_{\text{year}}}$
c. Sort by Severity
- Order years from worst to best based on claim rate.
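Steps b and c can be sketched as follows. The claim amounts and sums insured are hypothetical and assumed to be already inflated to the base year.

```python
from collections import defaultdict

# Hypothetical inflated claims (accident year, amount in GBP) and sum insured per year.
claims = [(2018, 1_300_000), (2018, 1_300_000), (2020, 1_600_000), (2019, 500_000)]
sum_insured = {2018: 100_000_000, 2019: 100_000_000, 2020: 100_000_000}

# b. Aggregate claims by accident year.
claims_by_year = defaultdict(float)
for year, amount in claims:
    claims_by_year[year] += amount

# Claim rate = total claims / total SI for each year with exposure.
claim_rates = {year: claims_by_year[year] / si for year, si in sum_insured.items()}

# c. Sort years from worst to best by claim rate.
ranked = sorted(claim_rates.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)
```

Iterating over `sum_insured` rather than over the claims means a year with exposure but no claims still appears, with a claim rate of zero.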

2. Construct Empirical EP Curve from Claims

You now have a finite set of historical claim rates, like:

Year	Claim Rate	Rank	EP = 1 / Rank
2018	2.6%	1	1.00
2020	1.6%	2	0.50
2015	1.2%	3	0.33
2019	0.5%	4	0.25
2016	0.4%	5	0.20
2017	0.0%	6	0.17

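This gives you a crude empirical EP curve. Note that 1 / Rank assigns EP = 1.00 to the worst year, which cannot be right; the midpoint method described later in these notes avoids this: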
$EP_{historical}(x) = \text{Probability claim rate exceeds } x$

3. Normalize CAT Model Output

From your CAT model, you receive something like:

Return Period (RP)	Gross Loss (£)	EP = 1 / RP
2	£5M	0.50
10	£15M	0.10
50	£40M	0.02
100	£65M	0.01

Normalize these losses — divide by Total Sum Insured — to convert into claim rates, so that both datasets are on the same scale.
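A minimal sketch of the normalization, using the return periods and losses from the table above. The total sum insured of £2,500M is a hypothetical figure chosen only so the example produces readable claim rates.

```python
# Hypothetical portfolio total sum insured in GBP (assumption for illustration).
TOTAL_SUM_INSURED = 2_500e6

# CAT model output from the table above: (return period in years, gross loss in GBP).
cat_output = [(2, 5e6), (10, 15e6), (50, 40e6), (100, 65e6)]

# Convert each point to (exceedance probability, claim rate).
cat_curve = [(1 / rp, loss / TOTAL_SUM_INSURED) for rp, loss in cat_output]

for ep, rate in cat_curve:
    print(f"EP={ep:.2f}  claim rate={rate:.2%}")
```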

4. Fit Smooth Curves to Each Dataset

Use statistical fitting (e.g. in Python or R):
- Fit a lognormal, Pareto, or stretched exponential to your historical claim data.
- CAT model outputs are often provided already fitted; otherwise, fit them yourself (e.g. with a piecewise Pareto).
Now you have:
- $EP_{hist}(x)$
- $EP_{cat}(x)$
Both functions define the probability of exceeding loss x.
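As a rough illustration of the historical fit, the sketch below fits a lognormal to the positive annual claim rates by matching the mean and standard deviation of their logs, and treats zero-claim years as a separate probability mass. Both choices are assumptions made for this example; a production fit would normally use a proper fitting routine and goodness-of-fit checks.

```python
import math
import statistics

# Annual claim rates from the empirical table (2.6%, 1.6%, ...).
claim_rates = [0.026, 0.016, 0.012, 0.005, 0.004, 0.0]

# Zero years cannot be logged, so fit on positive years only (assumption).
positive = [r for r in claim_rates if r > 0]
p_positive = len(positive) / len(claim_rates)

logs = [math.log(r) for r in positive]
mu = statistics.mean(logs)
sigma = statistics.stdev(logs)

def ep_hist(x):
    """P(claim rate > x) under the fitted lognormal, scaled by P(rate > 0)."""
    if x <= 0:
        return p_positive
    z = (math.log(x) - mu) / sigma
    # Standard normal survival function: 1 - Phi(z) = 0.5 * erfc(z / sqrt(2)).
    return p_positive * 0.5 * math.erfc(z / math.sqrt(2))

print(ep_hist(0.01), ep_hist(0.03))
```

With only six data points the fitted parameters are very uncertain, which is one reason the tail is taken from the CAT model instead.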

5. Blend the Two EP Curves

Construct a blending function $z(x) \in [0, 1]$ to weight the two curves: $\ln \text{EP}_{\text{blended}}(x) = (1 - z(x)) \cdot \ln \text{EP}_{\text{hist}}(x) + z(x) \cdot \ln \text{EP}_{\text{cat}}(x)$
- For small x (frequent losses): $z(x) \approx 0$ ⇒ rely on historical data
- For large x (tail risk): $z(x) \approx 1$ ⇒ rely on CAT model
Blending options (from LSapp):
- Automatic: Blending region selected where both curves overlap.
- Manual: User specifies EP or loss range where blending should occur.
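The log-weighted blend with a manually specified loss range can be sketched as below. This is not LSapp's implementation: the linear ramp for z(x), the two exponential stand-in curves, and the blending range of 0.5% to 2% are all assumptions chosen to keep the example short.

```python
import math

def z_linear(x, x_start, x_end):
    """Blending weight: 0 below x_start, 1 above x_end, linear in between."""
    if x <= x_start:
        return 0.0
    if x >= x_end:
        return 1.0
    return (x - x_start) / (x_end - x_start)

def blend_ep(x, ep_hist, ep_cat, x_start, x_end):
    """Weighted average of the two curves in log space (both EPs must be > 0)."""
    z = z_linear(x, x_start, x_end)
    log_ep = (1 - z) * math.log(ep_hist(x)) + z * math.log(ep_cat(x))
    return math.exp(log_ep)

# Hypothetical smooth curves standing in for the fitted EP functions.
ep_hist = lambda x: math.exp(-x / 0.008)
ep_cat = lambda x: math.exp(-x / 0.012)

for x in (0.001, 0.01, 0.05):
    print(x, blend_ep(x, ep_hist, ep_cat, x_start=0.005, x_end=0.02))
```

Below the range the blend reproduces the historical curve, above it the CAT curve, and inside it the result lies between the two.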

6. Calculate Blended Expected Ultimate Loss (EUL)

Once the blended curve is created, the Expected Ultimate Loss is:
$EUL = \int_{0}^{\infty} \text{EP}_{\text{blended}}(x)\,dx$
This represents the average annual loss, accounting for both frequent and rare events. It is your best estimate for the peril’s total annual cost.

7. Derive Calibration Factor

You’ve already built a GLM for wind/flood/fire, which outputs an RP_GLM (Risk Premium per policy) on an ultimate loss basis.
Now, calibrate it:
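$\text{Calibration Factor} = \frac{\text{Blended EUL}}{\text{Average RP}_{\text{GLM}}}$
This scales the GLM to match the long-term loss level. Both terms must be on the same basis, for example blended EUL per policy against the average GLM risk premium per policy.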

8. Final EUL per Policy

$EUL_{\text{policy}} = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$
- Where:
 - Inflation: Economic inflation for projected repair costs
 - ULAE: Unallocated Loss Adjustment Expenses
 - NatCat Infl: Adjustment for peril-specific trends (e.g., climate change)
This EUL is the final technical premium per policy for that peril.

Summary Table

Step	Description	Output
1	Clean and aggregate historical claims	Annual claim rates
2	Construct empirical EP curve	Discrete EP points
3	Normalize CAT model outputs	Comparable EP curve
4	Fit both curves	Smooth EP functions
5	Blend using log-weighting	Blended EP curve
6	Integrate curve	Blended EUL
7	Calibrate model	Calibration factor
8	Multiply through	Final technical price per policy

Key Takeaways

Own Experience is rich in frequency data; CAT models are essential for tail risk.
Blending is not just arithmetic — it’s a smooth transition across loss scales.
Fitting & blending require statistical care to ensure monotonicity and realism.
EUL is the foundation of a fair, forward-looking NatCat premium.
The GLM is often built on attritional claims only, although a GLM fitted on all claims with CAT events flagged is also used.

Additional Context on EP Calculation: Why Not Use EP = 1/rank?

Because that method is a simplified approximation useful when you have a small dataset and want a quick estimate of the empirical EP curve.
A more refined method for assigning EPs based on midpoints between empirical percentiles is better aligned with how insurers blend empirical and CAT model distributions for level setting.

Claim Rate per Million

This is calculated as:
$\text{Claim Rate per Million} = \left( \frac{\text{Loss}}{\text{SI}} \right) \times 10^6$

Ranking the Years

Years are sorted from highest to lowest claim rate. This gives you the order of "worst to best years."

Why Use Upper/Lower Bound EPs Instead of 1/Rank?

Because 1/Rank is discrete and step-like, whereas insurance risk models typically need a smooth cumulative loss distribution to blend with CAT model curves.
In this case, the full historical period (e.g., 9 years = 9 points) is divided into 9 equal intervals across the [0%, 100%] EP scale:
- Each step ≈ 11% (100% / 9 ≈ 11.1%)
- Each point, ranked i = 1 (worst year) to 9, is assigned:
  - Lower Bound EP = (i − 1) × 11%
  - Upper Bound EP = i × 11%
  - Midpoint EP = average of those two
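The bound and midpoint assignment can be written compactly. A minimal sketch, with rank 1 as the worst year; the function name `midpoint_eps` and the nine claim rates are assumptions for illustration.

```python
def midpoint_eps(n_years):
    """Return (lower, upper, midpoint) EP bounds for ranks 1..n_years (rank 1 = worst year)."""
    step = 1.0 / n_years
    return [((i - 1) * step, i * step, (i - 0.5) * step) for i in range(1, n_years + 1)]

# Hypothetical claim rates, already sorted from worst to best year.
sorted_rates = [0.026, 0.016, 0.012, 0.005, 0.004, 0.003, 0.002, 0.001, 0.0]

# Pair each claim rate with its midpoint EP to get the empirical EP points.
ep_points = [
    (rate, mid)
    for rate, (_, _, mid) in zip(sorted_rates, midpoint_eps(len(sorted_rates)))
]
print(ep_points[0])
```

For nine years the worst year gets a midpoint EP of about 5.6%, which corresponds to a return period of 18 years.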

Why This Is Better

Avoids assigning an extreme EP to the worst year: a zero EP would imply an infinite return period, and 1 / Rank assigns it EP = 1, both of which are incorrect.
Gives a bounded probability range per event — important when plotting the EP curve.
Matches conventions used for actuarial blending with CAT model curves.

BONUS: What is the EP in this context?

It’s the exceedance probability — the likelihood in any given year that the portfolio-wide claim rate exceeds the rate observed in that year.
$EP(x) = \text{Pr}(\text{Claim Rate} > x)$
By converting empirical losses into intervals (e.g. “this year represents the top 11%”), we can compare and blend with simulated EP curves from CAT models, which are also expressed as:
- Return Period (RP)
- or, equivalently, $EP = \frac{1}{\text{RP}}$

Summary

Method	EP Formula	Notes
1/Rank	EP = 1 / rank	Simple, fast, but crude and potentially misleading
Midpoint Bounds	EP = (Upper + Lower) / 2	Smooth, avoids zero EP, aligns with blending best practice

Final Conceptual Flow: From Raw Data to Blended EUL

Step 1: Calculate EP Curves for Both Data Sources

A. Historical Claims → Empirical EP Curve
- You calculate claim rates (loss/SI) for each year.
- Sort from worst to best.
- Assign EP intervals (e.g. midpoint between 0% and 11%, then 11–22%, etc.)
B. CAT Model → Model EP Curve
- This is usually already provided as a table of return periods and losses, with EP = 1 / RP.
- Divide each loss by Total SI to convert it to a claim rate, so both curves are on the same scale.

Step 2: Fit Smooth Functions to Each Curve

Why fit a curve?
- Raw points are discrete.
- Fitting smooth functions allows interpolation between points.
- Blending is performed smoothly using logarithmic weighting.
Typical Distributions Used

Data Source	Curve Type
Historical (frequent events)	Lognormal, Weibull, Kernel smoothing
CAT model (tail-heavy)	Piecewise Pareto, Generalized Pareto (GPD), vendor-defined

You now have two functions: $EP_{\text{hist}}(x)$ and $EP_{\text{cat}}(x)$
Each defines the probability of exceeding loss x, covering the full exceedance probability space.

Step 3: Blend the Curves

We use log-weighted blending: $\ln EP_{\text{blend}}(x) = (1 - z(x)) \cdot \ln EP_{\text{hist}}(x) + z(x) \cdot \ln EP_{\text{cat}}(x)$
- Where:
 - $z(x) \in [0, 1]$ is a blending function
 - $z(x) = 0$ for small losses → rely on historical data
 - $z(x) = 1$ for large losses → rely on CAT model
 - Smooth transition in between (e.g., logistic, linear, or custom threshold)
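Why log-blending? Exceedance probabilities span several orders of magnitude, so averaging their logarithms (a weighted geometric mean) keeps the larger of the two EPs from dominating the blend, as it would in a plain weighted average.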

Step 4: Calculate the EUL

The Expected Ultimate Loss is the area under the EP curve:
$EUL = \int_{0}^{\infty} EP_{\text{blend}}(x)\,dx$
In practice:
- Use numerical integration (e.g., trapezoidal rule)
- The units are % of SI, or £ if you multiply by SI
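The numerical integration can be sketched with a hand-written trapezoidal rule. The exponential curve below is a hypothetical stand-in for the blended EP function, chosen because its exact integral (1% of SI) is known; the total sum insured and the upper integration limit of 50% of SI are also assumptions.

```python
import math

def trapezoid(f, a, b, n=10_000):
    """Composite trapezoidal rule for the integral of f over [a, b] with n intervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Hypothetical blended EP curve in claim-rate units; exact integral is about 0.01.
ep_blend = lambda x: math.exp(-x / 0.01)

# The infinite upper limit is truncated where the EP is negligible (assumption).
eul_rate = trapezoid(ep_blend, 0.0, 0.5)

TOTAL_SUM_INSURED = 2_500e6  # hypothetical, in GBP
eul_pounds = eul_rate * TOTAL_SUM_INSURED
print(f"EUL = {eul_rate:.4%} of SI, about £{eul_pounds:,.0f}")
```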

Step 5: Use EUL to Calibrate Pricing

Final risk premium per policy is:
$\text{Final Premium} = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{Other Loadings}$

Advanced Bonus: What Blending Function Should I Use?

You can use a sigmoid/logistic function like: $z(x) = \frac{1}{1 + e^{-k(x - x_0)}}$
- $x_0$ is the transition point (e.g. 0.5% loss rate)
- $k$ controls smoothness (higher = sharper transition)
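A minimal sketch of the logistic blending function, with x expressed as a claim rate so that a 0.5% transition point is x0 = 0.005. The default k = 1000 is an arbitrary illustrative value, not a recommended setting.

```python
import math

def z_logistic(x, x0=0.005, k=1000.0):
    """Logistic blending weight: 0.5 at x0, approaching 0 below it and 1 above it."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

for x in (0.0, 0.005, 0.02):
    print(x, round(z_logistic(x), 4))

# A larger k gives a sharper transition around x0.
print(z_logistic(0.004, k=1000.0), z_logistic(0.004, k=5000.0))
```

Because k multiplies a difference in claim rates, its sensible magnitude depends on the units of x; a value that works for rates (0.005) would be far too steep for losses in pounds.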

Step-by-Step Example: Earthquake CAT Model

The goal is to model earthquake risk in Los Angeles for an insurance portfolio of residential properties.

Hazard Module:
- Purpose: Simulate earthquake events.
- Inputs: Seismic hazard data (fault lines, tectonic data, historical earthquake data) and event frequency (how often earthquakes of certain magnitudes occur).
- Each stochastic earthquake event has properties: event_id, location, magnitude, depth, frequency, and ground motion intensity (e.g. PGA) at many locations.
- Output: event_id, magnitude, epicenter, return_period and intensities (ground acceleration).
Exposure Module:
- Purpose: Identify what’s at risk.
- Inputs: Each property's location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial) and replacement value (sum insured).
- Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.
Vulnerability Module:
- Purpose: Estimate physical damage based on hazard intensity.
- Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).
- Method: Vulnerability curves, specific to construction type and hazard, map PGA to a Mean Damage Ratio (the % of replacement value expected to be lost). Interpolate between points on the curve.
- Output: property_id, event_id, damage_ratio and physical_loss.
Financial Module:
- Purpose: Apply insurance terms to convert physical loss → insured loss.
- Input: Physical loss from the vulnerability module.
- Method: Apply the deductible, then co-insurance. Repeat for all properties affected by the event and sum to get the total insured loss for the event.
- Output: event_id, property_id and insured_loss.

Repeat process for all simulated events (earthquakes)
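The vulnerability and financial steps for a single event can be sketched as below. Everything numeric here is hypothetical: the vulnerability curve, the property values, the PGA at each site, the deductible, and the reading of co-insurance as a 90% insurer share are assumptions for illustration, not vendor model parameters.

```python
# Hypothetical vulnerability curve: (PGA in g, mean damage ratio).
VULN_CURVE = [(0.0, 0.0), (0.2, 0.05), (0.4, 0.20), (0.8, 0.60)]

def mean_damage_ratio(pga, curve=VULN_CURVE):
    """Linearly interpolate the mean damage ratio for a given PGA."""
    if pga <= curve[0][0]:
        return curve[0][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if pga <= x1:
            return y0 + (y1 - y0) * (pga - x0) / (x1 - x0)
    return curve[-1][1]  # cap at the last tabulated point

def insured_loss(physical_loss, deductible, insurer_share):
    """Apply the deductible first, then the insurer's co-insurance share."""
    return max(physical_loss - deductible, 0.0) * insurer_share

# Exposure: property_id -> replacement value; hazard: property_id -> PGA for one event.
properties = {"P1": 500_000.0, "P2": 300_000.0}
event_pga = {"P1": 0.3, "P2": 0.1}

event_loss = 0.0
for property_id, value in properties.items():
    physical = mean_damage_ratio(event_pga[property_id]) * value
    event_loss += insured_loss(physical, deductible=10_000.0, insurer_share=0.9)

print(event_loss)
```

In this example the second property's physical loss falls below the deductible, so only the first property contributes to the event's insured loss.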

Loss Distributions

Average Annual Loss (AAL): The expected loss per year over a long period (e.g., $25M/year).
Probable Maximum Loss (PML): The loss level exceeded with a given return period (e.g., 1-in-250-year loss = $500M).
Loss Exceedance Curve (LEC): Plots each loss level against the annual probability of exceeding it.
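These three metrics can all be derived from an event loss table. The sketch below assumes a small hypothetical table of annual event rates and losses, Poisson event arrivals, and an occurrence (largest single event) view of exceedance; a vendor model would use far more events and may define PML differently.

```python
import math

# Hypothetical event loss table: (annual rate, insured loss in $M).
EVENT_LOSS_TABLE = [(0.10, 20.0), (0.02, 150.0), (0.004, 500.0), (0.001, 900.0)]

# AAL: rate-weighted sum of event losses.
aal = sum(rate * loss for rate, loss in EVENT_LOSS_TABLE)

def occurrence_ep(threshold, elt=EVENT_LOSS_TABLE):
    """P(at least one event with loss > threshold in a year), assuming Poisson arrivals."""
    rate_above = sum(rate for rate, loss in elt if loss > threshold)
    return 1.0 - math.exp(-rate_above)

def pml(return_period, elt=EVENT_LOSS_TABLE):
    """Smallest tabulated loss whose exceedance probability is at most 1 / return_period."""
    target = 1.0 / return_period
    for _, loss in sorted(elt, key=lambda event: event[1]):
        if occurrence_ep(loss, elt) <= target:
            return loss
    return max(loss for _, loss in elt)

print(aal, pml(250))
# Points on the loss exceedance curve: (loss threshold, exceedance probability).
print([(loss, occurrence_ep(loss)) for _, loss in EVENT_LOSS_TABLE])
```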

Step-by-step Confirmation of What Each Module Does

Hazard Module:
- Simulate earthquake events by taking inputs of seismic hazard data: fault lines, tectonic data, historical earthquake data.
- Estimate event frequency: how often earthquakes of certain magnitudes occur.
- Generate stochastic earthquake events, each with properties: event_id, location, magnitude, depth, frequency, ground motion intensity (e.g., PGA) at many locations.
- Output: event_id, magnitude, epicenter, return_period, and intensities (ground acceleration).
Exposure Module:
- Identify what’s at risk.
- Input each property's location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial), and Replacement value (sum insured).
- Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.
Vulnerability Module:
- Estimate physical damage based on hazard intensity.
- Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).
- Apply vulnerability curves based on construction type and hazard.
  - PGA → Mean Damage Ratio (the % of replacement value expected to be lost).
  - Interpolate from the curve.
- Output: property_id, event_id, damage_ratio, and physical_loss.
Financial Module:
- Apply insurance terms to convert physical loss → insured loss.
- Use physical loss from the vulnerability module.
- Calculate loss after applying deductible and co-insurance.
- Repeat this process for all properties affected by the event and sum to get the total insured loss for the event.
- Output: event_id, property_id, and insured_loss.

Should we use CAT claims in the GLM model?

There Are Two Common Approaches, and Both Are Used Depending on the Goal

Approach	GLM Trained On	CAT Claims	Calibration Method
A. Full Claims GLM (with CATs flagged)	All claims: CAT + non-CAT	Included, with event flag	Adjust with calibration factor (to match CAT model EAL)
B. Clean GLM (attritional only)	Non-CAT claims only	Excluded via DBSCAN or rule	Calibrate separately using EAL ÷ GLM prediction

The Key Difference Comes Down to Modeling Philosophy

Approach A: GLM trained on all claims (flagged)

This is more common in practice for retail pricing because:
- It reflects real-world loss experience (CATs are rare, but do happen)
- It keeps the model simple: just one GLM
- It adds a feature/flag for CAT events so:
  - You can analyze how much CATs influence pricing
  - You can blend predictions or apply calibration if needed
Calibration in this case:
You compare:
- The GLM prediction for CAT risk, possibly separated by event type
- The Expected Annual Loss (EAL) from the CAT model
Then compute a calibration factor and apply it to the CAT component of the GLM.

Approach B: GLM trained only on attritional claims

This is cleaner conceptually, but:
- You need to reliably separate CAT vs. non-CAT claims (DBSCAN etc.)
- You're assuming CATs are handled entirely by the CAT model
Used when you're building separate layers:
- Attritional pricing = GLM
- CAT pricing = EAL from the EP curve (the EUL of the earlier sections)
Then: Technical Price (TP) = GLM (attritional) + EUL (CAT)