NatCat Pricing in Retail Notes
What is NatCat Pricing in Retail?
In retail insurance (e.g., home insurance), NatCat pricing refers to how insurers calculate premiums that account for the risk of natural perils, such as floods, storms, hail, wildfires, and earthquakes.
Unlike typical claims that arise from isolated incidents (like burst pipes), NatCat events affect many policies simultaneously, creating significant "accumulation risk".
The pricing process balances historical claim data (Own Experience) and simulated catastrophe models (CAT models) to estimate Expected Ultimate Loss (EUL) and derive a fair premium.
Core Components of the NatCat Pricing Framework
Own Experience (OE) Data
Precise Geolocation and GIS Enrichment
Catastrophe Models (CAT Models)
Statistical Modeling (GLMs)
Level Setting (Blending OE with CAT)
Calibration and Final Premium Derivation
Full Walkthrough with Worked Example
Let’s say we are SafeHome Insurers, a retail insurer pricing windstorm risk for residential buildings in a UK coastal region.
PHASE 1: Build the Data Foundation
Step 1: Data Collection & Geocoding
Data Type | Examples |
|---|---|
Policy Data | Address, Sum Insured, Roof Type, Construction Year |
Claims Data | Claim Date, Loss Amount, Location, Peril Type |
Geocode everything to rooftop level using high-precision geocoders.
Why? For wind/flood perils, 10 meters can mean the difference between high damage and none.
Step 2: Analyze Historical Claims (Own Experience)
Inflate claims to a common base year (e.g., 2024) to account for inflation in repair costs.
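A minimal sketch of restating claims in 2024 money. It assumes a hypothetical repair-cost index: `REPAIR_COST_INDEX` and its values are illustrative, not real data. In practice you would use whichever building-cost index the pricing team has chosen.

```python
# Hypothetical repair-cost index by year (illustrative values, not real data).
REPAIR_COST_INDEX = {2020: 100.0, 2021: 104.0, 2022: 112.0, 2023: 118.0, 2024: 121.0}


def inflate_to_base_year(amount, loss_year, base_year=2024, index=REPAIR_COST_INDEX):
    """Restate a historical loss in base-year money: amount x index[base] / index[loss year]."""
    if loss_year not in index or base_year not in index:
        raise KeyError(f"No index value for {loss_year} or {base_year}")
    return amount * index[base_year] / index[loss_year]


claims = [(2020, 10_000.0), (2022, 5_600.0), (2024, 2_000.0)]  # (loss year, £ paid)
inflated = [inflate_to_base_year(amount, year) for year, amount in claims]
# about [12100.0, 6050.0, 2000.0]
```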
Use CatDash (or internal BI tools) to:
Visualize claim clusters by year and geography
Identify unusual claims (e.g., "wind" that might be "hail")
Apply ST-DBSCAN (CatClustering):
Clusters historical claims into discrete "events" (e.g., Storm Eunice 2022)
Assign event footprints and characteristics
Example: SafeHome identifies 3 major windstorm clusters over 10 years.
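To show the idea behind the clustering step, here is a simplified spatio-temporal DBSCAN in plain Python. It is not the CatClustering tool. It assumes two claims are neighbours when they lie within `eps_km` kilometres and `eps_days` days of each other. Production ST-DBSCAN implementations may add further conditions, and the claim coordinates and thresholds below are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def st_dbscan(claims, eps_km, eps_days, min_pts):
    """claims: list of (lat, lon, day). Returns one label per claim; -1 marks noise."""
    labels = [None] * len(claims)  # None = not yet visited

    def neighbours(i):
        lat_i, lon_i, day_i = claims[i]
        return [j for j, (lat_j, lon_j, day_j) in enumerate(claims)
                if abs(day_i - day_j) <= eps_days
                and haversine_km(lat_i, lon_i, lat_j, lon_j) <= eps_km]

    cluster = 0
    for i in range(len(claims)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later join a cluster as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the event but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # core point: keep growing the event
                queue.extend(j_neighbours)
        cluster += 1
    return labels


# Hypothetical claims: two storms plus one isolated claim much later.
claims = [
    (51.00, -1.00, 10), (51.05, -1.02, 10), (51.02, -0.98, 11),  # storm A
    (55.00, -3.00, 200), (55.03, -3.01, 201),                     # storm B
    (51.00, -1.00, 400),                                          # isolated claim
]
labels = st_dbscan(claims, eps_km=20, eps_days=3, min_pts=2)  # [0, 0, 0, 1, 1, -1]
```

Each non-negative label is one candidate "event". Its member claims give the event footprint and date range.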
PHASE 2: Add Geo-Spatial Insight & Forward-Looking Risk
Step 3: GIS Enrichment
SafeHome adds external variables using (X,Y) coordinates:
Variable | Type | Description |
|---|---|---|
Distance to coast | Vector | Higher wind risk close to sea |
Wind speed return period | Raster | ClimEX/ERA5 estimate |
Elevation | Raster | Low elevation = more risk in floods |
Pre-select variables based on:
Correlation (remove redundant ones)
Consistency across regions (avoid proxies for geography)
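A sketch of the correlation check using a simple greedy rule: keep a variable only if its absolute Pearson correlation with every variable already kept is below a threshold. The 0.9 threshold and the variable names are hypothetical, and the regional-consistency check is not shown.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def drop_redundant(features, threshold=0.9):
    """Greedily keep variables whose |correlation| with all kept variables is < threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical GIS variables for five policies.
features = {
    "dist_to_coast_km": [0.5, 1.0, 2.0, 5.0, 10.0],
    "dist_to_coast_m": [500.0, 1000.0, 2000.0, 5000.0, 10000.0],  # same information, other units
    "elevation_m": [3.0, 20.0, 5.0, 40.0, 8.0],
}
selected = drop_redundant(features)  # ['dist_to_coast_km', 'elevation_m']
```

The greedy order matters: whichever of two redundant variables appears first is the one kept.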
Step 4: CAT Model Simulation (Forward-Looking View)
Feed policy-level data into a vendor CAT model (e.g., RMS, AIR).
4 Model Components:
Hazard: Simulates thousands of windstorm events and their intensities.
Exposure: Matches these events to policy data.
Vulnerability: Calculates physical damage based on building type and wind speed.
Financial: Applies deductibles, limits, and terms to estimate financial loss.
Output: An Aggregate Exceedance Probability (AEP) curve, showing the probability that total portfolio loss in a year exceeds each loss level.
Example (simplified AEP output):
Return Period (years) | Loss (£M) |
|---|---|
2 | 2.5 |
10 | 7.0 |
50 | 20.0 |
100 | 35.0 |
PHASE 3: Pricing the Risk
Step 5: Level Setting (Blending OE & CAT Model)
Problem: Historical OE underrepresents rare tail events (like 1-in-100-year storms).
Solution: Blending in LSapp
OE captures frequent, small events well
CAT captures rare, extreme events
Create a blended loss distribution using both.
Blending Output: If OE gives an average annual loss of £6M and the CAT model's AEP curve implies £9M, blending might yield an EUL of £7.5M.
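Calibration Factor: the ratio that scales the GLM to the blended loss level, summing the GLM risk premium over all policies i in the portfolio:
$$\text{Calibration Factor} = \frac{\text{EUL}_{\text{blended}}}{\sum_i \text{RP}_{\text{GLM},i}}$$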
Step 6: Generalized Linear Modeling (GLM)
GLMs estimate how each feature (e.g., roof type, distance to coast) affects risk.
Build models for:
Frequency (how often claims happen)
Severity (how large claims are)
Inputs:
Cleaned, enriched data
Pre-selected GIS variables
Flags for extreme events (optional exclusion)
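Output: predicted claim frequency and severity per policy, combined into the GLM risk premium:
$$\text{RP}_{\text{GLM}} = \mathbb{E}[\text{Frequency}] \times \mathbb{E}[\text{Severity}]$$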
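In practice the GLMs are fitted in Emblem, R or Python libraries. To show what the fitting does, here is a stdlib-only sketch of a Poisson frequency GLM with a log link, one rating variable and an exposure offset, fitted by Newton-Raphson. The data are hypothetical, generated from known coefficients so the fit can be checked. A severity model (commonly a Gamma GLM) would be fitted with its own likelihood.

```python
import math


def fit_poisson_glm(x, claims, exposure, iters=50, tol=1e-10):
    """Fit log E[claims_i] = b0 + b1 * x_i + log(exposure_i) by Newton-Raphson."""
    b0, b1 = math.log(sum(claims) / sum(exposure)), 0.0  # start at the portfolio frequency
    for _ in range(iters):
        mu = [e * math.exp(b0 + b1 * xi) for xi, e in zip(x, exposure)]
        # Score vector X^T (y - mu) for the design matrix X = [1, x].
        g0 = sum(y - m for y, m in zip(claims, mu))
        g1 = sum(xi * (y - m) for xi, y, m in zip(x, claims, mu))
        # Information matrix X^T diag(mu) X; with the canonical log link,
        # Newton-Raphson and Fisher scoring (IRLS) coincide.
        h00 = sum(mu)
        h01 = sum(xi * m for xi, m in zip(x, mu))
        h11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H @ step = g.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


# Hypothetical data: x = distance-to-coast band; expected claims built from b0 = log(0.05), b1 = -0.5.
x = [0.0, 1.0, 2.0]
exposure = [100.0, 200.0, 100.0]  # policy-years
claims = [e * 0.05 * math.exp(-0.5 * xi) for xi, e in zip(x, exposure)]
b0, b1 = fit_poisson_glm(x, claims, exposure)
frequency = [math.exp(b0 + b1 * xi) for xi in x]  # predicted claims per policy-year
```

exp(b1) is the multiplicative frequency relativity per band. Multiplying predicted frequency by predicted severity gives each policy's RP_GLM.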
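Final EUL Formula (Per Policy):
$$\text{EUL} = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$$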
Say:
RP_GLM = £50
Calibration Factor = 1.25
Inflation = 1.03 (3% projected inflation)
ULAE = 1.02 (2% admin load)
NatCat Infl = 1.00 (no expected increase)
Then:
$$\text{EUL} = £50 \times 1.25 \times 1.03 \times 1.02 \times 1.00 \approx £65.66$$
This becomes the risk premium (before profit margin, commissions, taxes).
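The same arithmetic as a small helper. This sketch simply multiplies the GLM risk premium by the calibration factor and the three loadings; the function name is illustrative.

```python
def final_eul(rp_glm, calibration_factor, inflation, ulae, natcat_infl):
    """Per-policy EUL: GLM risk premium x calibration factor x inflation, ULAE and NatCat loadings."""
    return rp_glm * calibration_factor * inflation * ulae * natcat_infl


eul = final_eul(rp_glm=50.0, calibration_factor=1.25, inflation=1.03, ulae=1.02, natcat_infl=1.00)
print(f"£{eul:.2f}")  # £65.66
```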
PHASE 4: Deployment & Monitoring
Deploy premium in rating engine
Monitor performance vs. actuals
Revisit GIS enrichments, claims data, and CAT model assumptions annually
Summary Table of NatCat Pricing Pipeline
Phase | Step | Tool/Technique | Output |
|---|---|---|---|
1 | Data Prep | Geocoding, Inflation | Cleaned policy & claims dataset |
1 | Historical Loss Analysis | CatDash, ST-DBSCAN | Clustered historical events |
2 | Geo Enrichment | GIS tools, raster/vector data | Spatial rating factors |
2 | CAT Modeling | RMS/AIR model | AEP Curve |
3 | Blending | LSapp | Blended EUL, Calibration Factor |
3 | GLM Modeling | Emblem / R / Python | Base RP per policy |
3 | Final Pricing | EUL formula | Final technical price |
4 | Monitoring | BI dashboards, claims review | Feedback loop |
Key Takeaways
Precision geocoding underpins everything. Garbage in, garbage out.
CAT models are essential for rare event tail risks.
Historical data alone is not enough — it misses tail risk.
Blending (Level Setting) combines both worlds into a realistic premium.
GIS enrichment adds predictive power to the pricing model.
Calibration aligns technical model outputs with long-term expectations.
Final Notes: Blending Historical Claims with CAT Model Outputs in NatCat Pricing
Context: Why Blending Is Needed
In Natural Catastrophe (NatCat) pricing, insurers need to estimate the Expected Ultimate Loss (EUL) — the average annual loss from a peril like windstorm, flood, or hail.
To do this accurately, we combine two fundamentally different sources:
Source | Captures | Weakness |
|---|---|---|
Historical Claims (Own Experience) | Frequent, low-severity events (attritional) | Doesn’t capture rare catastrophic events |
CAT Model Output | Rare, extreme events (tail risk) | Often unreliable for small, frequent losses |
The solution is to blend both data sources into a single exceedance probability (EP) curve, then calculate the EUL as the area under that curve.
Step-by-Step Process
1. Prepare Historical Claims Data (Own Experience)
a. Clean and Enrich Data
Ensure claims and exposures are geocoded.
Inflate claims and Sum Insured (SI) to a common base year.
b. Aggregate Annually
Group claims by accident year.
Sum claims and SI per year.
Calculate claim rate for each accident year y:
$$\text{Claim Rate}_y = \frac{\sum \text{Claims}_y}{\sum \text{SI}_y}$$
c. Sort by Severity
Order years from worst to best based on claim rate.
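A sketch of steps a to c in plain Python. It assumes claims are already inflated and geocoded, and that SI is known per accident year. The figures are hypothetical.

```python
from collections import defaultdict


def annual_claim_rates(claims, si_by_year):
    """claims: (accident_year, inflated_loss) pairs; si_by_year: {year: inflated SI}.
    Returns {year: total claims / total SI}. Years with SI but no claims get a rate of 0;
    claims in years missing from si_by_year are ignored."""
    totals = defaultdict(float)
    for year, loss in claims:
        totals[year] += loss
    return {year: totals[year] / si for year, si in si_by_year.items()}


# Hypothetical inflated claims (£) and Sum Insured (£) per accident year.
claims = [(2018, 1.5e6), (2018, 1.1e6), (2019, 0.5e6), (2020, 1.6e6)]
si_by_year = {2017: 100e6, 2018: 100e6, 2019: 100e6, 2020: 100e6}

rates = annual_claim_rates(claims, si_by_year)
ranked = sorted(rates.items(), key=lambda kv: kv[1], reverse=True)  # worst year first
# ranked years: 2018 (2.6%), 2020 (1.6%), 2019 (0.5%), 2017 (0.0%)
```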
2. Construct Empirical EP Curve from Claims
You now have a finite set of historical claim rates, like:
Year | Claim Rate | Rank | EP = 1 / Rank |
|---|---|---|---|
2018 | 2.6% | 1 | 1.00 |
2020 | 1.6% | 2 | 0.50 |
2015 | 1.2% | 3 | 0.33 |
2019 | 0.5% | 4 | 0.25 |
2016 | 0.4% | 5 | 0.20 |
2017 | 0.0% | 6 | 0.17 |
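This gives you an empirical EP curve: the points (claim rate, EP), where EP approximates the probability that a year's claim rate exceeds that rate. Note that 1/rank gives the worst year (2018) an EP of 1.00, as if a 2.6% claim rate were exceeded every year. Ranking conventions such as EP = rank / (n + 1) keep the worst year the rarest (here 1/7 ≈ 0.14).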
3. Normalize CAT Model Output
From your CAT model, you receive something like:
Return Period (RP) | Gross Loss (£) | EP = 1 / RP |
|---|---|---|
2 | £5M | 0.50 |
10 | £15M | 0.10 |
50 | £40M | 0.02 |
100 | £65M | 0.01 |
Normalize these losses — divide by Total Sum Insured — to convert into claim rates, so that both datasets are on the same scale.
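A sketch of this normalization applied to the table above. The document does not give a Total Sum Insured, so this assumes a hypothetical £2bn.

```python
# CAT model output from the table above: return period (years) -> gross loss (£).
cat_output = {2: 5e6, 10: 15e6, 50: 40e6, 100: 65e6}
TOTAL_SI = 2.0e9  # hypothetical Total Sum Insured (£); an assumption for illustration


def normalize_cat_output(rp_to_loss, total_si):
    """Convert (return period, £ loss) pairs into (EP, claim rate) pairs, smallest loss first."""
    points = [(1.0 / rp, loss / total_si) for rp, loss in rp_to_loss.items()]
    return sorted(points, key=lambda p: p[1])


cat_points = normalize_cat_output(cat_output, TOTAL_SI)
# [(0.5, 0.0025), (0.1, 0.0075), (0.02, 0.02), (0.01, 0.0325)]
```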
4. Fit Smooth Curves to Each Dataset
Use statistical fitting (e.g. in Python or R):
Fit a lognormal, Pareto, or stretched exponential to your historical claim data.
CAT model outputs are often supplied already fitted; otherwise, fit them piecewise (e.g., piecewise Pareto).
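Now you have two fitted curves: $\text{EP}_{\text{OE}}(x)$ from own experience and $\text{EP}_{\text{CAT}}(x)$ from the CAT model.
Both functions define the probability of exceeding loss x.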
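A stdlib sketch of the historical-side fit, assuming a lognormal on the annual claim rates from the table above. The lognormal MLE is the mean and population standard deviation of the log rates. This sketch simply drops zero-claim years. A real fit would need to handle them, for example with a separate probability mass at zero.

```python
import math
import statistics


def fit_lognormal(rates):
    """Lognormal MLE on the strictly positive claim rates: mean and std of the log rates."""
    logs = [math.log(r) for r in rates if r > 0]  # zero-claim years are simply dropped here
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # the MLE uses the 1/n (population) standard deviation
    return mu, sigma


def lognormal_ep(x, mu, sigma):
    """Exceedance probability Pr(X > x) under the fitted lognormal."""
    if x <= 0:
        return 1.0
    z = (math.log(x) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Annual claim rates from the historical example table (2.6%, 1.6%, ..., 0.0%).
rates = [0.026, 0.016, 0.012, 0.005, 0.004, 0.0]
mu, sigma = fit_lognormal(rates)
```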
5. Blend the Two EP Curves
Construct a blending weight $w(x) \in [0, 1]$ to combine the two curves:
For small x (frequent losses): $w(x) \to 1$ ⇒ rely on historical data
For large x (tail risk): $w(x) \to 0$ ⇒ rely on CAT model
Blending options (from LSapp):
Automatic: Blending region selected where both curves overlap.
Manual: User specifies EP or loss range where blending should occur.
6. Calculate Blended Expected Ultimate Loss (EUL)
Once the blended curve is created, the Expected Ultimate Loss is:
$$\text{EUL} = \int_0^\infty \text{EP}_{\text{blend}}(x)\,dx$$
This represents the average annual loss, accounting for both frequent and rare events. It is your best estimate for the peril’s total annual cost.
7. Derive Calibration Factor
You’ve already built a GLM for wind/flood/fire, which outputs an RP_GLM (Risk Premium per policy) on an ultimate loss basis.
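Now, calibrate it:
$$\text{Calibration Factor} = \frac{\text{EUL}_{\text{blended}}}{\sum_i \text{RP}_{\text{GLM},i}}$$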
This scales the GLM to match the long-term loss level.
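A sketch of the calibration factor: the blended portfolio EUL divided by the sum of per-policy GLM risk premiums. The portfolio below is hypothetical (120,000 policies at £50 each). It is chosen so that the £7.5M blended EUL from the worked example gives the 1.25 factor used there.

```python
def calibration_factor(blended_eul, rp_glm_by_policy):
    """Blended portfolio EUL divided by the sum of per-policy GLM risk premiums."""
    total_glm = sum(rp_glm_by_policy)
    if total_glm <= 0:
        raise ValueError("GLM risk premiums must sum to a positive amount")
    return blended_eul / total_glm


# Hypothetical portfolio: 120,000 policies, each with an RP_GLM of £50 (total £6M).
cf = calibration_factor(blended_eul=7.5e6, rp_glm_by_policy=[50.0] * 120_000)  # 1.25
```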
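8. Final EUL per Policy
$$\text{EUL}_{\text{policy}} = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$$
Where: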
Inflation: Economic inflation for projected repair costs
ULAE: Unallocated Loss Adjustment Expenses
NatCat Infl: Adjustment for peril-specific trends (e.g., climate change)
This EUL is the final technical risk premium per policy for that peril (before profit margin, commissions and taxes).
Summary Table
Step | Description | Output |
|---|---|---|
1 | Clean and aggregate historical claims | Annual claim rates |
2 | Construct empirical EP curve | Discrete EP points |
3 | Normalize CAT model outputs | Comparable EP curve |
4 | Fit both curves | Smooth EP functions |
5 | Blend using log-weighting | Blended EP curve |
6 | Integrate curve | Blended EUL |
7 | Calibrate model | Calibration factor |
8 | Multiply through | Final technical price per policy |
Key Takeaways
Own Experience is rich in frequency data; CAT models are essential for tail risk.
Blending is not just arithmetic — it’s a smooth transition across loss scales.
Fitting & blending require statistical care to ensure monotonicity and realism.
EUL is the foundation of a fair, forward-looking NatCat premium.
In this workflow, the GLM is usually built on attritional claims only.
Additional Context on EP Calculation: Why Not Use EP = 1/rank?
Because that method is a simplified approximation useful when you have a small dataset and want a quick estimate of the empirical EP curve.
A more refined method for assigning EPs based on midpoints between empirical percentiles is better aligned with how insurers blend empirical and CAT model distributions for level setting.
Claim Rate per Million
This is calculated as:
$$\text{Claim Rate per Million} = \frac{\text{Total Claims}}{\text{Total SI}} \times 10^6$$
Ranking the Years
Years are sorted from highest to lowest claim rate. This gives you the order of "worst to best years."
Why Use Upper/Lower Bound EPs Instead of 1/Rank?
Because 1/Rank is discrete and step-like, whereas insurance risk models typically need a smooth cumulative loss distribution to blend with CAT model curves.
Here, the full historical period (e.g., 9 years = 9 points) is divided into 9 equal intervals across the [0%, 100%] EP scale:
Each step = 11% (approx., since 100% / 9 ≈ 11%)
For the year ranked i (i = 1 for the worst year), assign:
Lower Bound EP = (i − 1) × 11%
Upper Bound EP = i × 11%
Midpoint EP = average of those two, i.e. (i − 0.5) × 11%
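A sketch of the interval assignment using the exact step 1/n rather than the rounded 11%. The function name is illustrative.

```python
def midpoint_ep_bounds(n_years):
    """(lower, upper, midpoint) EP for ranks i = 1..n, where rank 1 is the worst year."""
    step = 1.0 / n_years  # about 11% when n_years = 9
    return [((i - 1) * step, i * step, (i - 0.5) * step) for i in range(1, n_years + 1)]


bounds = midpoint_ep_bounds(9)
worst_lower, worst_upper, worst_mid = bounds[0]  # 0.0, ~0.111, ~0.056
```

The worst year gets a midpoint EP of about 5.6% (a return period of about 18 years) rather than zero.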
Why This Is Better
Avoids assigning zero EP to the worst year (which would suggest infinite return period — incorrect).
Gives a bounded probability range per event — important when plotting the EP curve.
Matches conventions used for actuarial blending with CAT model curves.
BONUS: What is the EP in this context?
It’s the exceedance probability: the likelihood, in any given year, that the portfolio-wide claim rate exceeds the rate observed in that year.
$$\text{EP}(x) = \Pr(\text{Claim Rate} > x)$$
By converting empirical losses into intervals (e.g. “this year represents the top 11%”), we can compare and blend them with simulated EP curves from CAT models, which are also expressed as a Return Period (RP):
$$\text{RP} = \frac{1}{\text{EP}} \quad \text{or equivalently} \quad \text{EP} = \frac{1}{\text{RP}}$$
Summary
Method | EP Formula | Notes |
|---|---|---|
1/Rank | EP = 1 / rank | Simple, fast, but crude and potentially misleading |
Midpoint Bounds | EP = (Upper + Lower) / 2 | Smooth, avoids zero EP, aligns with blending best practice |
Final Conceptual Flow: From Raw Data to Blended EUL
Step 1: Calculate EP Curves for Both Data Sources
A. Historical Claims → Empirical EP Curve
You calculate claim rates (loss/SI) for each year.
Sort from worst to best.
Assign EP intervals (e.g. midpoint between 0% and 11%, then 11–22%, etc.)
B. CAT Model → Model EP Curve
This is usually already provided as (return period, loss) pairs, with EP = 1 / RP.
Normalize the losses by Total SI to convert them to claim rates, so both curves are on the same scale.
Step 2: Fit Smooth Functions to Each Curve
Why fit a curve?
Raw points are discrete.
Fitting smooth functions allows interpolation between points.
Smooth curves can then be blended gradually using logarithmic weighting.
Typical Distributions Used
Data Source | Curve Type | Notes |
|---|---|---|
Historical (frequent events) | Lognormal, Weibull, Kernel smoothing | |
CAT model (tail-heavy) | Piecewise Pareto, Generalized Pareto (GPD) | vendor-defined |
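You now have two functions: $\text{EP}_{\text{OE}}(x)$ from own experience and $\text{EP}_{\text{CAT}}(x)$ from the CAT model.
Each defines the probability of exceeding loss x across the full exceedance probability space.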
Step 3: Blend the Curves
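We use log-weighted blending:
$$\log \text{EP}_{\text{blend}}(x) = w(x)\,\log \text{EP}_{\text{OE}}(x) + \bigl(1 - w(x)\bigr)\,\log \text{EP}_{\text{CAT}}(x)$$
Where:
$w(x) \in [0, 1]$ is a blending function
$w(x) \to 1$ for small losses → rely on historical data
$w(x) \to 0$ for large losses → rely on CAT model
Smooth transition in between (e.g., logistic, linear, or custom threshold)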
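Why log-blending? Exceedance probabilities span several orders of magnitude (e.g. 0.5 down to 0.01 in the CAT table above). Weighting their logs therefore blends the curves in relative terms: the larger EP does not swamp the smaller one in the transition zone, and the blended EP stays strictly positive.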
Step 4: Calculate the EUL
The Expected Ultimate Loss is the area under the EP curve:
$$\text{EUL} = \int_0^\infty \text{EP}_{\text{blend}}(x)\,dx$$
In practice:
Use numerical integration (e.g., trapezoidal rule)
The units are % of SI, or £ if you multiply by SI
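A sketch of the numerical integration with the trapezoidal rule. It assumes the EP curve is expressed in claim rates, so the integral can stop at 100% of SI. The test curve EP(x) = exp(-x / 0.01) is hypothetical and chosen because its exact mean is 0.01 (1% of SI).

```python
import math


def eul_from_ep(ep, x_max=1.0, n_steps=100_000):
    """Trapezoidal approximation of EUL = integral from 0 to x_max of EP(x) dx.
    For claim rates, x_max = 1.0 (100% of SI) is a natural upper limit."""
    h = x_max / n_steps
    total = 0.5 * (ep(0.0) + ep(x_max))
    total += sum(ep(k * h) for k in range(1, n_steps))
    return total * h


def example_ep(x):
    """Hypothetical EP curve with a known mean of 0.01."""
    return math.exp(-x / 0.01)


eul_rate = eul_from_ep(example_ep)  # about 0.01, i.e. 1% of SI
eul_gbp = eul_rate * 2.0e9          # times a hypothetical £2bn Total SI
```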
Step 5: Use EUL to Calibrate Pricing
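Final risk premium per policy is:
$$\text{EUL}_{\text{policy}} = \text{RP}_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$$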
Advanced Bonus: What Blending Function Should I Use?
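You can use a sigmoid/logistic function like:
$$w(x) = \frac{1}{1 + e^{k (x - x_0)}}$$
where $w(x)$ is the weight on the historical curve and:
$x_0$ is the transition point (e.g. 0.5% loss rate)
$k$ controls smoothness (higher = sharper transition)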
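A sketch of the logistic weight and the log-weighted blend. It takes w(x) = 1 / (1 + exp(k (x - x0))) as the weight on the historical curve. The exponential EP curves and the values x0 = 0.5% and k = 2000 are hypothetical. Both EP functions must be strictly positive so that their logs exist.

```python
import math


def sigmoid_weight(x, x0=0.005, k=2000.0):
    """Weight on the historical curve: w(x) = 1 / (1 + exp(k (x - x0)))."""
    z = k * (x - x0)
    if z >= 0:  # algebraically equal form that avoids overflow in exp() for large x
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def blended_ep(x, ep_oe, ep_cat, x0=0.005, k=2000.0):
    """log EP_blend = w log EP_OE + (1 - w) log EP_CAT (a weighted geometric mean)."""
    w = sigmoid_weight(x, x0, k)
    return math.exp(w * math.log(ep_oe(x)) + (1.0 - w) * math.log(ep_cat(x)))


# Hypothetical fitted curves (claim rate -> EP); the CAT curve has the heavier tail here.
def ep_oe(x):
    return math.exp(-x / 0.004)


def ep_cat(x):
    return math.exp(-x / 0.006)


blend_at_x0 = blended_ep(0.005, ep_oe, ep_cat)  # geometric mean of the two EPs at x0
```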
Step-by-Step Example: Earthquake CAT Model
The goal is to model earthquake risk in Los Angeles for an insurance portfolio of residential properties.
Hazard Module:
Simulate earthquake events from seismic hazard data (fault lines, tectonic data, historical earthquake data) and event frequency (how often earthquakes of given magnitudes occur). The result is a set of stochastic earthquake events, each with properties: event_id, location, magnitude, depth, frequency, and ground motion intensity (e.g. PGA) at many locations.
Output: event_id, magnitude, epicenter, return_period and intensities (ground acceleration).
Exposure Module:
Purpose: Identify what’s at risk. Inputs for each property: location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial) and replacement value (sum insured).
Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.
Vulnerability Module:
Purpose: Estimate physical damage based on hazard intensity.
Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).
Vulnerability curves based on construction type and hazard.
The curve maps PGA to a Mean Damage Ratio (MDR): the % of replacement value expected to be lost.
Interpolate from curve.
Output is property_id, event_id, damage_ratio and physical_loss.
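A sketch of the "interpolate from curve" step. It assumes a hypothetical piecewise-linear vulnerability curve for one construction class; the PGA/MDR points are illustrative, not from any vendor model.

```python
from bisect import bisect_right

# Hypothetical vulnerability curve for one construction class: (PGA in g, mean damage ratio).
WOOD_FRAME_CURVE = [(0.0, 0.0), (0.1, 0.01), (0.2, 0.05), (0.4, 0.15), (0.8, 0.40), (1.2, 0.60)]


def mean_damage_ratio(pga, curve=WOOD_FRAME_CURVE):
    """Linear interpolation of the MDR at a given PGA; flat outside the curve's range."""
    xs = [p for p, _ in curve]
    if pga <= xs[0]:
        return curve[0][1]
    if pga >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, pga)  # curve[i - 1][0] <= pga < curve[i][0]
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (pga - x0) / (x1 - x0)


def physical_loss(pga, replacement_value, curve=WOOD_FRAME_CURVE):
    """Physical loss = MDR x replacement value (sum insured)."""
    return mean_damage_ratio(pga, curve) * replacement_value


loss = physical_loss(pga=0.3, replacement_value=500_000)  # about 50,000
```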
Financial Module
Purpose: Apply insurance terms to convert physical loss → insured loss. Take the physical loss from the vulnerability module, apply the deductible, then co-insurance. Repeat this for all properties affected by the event and sum to get the total insured loss for the event.
Output: event_id, property_id and insured_loss.
Repeat process for all simulated events (earthquakes)
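A sketch of the financial module. It assumes co-insurance means the insurer pays a fixed share of the loss above the deductible; policy wordings vary, so treat this as illustrative. The event and property IDs and amounts are hypothetical. Running it over every simulated event gives the event loss totals.

```python
from collections import defaultdict


def insured_loss(physical_loss, deductible, coinsurance_share):
    """Apply the deductible first, then the insurer's co-insurance share."""
    return max(physical_loss - deductible, 0.0) * coinsurance_share


def event_insured_losses(rows):
    """rows: (event_id, property_id, physical_loss, deductible, coinsurance_share).
    Returns the total insured loss per event_id."""
    totals = defaultdict(float)
    for event_id, _property_id, loss, deductible, share in rows:
        totals[event_id] += insured_loss(loss, deductible, share)
    return dict(totals)


rows = [
    ("EQ_001", "P1", 100_000.0, 10_000.0, 0.8),
    ("EQ_001", "P2", 5_000.0, 10_000.0, 0.8),   # below the deductible: no insured loss
    ("EQ_002", "P1", 50_000.0, 10_000.0, 0.8),
]
per_event = event_insured_losses(rows)  # EQ_001 about 72,000; EQ_002 about 32,000
```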
Loss Distributions
Average Annual Loss (AAL): The expected loss per year over a long period (e.g., $25M/year).
Probable Maximum Loss (PML): The loss associated with a given return period (e.g., the 1-in-250-year loss = $500M).
Loss Exceedance Curve (LEC): plots each loss level against its annual probability of being exceeded.
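A sketch of reading AAL and a return-period loss off simulated years. It assumes the model provides one simulated aggregate loss per year, so it gives aggregate (AEP-based) figures. An occurrence-based PML would instead use the largest single-event loss in each year. The 1,000 synthetic years are illustrative only.

```python
def average_annual_loss(annual_losses):
    """AAL: mean of the simulated annual aggregate losses."""
    return sum(annual_losses) / len(annual_losses)


def loss_at_return_period(annual_losses, return_period):
    """Empirical loss reached or exceeded in about 1 / return_period of simulated years."""
    ranked = sorted(annual_losses, reverse=True)  # worst simulated year first
    rank = max(1, round(len(ranked) / return_period))
    return ranked[rank - 1]


# Synthetic example: 1,000 simulated years with losses of $1M, $2M, ..., $1,000M.
years = [float(m) for m in range(1, 1001)]
aal = average_annual_loss(years)             # 500.5 ($M)
pml_250 = loss_at_return_period(years, 250)  # 997.0 ($M): 4th-worst of 1,000 years
```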
Step-by-step Confirmation of What Each Module Does
Hazard Module:
Simulate earthquake events by taking inputs of seismic hazard data: fault lines, tectonic data, historical earthquake data.
Estimate event frequency: how often earthquakes of certain magnitudes occur.
Generate stochastic earthquake events, each with properties: event_id, location, magnitude, depth, frequency, ground motion intensity (e.g., PGA) at many locations.
Output: event_id, magnitude, epicenter, return_period, and intensities (ground acceleration).
Exposure Module:
Identify what’s at risk.
Input each property's location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial), and Replacement value (sum insured).
Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.
Vulnerability Module:
Estimate physical damage based on hazard intensity.
Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).
Apply vulnerability curves based on construction type and hazard.
The curve maps PGA to a Mean Damage Ratio (MDR): the % of replacement value expected to be lost.
Interpolate from the curve.
Output: property_id, event_id, damage_ratio, and physical_loss.
Financial Module:
Apply insurance terms to convert physical loss → insured loss.
Use physical loss from the vulnerability module.
Calculate loss after applying deductible and co-insurance.
Repeat this process for all properties affected by the event and sum to get the total insured loss for the event.
Output: event_id, property_id, and insured_loss.
Should we use CAT claims in the GLM model?
🧠 There Are Two Common Approaches — And Both Are Used, Depending on the Goal
Approach | GLM Trained On | CAT Claims | Calibration Method |
|---|---|---|---|
A. Full Claims GLM (with CATs flagged) | All claims: CAT + non-CAT | Included, with event flag | Adjust with calibration factor (to match CAT model EAL) |
B. Clean GLM (attritional only) | Non-CAT claims only | Excluded via DBSCAN or rule | Calibrate separately using EAL ÷ GLM prediction |
🔍 The Key Difference Comes Down to Modeling Philosophy
✅ Approach A: GLM trained on all claims (flagged)
This is more common in practice for retail pricing because:
It reflects real-world loss experience (CATs are rare, but do happen)
Keeps the model simple — just one GLM
Adds a feature/flag for CAT events so:
You can analyze how much CATs influence pricing
You can blend predictions or apply calibration if needed
Calibration in this case:
You compare:
The GLM prediction for CAT risk, possibly separated by event type
The Expected Annual Loss (EAL) from the CAT model
Then compute a calibration factor and apply it to the CAT component of the GLM.
✅ Approach B: GLM trained only on attritional claims
This is cleaner conceptually, but:
You need to reliably separate CAT vs. non-CAT claims (DBSCAN etc.)
You're assuming CATs are handled entirely by the CAT model
Used when you're building separate layers:
Attritional pricing = GLM
CAT pricing = EAL from EP curve
Then: TP (technical price) = GLM (attritional) + EUL (CAT)