NatCat Pricing in Retail Notes

What is NatCat Pricing in Retail?

  • In retail insurance (e.g., home insurance), NatCat pricing refers to how insurers calculate premiums that account for the risk of natural perils, such as floods, storms, hail, wildfires, and earthquakes.

  • Unlike typical claims that arise from isolated incidents (like burst pipes), NatCat events affect many policies simultaneously, creating significant "accumulation risk".

  • The pricing process balances historical claim data (Own Experience) and simulated catastrophe models (CAT models) to estimate Expected Ultimate Loss (EUL) and derive a fair premium.

Core Components of the NatCat Pricing Framework

  1. Own Experience (OE) Data

  2. Precise Geolocation and GIS Enrichment

  3. Catastrophe Models (CAT Models)

  4. Statistical Modeling (GLMs)

  5. Level Setting (Blending OE with CAT)

  6. Calibration and Final Premium Derivation

Full Walkthrough with Worked Example

  • Let’s say we are SafeHome Insurers, a retail insurer pricing windstorm risk for residential buildings in a UK coastal region.

PHASE 1: Build the Data Foundation
Step 1: Data Collection & Geocoding

| Data Type | Examples |
|---|---|
| Policy Data | Address, Sum Insured, Roof Type, Construction Year |
| Claims Data | Claim Date, Loss Amount, Location, Peril Type |

  • Geocode everything to rooftop level using high-precision geocoders.

  • Why? For wind/flood perils, 10 meters can mean the difference between high damage and none.

Step 2: Analyze Historical Claims (Own Experience)
  • Inflate claims to a common base year (e.g., 2024) to account for inflation in repair costs.

  • Use CatDash (or internal BI tools) to:

    • Visualize claim clusters by year and geography

    • Identify unusual claims (e.g., "wind" that might be "hail")

  • Apply ST-DBSCAN (CatClustering):

    • Clusters historical claims into discrete "events" (e.g., Storm Eunice 2022)

    • Assign event footprints and characteristics

  • Example: SafeHome identifies 3 major windstorm clusters over 10 years.
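As a minimal sketch of the inflation step above, the snippet below restates claims in 2024 money using an index ratio. The index values, the field names, and the helper `inflate_to_base` are all hypothetical and only illustrate the arithmetic; a real exercise would use a published repair-cost or construction index.

```python
# Hypothetical repair-cost index by year (illustrative values, not real data).
COST_INDEX = {2020: 100.0, 2021: 104.0, 2022: 112.0, 2023: 118.0, 2024: 121.0}


def inflate_to_base(amount, loss_year, base_year=2024, index=COST_INDEX):
    """Restate a claim amount in base-year money using an index ratio."""
    return amount * index[base_year] / index[loss_year]


# Hypothetical claims; "loss" is the amount paid in the year of loss.
claims = [
    {"claim_id": "C1", "year": 2020, "loss": 10_000.0},
    {"claim_id": "C2", "year": 2022, "loss": 5_600.0},
    {"claim_id": "C3", "year": 2024, "loss": 2_000.0},
]

for claim in claims:
    claim["loss_2024"] = inflate_to_base(claim["loss"], claim["year"])
```

Claims already in the base year are left unchanged, since the index ratio is 1.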

PHASE 2: Add Geo-Spatial Insight & Forward-Looking Risk
Step 3: GIS Enrichment
  • SafeHome adds external variables using (X,Y) coordinates:

| Variable | Type | Description |
|---|---|---|
| Distance to coast | Vector | Higher wind risk close to sea |
| Wind speed return period | Raster | ClimEX/ERA5 estimate |
| Elevation | Raster | Low elevation = more risk in floods |

  • Pre-select variables based on:

    • Correlation (remove redundant ones)

    • Consistency across regions (avoid proxies for geography)
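One possible way to screen for redundant variables is a greedy correlation filter, sketched below. The 0.9 threshold, the variable names, and the sample values are assumptions for illustration, not part of the SafeHome process described above.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def drop_redundant(features, threshold=0.9):
    """Keep a variable only if |r| with every already-kept variable is below threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical enriched values for five policies.
features = {
    "distance_to_coast_km": [0.5, 2.0, 5.0, 10.0, 20.0],
    "coastal_exposure_idx": [9.75, 9.0, 7.5, 5.0, 0.0],  # linear in distance, so redundant
    "elevation_m": [12.0, 3.0, 40.0, 8.0, 25.0],
}
selected = drop_redundant(features)
```

The filter is order-dependent: whichever of two correlated variables appears first is the one kept, so the preferred variable should be listed first.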

Step 4: CAT Model Simulation (Forward-Looking View)
  • Feed policy-level data into a vendor CAT model (e.g., RMS, AIR).

  • 4 Model Components:

    1. Hazard: Simulates thousands of windstorm events and their intensities.

    2. Exposure: Matches these events to policy data.

    3. Vulnerability: Calculates physical damage based on building type and wind speed.

    4. Financial: Applies deductibles, limits, and terms to estimate financial loss.

  • Output: An Annual Exceedance Probability (AEP) curve, showing probabilities of portfolio losses being exceeded in a year.

  • Example (simplified AEP output):

| Return Period (years) | Loss (£M) |
|---|---|
| 2 | 2.5 |
| 10 | 7.0 |
| 50 | 20.0 |
| 100 | 35.0 |
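To read losses off such a curve between the tabulated points, one option is to interpolate linearly in the logarithm of the return period. This is a sketch under that assumption; vendor tools may interpolate differently, and the function names are illustrative.

```python
import math

# Simplified AEP points from the example: (return period in years, loss in £M).
AEP_POINTS = [(2, 2.5), (10, 7.0), (50, 20.0), (100, 35.0)]


def exceedance_probability(return_period):
    """Annual exceedance probability implied by a return period."""
    return 1.0 / return_period


def loss_at_return_period(rp, points=AEP_POINTS):
    """Interpolate loss linearly in log(return period); no extrapolation."""
    if not points[0][0] <= rp <= points[-1][0]:
        raise ValueError("return period outside tabulated range")
    for (rp0, loss0), (rp1, loss1) in zip(points, points[1:]):
        if rp0 <= rp <= rp1:
            w = (math.log(rp) - math.log(rp0)) / (math.log(rp1) - math.log(rp0))
            return loss0 + w * (loss1 - loss0)


loss_20yr = loss_at_return_period(20)  # falls between the 10- and 50-year losses
```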

PHASE 3: Pricing the Risk
Step 5: Level Setting (Blending OE & CAT Model)
  • Problem: Historical OE underrepresents rare tail events (like 1-in-100-year storms).

  • Solution: Blending in LSapp

    • OE captures frequent, small events well

    • CAT captures rare, extreme events

  • Create a blended loss distribution using both.

  • Blending Output: If OE gives £6M average annual loss, and CAT AEP gives £9M, blending might yield £7.5M EUL.

  • Calibration Factor:
    $$\text{Calibration Factor} = \frac{\text{Blended EUL}}{\text{GLM-predicted EUL}} = \frac{7.5}{6} = 1.25$$

Step 6: Generalized Linear Modeling (GLM)
  • GLMs estimate how each feature (e.g., roof type, distance to coast) affects risk.

  • Build models for:

    • Frequency (how often claims happen)

    • Severity (how large claims are)

  • Inputs:

    • Cleaned, enriched data

    • Pre-selected GIS variables

    • Flags for extreme events (optional exclusion)

  • Output: $RP_{\text{GLM}} = \text{£}50$ per policy (average output before scaling)

  • Final EUL Formula (Per Policy):
    $$EUL = RP_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$$

  • Say:

    • RP_GLM = £50

    • Calibration Factor = 1.25

    • Inflation = 1.03 (3% projected inflation)

    • ULAE = 1.02 (2% admin load)

    • NatCat Infl = 1.00 (no expected increase)

  • Then:
    $$EUL = 50 \times 1.25 \times 1.03 \times 1.02 \times 1.00 \approx \text{£}65.66$$

  • This becomes the risk premium (before profit margin, commissions, taxes).
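The per-policy formula is a straight product of factors, so it is easy to check numerically. The sketch below uses the inputs listed in the worked example; the function name `eul_per_policy` is illustrative.

```python
def eul_per_policy(rp_glm, calibration_factor, inflation, ulae, natcat_infl):
    """Per-policy EUL as the product of the GLM risk premium and its loadings."""
    return rp_glm * calibration_factor * inflation * ulae * natcat_infl


# Calibration factor from the blending example: blended EUL / GLM-predicted EUL (both in £M).
calibration_factor = 7.5 / 6.0

eul = eul_per_policy(
    rp_glm=50.0,
    calibration_factor=calibration_factor,
    inflation=1.03,
    ulae=1.02,
    natcat_infl=1.00,
)
print(round(eul, 2))  # 65.66
```

With these inputs the product is £65.6625, i.e. about £65.66 per policy.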

PHASE 4: Deployment & Monitoring
  • Deploy premium in rating engine

  • Monitor performance vs. actuals

  • Revisit GIS enrichments, claims data, and CAT model assumptions annually

Summary Table of NatCat Pricing Pipeline

| Phase | Step | Tool/Technique | Output |
|---|---|---|---|
| 1 | Data Prep | Geocoding, Inflation | Cleaned policy & claims dataset |
| 1 | Historical Loss Analysis | CatDash, ST-DBSCAN | Clustered historical events |
| 2 | Geo Enrichment | GIS tools, raster/vector data | Spatial rating factors |
| 2 | CAT Modeling | RMS/AIR model | AEP Curve |
| 3 | Blending | LSapp | Blended EUL, Calibration Factor |
| 3 | GLM Modeling | Emblem / R / Python | Base RP per policy |
| 3 | Final Pricing | EUL formula | Final technical price |
| 4 | Monitoring | BI dashboards, claims review | Feedback loop |

Key Takeaways

  • Precision geocoding underpins everything. Garbage in, garbage out.

  • CAT models are essential for rare event tail risks.

  • Historical data alone is not enough — it misses tail risk.

  • Blending (Level Setting) combines both worlds into a realistic premium.

  • GIS enrichment adds predictive power to the pricing model.

  • Calibration aligns technical model outputs with long-term expectations.

Final Notes: Blending Historical Claims with CAT Model Outputs in NatCat Pricing

Context: Why Blending Is Needed
  • In Natural Catastrophe (NatCat) pricing, insurers need to estimate the Expected Ultimate Loss (EUL) — the average annual loss from a peril like windstorm, flood, or hail.

  • To do this accurately, we combine two fundamentally different sources:

| Source | Captures | Weakness |
|---|---|---|
| Historical Claims (Own Experience) | Frequent, low-severity events (attritional) | Doesn’t capture rare catastrophic events |
| CAT Model Output | Rare, extreme events (tail risk) | Often unreliable for small, frequent losses |

  • The solution is to blend both data sources into a single exceedance probability (EP) curve, then calculate the EUL as the area under that curve.

Step-by-Step Process
1. Prepare Historical Claims Data (Own Experience)
  • a. Clean and Enrich Data

    • Ensure claims and exposures are geocoded.

    • Inflate claims and Sum Insured (SI) to a common base year.

  • b. Aggregate Annually

    • Group claims by accident year.

    • Sum claims and SI per year.

    • Calculate claim rate:
      $$\text{Claim Rate}_{\text{year}} = \frac{\text{Total Claims}}{\text{Total SI}}$$

  • c. Sort by Severity

    • Order years from worst to best based on claim rate.

2. Construct Empirical EP Curve from Claims
  • You now have a finite set of historical claim rates, like:

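| Year | Claim Rate | Rank | EP = 1 / Rank |
|---|---|---|---|
| 2018 | 2.6% | 1 | 1.00 |
| 2020 | 1.6% | 2 | 0.50 |
| 2015 | 1.2% | 3 | 0.33 |
| 2019 | 0.5% | 4 | 0.25 |
| 2016 | 0.4% | 5 | 0.20 |
| 2017 | 0.0% | 6 | 0.17 |

  • Caution: 1 / Rank is only a rough shortcut. With rank 1 as the worst year, it gives that year an EP of 1.00, even though the worst of n observed years should carry the smallest EP (about 1/n). The section "Additional Context on EP Calculation" covers a more defensible assignment.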

  • This gives you an empirical EP curve:
    $$EP_{\text{historical}}(x) = \text{Probability claim rate exceeds } x$$

3. Normalize CAT Model Output
  • From your CAT model, you receive something like:

| Return Period (RP) | Gross Loss (£) | EP = 1 / RP |
|---|---|---|
| 2 | £5M | 0.50 |
| 10 | £15M | 0.10 |
| 50 | £40M | 0.02 |
| 100 | £65M | 0.01 |

  • Normalize these losses — divide by Total Sum Insured — to convert into claim rates, so that both datasets are on the same scale.
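A short sketch of the normalization: the Total Sum Insured below (£2.5bn) is a hypothetical figure chosen only to make the arithmetic concrete, since the notes do not state one.

```python
# Hypothetical portfolio Total Sum Insured (not given in the notes).
TOTAL_SI = 2_500_000_000.0

# CAT model output from the example: (return period in years, gross loss in £).
cat_curve = [(2, 5e6), (10, 15e6), (50, 40e6), (100, 65e6)]

# Express each point as an exceedance probability and a claim rate (loss / Total SI).
normalized = [
    {"rp": rp, "ep": 1.0 / rp, "claim_rate": loss / TOTAL_SI}
    for rp, loss in cat_curve
]
```

After this step the CAT points are in the same units as the historical claim rates, so the two curves can be compared directly.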

4. Fit Smooth Curves to Each Dataset
  • Use statistical fitting (e.g. in Python or R):

    • Fit a lognormal, Pareto, or stretched exponential to your historical claim data.

    • CAT model outputs are often provided already fitted, or use piecewise Pareto fitting.

  • Now you have:

    • $EP_{\text{hist}}(x)$

    • $EP_{\text{cat}}(x)$

  • Both functions define the probability of exceeding loss x.
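As a simplified stand-in for the fits named above, the sketch below fits a single power-law (Pareto-type) tail, EP(x) = exp(a) · x^(−alpha), by least squares on log-log points. This is not a lognormal or piecewise Pareto fit and not maximum likelihood; it only illustrates turning discrete EP points into a smooth function. The input points are hypothetical claim-rate/EP pairs.

```python
import math


def fit_power_law_ep(points):
    """Least-squares fit of ln EP = a - alpha * ln x; returns (a, alpha)."""
    log_x = [math.log(x) for x, _ in points]
    log_ep = [math.log(ep) for _, ep in points]
    n = len(points)
    mean_x, mean_ep = sum(log_x) / n, sum(log_ep) / n
    slope = sum((u - mean_x) * (v - mean_ep) for u, v in zip(log_x, log_ep)) / sum(
        (u - mean_x) ** 2 for u in log_x
    )
    return mean_ep - slope * mean_x, -slope


def ep_fitted(x, a, alpha):
    """Fitted exceedance probability, capped at 1."""
    return min(1.0, math.exp(a - alpha * math.log(x)))


# Hypothetical (claim rate, EP) points, e.g. normalized CAT output.
points = [(0.002, 0.50), (0.006, 0.10), (0.016, 0.02), (0.026, 0.01)]
a, alpha = fit_power_law_ep(points)
```

A positive `alpha` means the fitted EP decreases as the claim rate grows, which is the monotonicity any EP curve needs.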

5. Blend the Two EP Curves
  • Construct a blending function $z(x) \in [0, 1]$ to weight the two curves:
    $$\ln EP_{\text{blended}}(x) = (1 - z(x)) \cdot \ln EP_{\text{hist}}(x) + z(x) \cdot \ln EP_{\text{cat}}(x)$$

    • For small x (frequent losses): $z(x) \approx 0$ ⇒ rely on historical data

    • For large x (tail risk): $z(x) \approx 1$ ⇒ rely on CAT model

  • Blending options (from LSapp):

    • Automatic: Blending region selected where both curves overlap.

    • Manual: User specifies EP or loss range where blending should occur.
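The log-weighted blend with a manually chosen loss range might look like the sketch below. The linear ramp for z(x), the range limits, and the two example curves are assumptions for illustration; this is not how LSapp is known to implement it.

```python
import math


def z_linear(x, x_lo, x_hi):
    """Blend weight: 0 below x_lo, 1 above x_hi, linear ramp in between."""
    if x <= x_lo:
        return 0.0
    if x >= x_hi:
        return 1.0
    return (x - x_lo) / (x_hi - x_lo)


def blend_ep(x, ep_hist, ep_cat, x_lo, x_hi):
    """Log-weighted blend of two EP functions; both must be strictly positive at x."""
    z = z_linear(x, x_lo, x_hi)
    return math.exp((1 - z) * math.log(ep_hist(x)) + z * math.log(ep_cat(x)))


def ep_hist(x):
    """Hypothetical fitted historical curve (claim rate -> EP)."""
    return math.exp(-x / 0.008)


def ep_cat(x):
    """Hypothetical fitted CAT curve (claim rate -> EP), capped at 1."""
    return 1.0 if x <= 0.0005 else 0.0005 / x


# Blend between claim rates of 0.5% and 2.0% of SI.
ep_mid = blend_ep(0.0125, ep_hist, ep_cat, x_lo=0.005, x_hi=0.02)
```

Below the range the result equals the historical curve, above it the CAT curve, and inside it a weighted geometric mean of the two.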

6. Calculate Blended Expected Ultimate Loss (EUL)
  • Once the blended curve is created, the Expected Ultimate Loss is:
    $$EUL = \int_{0}^{\infty} EP_{\text{blended}}(x)\, dx$$

  • This represents the average annual loss, accounting for both frequent and rare events. It is your best estimate for the peril’s total annual cost.

7. Derive Calibration Factor
  • You’ve already built a GLM for wind/flood/fire, which outputs an RP_GLM (Risk Premium per policy) on an ultimate loss basis.

  • Now, calibrate it:
    $$\text{Calibration Factor} = \frac{\text{Blended EUL}}{\text{Average } RP_{\text{GLM}}}$$

  • This scales the GLM to match the long-term loss level.

8. Final EUL per Policy
  • $EUL_{\text{policy}} = RP_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{NatCat Infl}$

    • Where:

      • Inflation: Economic inflation for projected repair costs

      • ULAE: Unallocated Loss Adjustment Expenses

      • NatCat Infl: Adjustment for peril-specific trends (e.g., climate change)

  • This EUL is the final technical premium per policy for that peril.

Summary Table

| Step | Description | Output |
|---|---|---|
| 1 | Clean and aggregate historical claims | Annual claim rates |
| 2 | Construct empirical EP curve | Discrete EP points |
| 3 | Normalize CAT model outputs | Comparable EP curve |
| 4 | Fit both curves | Smooth EP functions |
| 5 | Blend using log-weighting | Blended EP curve |
| 6 | Integrate curve | Blended EUL |
| 7 | Calibrate model | Calibration factor |
| 8 | Multiply through | Final technical price per policy |

Key Takeaways
  • Own Experience is rich in frequency data; CAT models are essential for tail risk.

  • Blending is not just arithmetic — it’s a smooth transition across loss scales.

  • Fitting & blending require statistical care to ensure monotonicity and realism.

  • EUL is the foundation of a fair, forward-looking NatCat premium.

  • GLM is built on only Attritional claims (usually).

Additional Context on EP Calculation: Why Not Use EP = 1/rank?

  • Because that method is a simplified approximation useful when you have a small dataset and want a quick estimate of the empirical EP curve.

  • A more refined method for assigning EPs based on midpoints between empirical percentiles is better aligned with how insurers blend empirical and CAT model distributions for level setting.

Claim Rate per Million
  • This is calculated as:
    $$\text{Claim Rate per Million} = \left( \frac{\text{Loss}}{\text{SI}} \right) \times 10^6$$

Ranking the Years
  • Years are sorted from highest to lowest claim rate. This gives you the order of "worst to best years."

Why Use Upper/Lower Bound EPs Instead of 1/Rank?
  • Because 1/Rank is discrete and step-like, whereas insurance risk models typically need a smooth cumulative loss distribution to blend with CAT model curves.

  • Instead, the full historical period (e.g., 9 years = 9 points) is divided into 9 equal intervals across the [0%, 100%] EP scale:

    • Each step ≈ 11.1% (100% / 9)

    • For the year ranked i (i = 1 is the worst year), assign:

      • Lower Bound EP = (i − 1) × 11.1%

      • Upper Bound EP = i × 11.1%

      • Midpoint EP = average of those two
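The bound and midpoint assignment can be written out directly. The sketch assumes rank 1 is the worst year; `midpoint_eps` is an illustrative name.

```python
def midpoint_eps(n_years):
    """Lower, upper and midpoint EP for each rank (rank 1 = worst year)."""
    step = 1.0 / n_years
    return [
        {
            "rank": i,
            "lower": (i - 1) * step,
            "upper": i * step,
            "mid": (i - 0.5) * step,
        }
        for i in range(1, n_years + 1)
    ]


bands = midpoint_eps(9)
# The worst of 9 years gets a midpoint EP of 1/18 (about 5.6%),
# i.e. a return period of roughly 18 years.
```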

Why This Is Better
  • Avoids assigning zero EP to the worst year (which would suggest infinite return period — incorrect).

  • Gives a bounded probability range per event — important when plotting the EP curve.

  • Matches conventions used for actuarial blending with CAT model curves.

BONUS: What is the EP in this context?
  • It’s the exceedance probability — the likelihood in any given year that the portfolio-wide claim rate exceeds the rate observed in that year.

  • Formally: $EP(x) = \Pr(\text{Claim Rate} > x)$

  • By converting empirical losses into intervals (e.g. “this year represents the top 11%”), we can compare and blend with simulated EP curves from CAT models, which are also expressed as:

    • Return Period (RP)

    • or, equivalently, $EP = \frac{1}{RP}$

Summary

| Method | EP Formula | Notes |
|---|---|---|
| 1/Rank | EP = 1 / rank | Simple, fast, but crude and potentially misleading |
| Midpoint Bounds | EP = (Upper + Lower) / 2 | Smooth, avoids zero EP, aligns with blending best practice |

Final Conceptual Flow: From Raw Data to Blended EUL

Step 1: Calculate EP Curves for Both Data Sources
  • A. Historical Claims → Empirical EP Curve

    • You calculate claim rates (loss/SI) for each year.

    • Sort from worst to best.

    • Assign EP intervals (e.g. midpoint between 0% and 11%, then 11–22%, etc.)

  • B. CAT Model → Model EP Curve

    • This is usually already provided as losses by return period (RP), with EP = 1 / RP.

    • Normalize each loss by Total SI to convert it to a claim rate, so both curves are on the same scale.

Step 2: Fit Smooth Functions to Each Curve
  • Why fit a curve?

    • Raw points are discrete.

    • Fitting smooth functions allows interpolation between points.

    • Blending is performed smoothly using logarithmic weighting.

  • Typical Distributions Used

| Data Source | Curve Type |
|---|---|
| Historical (frequent events) | Lognormal, Weibull, Kernel smoothing |
| CAT model (tail-heavy) | Piecewise Pareto, Generalized Pareto (GPD), vendor-defined |

  • You now have two functions: $EP_{\text{hist}}(x)$ and $EP_{\text{cat}}(x)$

  • Each defines the probability of exceeding loss $x$, across the full exceedance probability space.

Step 3: Blend the Curves
  • We use log-weighted blending:
    $$\ln EP_{\text{blend}}(x) = (1 - z(x)) \cdot \ln EP_{\text{hist}}(x) + z(x) \cdot \ln EP_{\text{cat}}(x)$$

    • Where:

      • $z(x) \in [0, 1]$ is a blending function

      • $z(x) = 0$ for small losses → rely on historical data

      • $z(x) = 1$ for large losses → rely on CAT model

      • Smooth transition in between (e.g., logistic, linear, or custom threshold)

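  • Why log-blending? EPs span several orders of magnitude between frequent and tail losses. Averaging in log space is a weighted geometric mean, so the blended EP stays positive and always lies between the two input curves.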

Step 4: Calculate the EUL
  • The Expected Ultimate Loss is the area under the EP curve:
    $$EUL = \int_0^{\infty} EP_{\text{blend}}(x)\,dx$$

  • In practice:

    • Use numerical integration (e.g., trapezoidal rule)

    • The units are % of SI, or £ if you multiply by SI
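A minimal trapezoidal integration, checked against a curve whose area is known: for EP(x) = exp(−x/m) the integral from 0 to infinity is exactly m. The exponential curve, the grid, and the £1bn Total SI are assumptions chosen for the check, not outputs of the blending above.

```python
import math


def trapezoid(xs, ys):
    """Trapezoidal rule for points (xs[i], ys[i]) with xs in increasing order."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
    )


# Test curve with a known answer: EP(x) = exp(-x / m) integrates to m.
mean_rate = 0.01
xs = [i * 0.0001 for i in range(2001)]  # claim rates from 0 to 0.2 (20% of SI)
eps = [math.exp(-x / mean_rate) for x in xs]

eul_rate = trapezoid(xs, eps)  # as a fraction of SI, close to 0.01

# Convert to money with a hypothetical Total SI of £1bn.
eul_gbp = eul_rate * 1_000_000_000
```

The grid has to extend far enough into the tail that the remaining EP is negligible; otherwise the EUL is understated.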

Step 5: Use EUL to Calibrate Pricing
  • Final risk premium per policy is:
    $$\text{Final Premium} = RP_{\text{GLM}} \times \text{Calibration Factor} \times \text{Inflation} \times \text{ULAE} \times \text{Other Loadings}$$

Advanced Bonus: What Blending Function Should I Use?
  • You can use a sigmoid/logistic function like:
    $$z(x) = \frac{1}{1 + e^{-k(x - x_0)}}$$

    • $x_0$ is the transition point (e.g. 0.5% loss rate)

    • $k$ controls smoothness (higher = sharper transition)
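A small sketch of the logistic weight, with x expressed as a claim rate. The default `k = 1500` is an arbitrary illustrative value; in practice it would be tuned so the transition spans the range where both curves are credible.

```python
import math


def z_logistic(x, x0=0.005, k=1500.0):
    """Logistic blend weight: 0.5 at x0, towards 0 below it and 1 above it."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


weights = {rate: z_logistic(rate) for rate in (0.0, 0.005, 0.01, 0.02)}
```

With these settings the weight is close to 0 at a 0% claim rate, exactly 0.5 at the 0.5% transition point, and close to 1 by 1%.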

Step-by-Step Example: Earthquake CAT Model

The goal is to model earthquake risk in Los Angeles for an insurance portfolio of residential properties.

  1. Hazard Module:

    • Simulate earthquake events from seismic hazard inputs: fault lines, tectonic data, and historical earthquake data. Event frequency describes how often earthquakes of a given magnitude occur. Each stochastic event carries properties: event_id, location, magnitude, depth, frequency, and ground motion intensity (e.g. PGA, peak ground acceleration) at many locations.

    • Output: event_id, magnitude, epicenter, return_period and intensities (ground acceleration).

  2. Exposure Module:

    • Purpose: Identify what’s at risk. Inputs: each property's location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial), and replacement value (sum insured).

    • Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.

  3. Vulnerability Module:

    • Purpose: Estimate physical damage based on hazard intensity.

      • Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).

      • Vulnerability curves, based on construction type and hazard, map PGA to a Mean Damage Ratio (the % of replacement value expected to be lost). Values between tabulated points are interpolated from the curve.

    • Output is property_id, event_id, damage_ratio and physical_loss.

  4. Financial Module

    • Purpose: Apply insurance terms to convert physical loss → insured loss. Input: physical loss from the vulnerability module. Apply the deductible first, then co-insurance. Repeat this for all properties affected by the event, and sum to get the total insured loss for the event.

      • Output: event_id, property_id and insured_loss.

Repeat process for all simulated events (earthquakes)
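The vulnerability and financial steps for a single event can be sketched as below. The vulnerability curve, the policy terms, and the PGA values are invented for illustration; real curves are vendor- and construction-specific.

```python
# Hypothetical vulnerability curve: (PGA in g, mean damage ratio).
VULN_CURVE = [(0.0, 0.0), (0.1, 0.02), (0.3, 0.15), (0.6, 0.45), (1.0, 0.80)]


def mean_damage_ratio(pga, curve=VULN_CURVE):
    """Linear interpolation on the curve, clamped at both ends."""
    if pga <= curve[0][0]:
        return curve[0][1]
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        if pga <= x1:
            return y0 + (y1 - y0) * (pga - x0) / (x1 - x0)
    return curve[-1][1]


def insured_loss(physical_loss, deductible, insurer_share):
    """Apply the deductible first, then the insurer's co-insurance share."""
    return max(physical_loss - deductible, 0.0) * insurer_share


def event_insured_loss(pga_by_property, exposures):
    """Total insured loss for one event across all affected properties."""
    total = 0.0
    for property_id, pga in pga_by_property.items():
        exposure = exposures[property_id]
        physical = mean_damage_ratio(pga) * exposure["sum_insured"]
        total += insured_loss(physical, exposure["deductible"], exposure["insurer_share"])
    return total


# Hypothetical exposure data and ground motion for one simulated event.
exposures = {
    "P1": {"sum_insured": 500_000.0, "deductible": 10_000.0, "insurer_share": 0.9},
    "P2": {"sum_insured": 300_000.0, "deductible": 5_000.0, "insurer_share": 1.0},
}
event_pga = {"P1": 0.3, "P2": 0.05}

event_loss = event_insured_loss(event_pga, exposures)
```

Here P2's physical loss falls below its deductible, so only P1 contributes to the event's insured loss.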

Loss Distributions
  • Average Annual Loss (AAL): The expected loss per year over a long period (e.g., $25M/year).

  • Probable Maximum Loss (PML): The largest loss expected at a given return period (e.g., 1-in-250-year loss = $500M).

  • Loss Exceedance Curve (LEC): Plots each level of loss against the annual probability of exceeding it.
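These three metrics can all be derived from an event loss table. The sketch below assumes independent Poisson event arrivals and works on an occurrence basis (largest single event in a year); the event rates and losses are hypothetical.

```python
import math

# Hypothetical event loss table: (annual rate of occurrence, insured loss in $M).
ELT = [(0.10, 20.0), (0.02, 150.0), (0.004, 500.0), (0.001, 1200.0)]


def average_annual_loss(elt):
    """AAL: sum of rate x loss over all simulated events."""
    return sum(rate * loss for rate, loss in elt)


def occurrence_ep(threshold, elt):
    """Probability of at least one event with loss > threshold in a year (Poisson assumption)."""
    rate = sum(r for r, loss in elt if loss > threshold)
    return 1.0 - math.exp(-rate)


def pml(return_period, elt):
    """Smallest event loss whose exceedance probability is at most 1 / return_period."""
    target = 1.0 / return_period
    for _, loss in sorted(elt, key=lambda event: event[1]):
        if occurrence_ep(loss, elt) <= target:
            return loss
    return max(loss for _, loss in elt)


aal = average_annual_loss(ELT)
pml_250 = pml(250, ELT)
```

Evaluating `occurrence_ep` over a range of thresholds traces out the loss exceedance curve.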

Step-by-step Confirmation of What Each Module Does

  1. Hazard Module:

    • Simulate earthquake events by taking inputs of seismic hazard data: fault lines, tectonic data, historical earthquake data.

    • Estimate event frequency: how often earthquakes of certain magnitudes occur.

    • Generate stochastic earthquake events, each with properties: event_id, location, magnitude, depth, frequency, ground motion intensity (e.g., PGA) at many locations.

    • Output: event_id, magnitude, epicenter, return_period, and intensities (ground acceleration).

  2. Exposure Module:

    • Identify what’s at risk.

    • Input each property's location (latitude/longitude or postcode), building characteristics (construction type, number of stories, age), occupancy (residential, commercial), and Replacement value (sum insured).

    • Output: Structured exposure data linked to geocodes, ready to be matched to hazard intensities.

  3. Vulnerability Module:

    • Estimate physical damage based on hazard intensity.

    • Inputs: Ground motion intensity (from the hazard module) and exposure details (from the exposure module).

    • Apply vulnerability curves based on construction type and hazard.

      • PGA → Mean Damage Ratio = % of replacement value expected to be lost.

      • Interpolate from the curve.

    • Output: property_id, event_id, damage_ratio, and physical_loss.

  4. Financial Module:

    • Apply insurance terms to convert physical loss → insured loss.

    • Use physical loss from the vulnerability module.

    • Calculate loss after applying deductible and co-insurance.

    • Repeat this process for all properties affected by the event and sum to get the total insured loss for the event.

    • Output: event_id, property_id, and insured_loss.

Should we use CAT claims in the GLM model?

🧠 There Are Two Common Approaches — And Both Are Used, Depending on the Goal

| Approach | GLM Trained On | CAT Claims | Calibration Method |
|---|---|---|---|
| A. Full Claims GLM (with CATs flagged) | All claims: CAT + non-CAT | Included, with event flag | Adjust with calibration factor (to match CAT model EAL) |
| B. Clean GLM (attritional only) | Non-CAT claims only | Excluded via DBSCAN or rule | Calibrate separately using EAL ÷ GLM prediction |

🔍 The Key Difference Comes Down to Modeling Philosophy

Approach A: GLM trained on all claims (flagged)

  • This is more common in practice for retail pricing because:

    • It reflects real-world loss experience (CATs are rare, but do happen)

    • It keeps the model simple: just one GLM

    • It adds a feature/flag for CAT events so that you can:

      • Analyze how much CATs influence pricing

      • Blend predictions or apply calibration if needed

  • Calibration in this case compares:

    • The GLM prediction for CAT risk, possibly separated by event type

    • The Expected Annual Loss (EAL) from the CAT model

  • Then compute a calibration factor and apply it to the CAT component of the GLM.

Approach B: GLM trained only on attritional claims

  • This is cleaner conceptually, but:

    • You need to reliably separate CAT vs. non-CAT claims (DBSCAN etc.)

    • You're assuming CATs are handled entirely by the CAT model

  • Used when you're building separate layers:

    • Attritional pricing = GLM

    • CAT pricing = EAL from EP curve

  • Then the technical premium (TP) is the sum of the two layers: TP = GLM (attritional) + EUL (CAT)