Customer Lifetime Value — BG/NBD + Gamma-Gamma

The clustering chapters answered which products are healthy and which customer segments exist. This chapter answers a sharper question: how much is each individual customer worth, and how confident are we?

We use two probabilistic models that are the standard for non-contractual retail (i.e. customers can leave at any time without notice):

  • BG/NBD: models the purchase process, i.e. how often each customer transacts and the probability that they have silently churned
  • Gamma-Gamma: models the monetary process, i.e. the expected value of each transaction

Combining the two gives expected revenue per customer over a chosen horizon: the Customer Lifetime Value.

Reference: Fader, Hardie & Lee (2005) for BG/NBD; Fader & Hardie (2013) for Gamma-Gamma. Implementation via the lifetimes Python package.

Code
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from lifetimes import BetaGeoFitter, GammaGammaFitter
from lifetimes.utils import (
    summary_data_from_transaction_data,
    calibration_and_holdout_data,
)

sns.set_theme(style="whitegrid")

From line items to (frequency, recency, T)

BG/NBD doesn’t care about baskets — it works with three numbers per customer:

  • frequency — number of repeat transactions (excluding the first; one-time buyers have frequency 0)
  • recency — days between first and last transaction (0 for one-time buyers)
  • T — days between first transaction and the observation end

Per-transaction monetary value comes from the basket gross price.
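
To make the three inputs concrete, here is the computation by hand for one hypothetical customer (the dates below are invented for illustration):

Code
import pandas as pd

# Hypothetical customer: three purchases, observation window ends 2024-12-31
dates = pd.to_datetime(["2024-01-10", "2024-03-05", "2024-09-20"])
obs_end = pd.Timestamp("2024-12-31")

frequency = len(dates) - 1                    # repeat purchases only -> 2
recency = (dates.max() - dates.min()).days    # first to last purchase -> 254
T = (obs_end - dates.min()).days              # first purchase to window end -> 356
print(frequency, recency, T)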

Code
from pathlib import Path
_data_path = Path("data/raw/transactions.csv")
if not _data_path.exists():
    _data_path = Path("data/synthetic/transactions.csv")
df = pd.read_csv(_data_path, sep=";", parse_dates=["date"])

# Aggregate line items to one row per (customer, transaction)
tx = (
    df.groupby(["customer_id", "transaction_id", "date"], as_index=False)
      ["gross_price"].sum()
)
print(f"{len(tx)} transactions across {tx['customer_id'].nunique()} customers")

OBS_END = tx["date"].max()
summary = summary_data_from_transaction_data(
    tx,
    customer_id_col="customer_id",
    datetime_col="date",
    monetary_value_col="gross_price",
    observation_period_end=OBS_END,
    freq="D",
)
summary.head()
3515 transactions across 2400 customers
frequency recency T monetary_value
customer_id
C00001 0.0 0.0 665.0 0.0
C00002 0.0 0.0 437.0 0.0
C00003 0.0 0.0 130.0 0.0
C00004 0.0 0.0 463.0 0.0
C00005 0.0 0.0 24.0 0.0
Code
n_total = len(summary)
n_returning = (summary["frequency"] >= 1).sum()
print(f"Customers in summary:      {n_total}")
print(f"  one-time buyers:         {n_total - n_returning}  ({(n_total - n_returning)/n_total:.1%})")
print(f"  returning (freq >= 1):   {n_returning}  ({n_returning/n_total:.1%})")
print()
print("Distribution of frequencies (repeat purchases per customer):")
print(summary["frequency"].value_counts().sort_index().head(10))
Customers in summary:      2400
  one-time buyers:         1708  (71.2%)
  returning (freq >= 1):   692  (28.8%)

Distribution of frequencies (repeat purchases per customer):
frequency
0.0    1708
1.0     443
2.0     151
3.0      55
4.0      24
5.0      11
6.0       6
7.0       2
Name: count, dtype: int64

The roughly 70/30 split between one-time and returning customers, with a rapidly thinning tail of frequent buyers, is exactly the shape of data BG/NBD is designed for.

Fit BG/NBD

Code
bgf = BetaGeoFitter(penalizer_coef=0.001)
bgf.fit(summary["frequency"], summary["recency"], summary["T"])
bgf.summary.round(3)
coef se(coef) lower 95% bound upper 95% bound
r 0.199 0.013 0.173 0.225
alpha 62.459 7.149 48.448 76.471
a 1.331 0.188 0.962 1.700
b 1.280 0.199 0.889 1.671

The four parameters describe the population:

  • r, α — shape and scale of the Gamma distribution of transaction rates across customers (heterogeneity in purchasing behavior)
  • a, b — Beta distribution governing the per-transaction dropout probability (heterogeneity in lifetime behavior)

The values say roughly: most customers have low transaction rates, but some are much more active than others; once a customer is inactive for a stretch comparable to their typical inter-purchase time, dropout becomes likely.
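
To put numbers on that reading (the mean of the Gamma distribution is r/α, the mean of the Beta distribution is a/(a+b)):

Code
r, alpha, a, b = (bgf.params_[k] for k in ["r", "alpha", "a", "b"])

print(f"Mean transaction rate: {r / alpha:.4f}/day ≈ {365 * r / alpha:.2f} purchases/year")
print(f"Mean dropout probability after any given transaction: {a / (a + b):.2f}")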

Holdout validation

A model that fits the past well isn’t useful unless it predicts the future. We split the timeline: fit on the first 18 months, evaluate predictions over the last 6 months.

Code
HOLDOUT_DAYS = 180
CAL_END = OBS_END - pd.Timedelta(days=HOLDOUT_DAYS)

cal_holdout = calibration_and_holdout_data(
    tx,
    customer_id_col="customer_id",
    datetime_col="date",
    monetary_value_col="gross_price",
    calibration_period_end=CAL_END,
    observation_period_end=OBS_END,
    freq="D",
)

bgf_cal = BetaGeoFitter(penalizer_coef=0.001)
bgf_cal.fit(
    cal_holdout["frequency_cal"],
    cal_holdout["recency_cal"],
    cal_holdout["T_cal"],
)

cal_holdout["predicted_holdout"] = (
    bgf_cal.conditional_expected_number_of_purchases_up_to_time(
        HOLDOUT_DAYS,
        cal_holdout["frequency_cal"],
        cal_holdout["recency_cal"],
        cal_holdout["T_cal"],
    )
)
print(f"Customers in calibration sample: {len(cal_holdout)}")
print(f"Holdout window:                  {180} days")
print(
    f"Total purchases in holdout:      {int(cal_holdout['frequency_holdout'].sum())} actual, "
    f"{cal_holdout['predicted_holdout'].sum():.0f} predicted"
)
Customers in calibration sample: 1838
Holdout window:                  180 days
Total purchases in holdout:      255 actual, 305 predicted
Code
agg = (
    cal_holdout.groupby("frequency_cal")
    .agg(
        n=("frequency_holdout", "size"),
        actual=("frequency_holdout", "mean"),
        predicted=("predicted_holdout", "mean"),
    )
    .reset_index()
)

fig, ax = plt.subplots(figsize=(8, 4.5))
agg_long = agg.melt(
    id_vars=["frequency_cal", "n"],
    value_vars=["actual", "predicted"],
    var_name="type",
    value_name="purchases",
)
sns.barplot(data=agg_long, x="frequency_cal", y="purchases", hue="type", ax=ax,
            palette={"actual": "steelblue", "predicted": "darkorange"})
ax.set_xlabel("Calibration-period frequency (purchases in first 18 months)")
ax.set_ylabel("Avg purchases in 180-day holdout")
ax.legend(title="")
plt.tight_layout()
plt.show()
Figure 1: Holdout validation. Each pair of bars is a calibration-frequency bin (number of purchases the customer made in the first 18 months). The model's predicted holdout purchases (orange) track the actual ones (blue) closely, so the BG/NBD assumptions fit this dataset well.

The bars line up well across calibration buckets, though in aggregate the model over-predicts somewhat (305 predicted vs. 255 actual purchases). With a real-world dataset you'd expect more divergence; for our synthetic data (which was generated under BG/NBD-like assumptions) the fit is naturally clean.
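
A per-customer error summary makes the same point numerically; a minimal check on the columns computed above:

Code
err = cal_holdout["frequency_holdout"] - cal_holdout["predicted_holdout"]
print(f"MAE:  {err.abs().mean():.3f} purchases per customer")
print(f"Bias: {err.mean():+.3f} (negative means the model over-predicts)")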

Who’s still alive?

The most actionable BG/NBD output is the per-customer probability of being alive — which a marketer can use to triage activation campaigns.

Code
summary["p_alive"]    = bgf.conditional_probability_alive(
    summary["frequency"], summary["recency"], summary["T"]
)
summary["exp_90d"]    = bgf.conditional_expected_number_of_purchases_up_to_time(
    90, summary["frequency"], summary["recency"], summary["T"]
)
summary["exp_365d"]   = bgf.conditional_expected_number_of_purchases_up_to_time(
    365, summary["frequency"], summary["recency"], summary["T"]
)
summary[["frequency", "recency", "T", "p_alive", "exp_90d", "exp_365d"]].describe().round(3)
frequency recency T p_alive exp_90d exp_365d
count 2400.000 2400.000 2400.000 2400.000 2400.000 2400.000
mean 0.463 44.806 367.818 0.793 0.076 0.237
std 0.913 102.811 209.565 0.338 0.086 0.249
min 0.000 0.000 1.000 0.001 0.001 0.002
25% 0.000 0.000 189.000 0.480 0.029 0.104
50% 0.000 0.000 368.000 1.000 0.046 0.157
75% 1.000 27.000 548.000 1.000 0.089 0.278
max 7.000 670.000 730.000 1.000 1.055 3.105
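
For intuition, P(alive) has a closed form (from a Fader & Hardie technical note on the BG/NBD); a sketch that recomputes it from the fitted parameters and checks it against the library's column, mirroring the library's convention that customers with no repeat purchase are alive by construction:

Code
r, alpha, a, b = (bgf.params_[k] for k in ["r", "alpha", "a", "b"])
x, t_x, T_ = summary["frequency"], summary["recency"], summary["T"]

# P(alive | x, t_x, T) = 1 / (1 + a/(b+x-1) * ((alpha+T)/(alpha+t_x))^(r+x)) for x > 0
odds = (a / (b + np.maximum(x, 1) - 1)) * ((alpha + T_) / (alpha + t_x)) ** (r + x)
p_alive_manual = np.where(x > 0, 1 / (1 + odds), 1.0)
print(f"Max |manual - library|: {np.abs(p_alive_manual - summary['p_alive']).max():.2e}")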
Code
fig, ax = plt.subplots(figsize=(8, 4))
returning = summary[(summary["frequency"] >= 1) & (summary["monetary_value"] > 0)]
one_time = summary[summary["frequency"] == 0]
ax.hist(returning["p_alive"], bins=30, alpha=0.7, color="steelblue", label=f"Returning customers (n={len(returning)})")
ax.hist(one_time["p_alive"], bins=30, alpha=0.7, color="lightgrey", label=f"One-time buyers (n={len(one_time)})")
ax.set_xlabel("P(alive) at observation end")
ax.set_ylabel("Count")
ax.legend()
plt.tight_layout()
plt.show()
Figure 2: Distribution of P(alive). One-time buyers sit at P(alive) = 1 by construction (with no repeat purchase, no dropout has been observable yet); repeat customers split between still-engaged (right) and lapsed (left tail).

The split is exactly what the model is designed to surface: one-time buyers get P(alive) = 1 by convention, since with no repeat purchases there is no evidence of dropout yet; customers with multiple recent purchases get high P(alive); customers who used to buy but then went quiet get low P(alive).

Add monetary — Gamma-Gamma

Gamma-Gamma estimates expected per-transaction value. It can only be fit on customers with at least one repeat purchase (otherwise we have no within-customer monetary spread).
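
One assumption is worth checking before the fit: Gamma-Gamma assumes per-transaction value is independent of purchase frequency (Fader & Hardie suggest verifying this). A correlation near zero supports using the model:

Code
check = summary[(summary["frequency"] >= 1) & (summary["monetary_value"] > 0)]
print(check[["frequency", "monetary_value"]].corr())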

Code
returning = summary[(summary["frequency"] >= 1) & (summary["monetary_value"] > 0)].copy()

ggf = GammaGammaFitter(penalizer_coef=0.01)
ggf.fit(returning["frequency"], returning["monetary_value"])
ggf.summary.round(3)
coef se(coef) lower 95% bound upper 95% bound
p 3.634 0.193 3.256 4.012
q 0.289 0.012 0.265 0.313
v 3.455 0.196 3.071 3.840
Code
returning["exp_avg_value"] = ggf.conditional_expected_average_profit(
    returning["frequency"], returning["monetary_value"]
)
returning[["frequency", "monetary_value", "exp_avg_value"]].describe().round(2)
frequency monetary_value exp_avg_value
count 692.00 692.00 692.00
mean 1.61 692.97 826.36
std 1.03 628.02 766.42
min 1.00 18.57 27.38
25% 1.00 215.85 259.49
50% 1.00 530.36 599.79
75% 2.00 978.37 1124.23
max 7.00 4143.20 5155.67

exp_avg_value shrinks toward the population mean for customers with few transactions (we have less evidence about their personal “true” value) and approaches the customer’s own historical mean as their frequency grows. This Bayesian shrinkage is one of the model’s strengths.
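
One way to see the shrinkage is to compare observed and model-estimated means by frequency; the relative gap should narrow as the evidence per customer grows:

Code
shrink = (
    returning.groupby("frequency")[["monetary_value", "exp_avg_value"]]
    .mean()
    .assign(rel_gap=lambda d: (d["exp_avg_value"] - d["monetary_value"]) / d["monetary_value"])
)
print(shrink.round(2))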

Putting it together — 12-month CLV

Code
HORIZON_MONTHS = 12
DISCOUNT_RATE  = 0.01  # monthly

returning["clv_12m"] = ggf.customer_lifetime_value(
    bgf,
    returning["frequency"],
    returning["recency"],
    returning["T"],
    returning["monetary_value"],
    time=HORIZON_MONTHS,
    discount_rate=DISCOUNT_RATE,
    freq="D",
)

returning[["frequency", "recency", "p_alive", "exp_avg_value", "clv_12m"]].describe().round(2)
frequency recency p_alive exp_avg_value clv_12m
count 692.00 692.00 692.00 692.00 692.00
mean 1.61 155.40 0.28 826.36 281.43
std 1.03 139.59 0.17 766.42 408.08
min 1.00 1.00 0.00 27.38 0.71
25% 1.00 51.00 0.14 259.49 41.69
50% 1.00 113.50 0.25 599.79 118.23
75% 2.00 217.25 0.40 1124.23 338.68
max 7.00 670.00 0.80 5155.67 3807.90
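
Under the hood this is a discounted cash flow: each month's expected incremental purchases times the expected transaction value, discounted back to today. A simplified single-customer sketch of that logic (not the library's exact implementation, but it should land close to the clv_12m value above):

Code
# Manual DCF for one customer, in 30-day steps
cust = returning.iloc[0]
clv_manual, prev = 0.0, 0.0
for month in range(1, HORIZON_MONTHS + 1):
    cum = bgf.conditional_expected_number_of_purchases_up_to_time(
        30 * month, cust["frequency"], cust["recency"], cust["T"]
    )
    clv_manual += (cum - prev) * cust["exp_avg_value"] / (1 + DISCOUNT_RATE) ** month
    prev = cum
print(f"Manual DCF: €{clv_manual:.2f}   library clv_12m: €{cust['clv_12m']:.2f}")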
Code
fig, ax = plt.subplots(figsize=(8, 4))
sns.histplot(returning["clv_12m"], bins=40, ax=ax, color="seagreen")
ax.axvline(returning["clv_12m"].mean(),   color="red",  ls="--", label=f"mean €{returning['clv_12m'].mean():.0f}")
ax.axvline(returning["clv_12m"].median(), color="blue", ls="--", label=f"median €{returning['clv_12m'].median():.0f}")
ax.set_xlabel("Predicted 12-month CLV (EUR)")
ax.legend()
plt.tight_layout()
plt.show()
Figure 3: 12-month CLV distribution among returning customers. Long-tailed — most customers contribute modestly, a small head drives a disproportionate share of forecast revenue.

Pareto check — top 20% vs. rest

Code
returning_sorted = returning.sort_values("clv_12m", ascending=False)
total_clv = returning_sorted["clv_12m"].sum()
top_20_clv = returning_sorted["clv_12m"].iloc[: int(0.2 * len(returning_sorted))].sum()
print(f"Top 20% of returning customers account for {top_20_clv / total_clv:.1%} of forecast 12-month revenue.")
Top 20% of returning customers account for 66.1% of forecast 12-month revenue.

Concentrating retention spend on the top 20% of forecast CLV is a much stronger lever than spreading it evenly across all customers.
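
The same concentration, viewed per decile of forecast CLV (decile 1 is the top tenth):

Code
decile = np.ceil(10 * np.arange(1, len(returning_sorted) + 1) / len(returning_sorted)).astype(int)
share = returning_sorted.groupby(decile)["clv_12m"].sum() / total_clv
print(share.round(3))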

Code
returning_sorted.head(10)[
    ["frequency", "recency", "T", "monetary_value", "p_alive", "exp_avg_value", "clv_12m"]
].round(2)
frequency recency T monetary_value p_alive exp_avg_value clv_12m
customer_id
C01334 3.0 135.0 166.0 2086.76 0.61 2233.63 3807.90
C01756 2.0 24.0 46.0 1588.41 0.51 1762.62 2753.80
C02135 4.0 302.0 345.0 1332.77 0.67 1402.24 2295.12
C00738 5.0 376.0 388.0 961.25 0.78 1001.13 2165.55
C01062 1.0 67.0 80.0 2254.45 0.46 2807.33 2011.07
C00805 5.0 242.0 243.0 650.91 0.80 678.14 1974.11
C01889 2.0 42.0 94.0 1687.60 0.41 1872.57 1945.19
C00543 4.0 628.0 637.0 1491.34 0.75 1568.97 1913.03
C00265 1.0 18.0 20.0 1435.60 0.48 1789.22 1810.31
C00465 3.0 219.0 292.0 1461.14 0.54 1564.35 1772.52

What this chapter buys you that RFM doesn’t

The earlier RFM chapter clusters customers (or articles) by recency, frequency, and monetary value into discrete tiers. It's interpretable but blunt: a customer in cluster 2 versus cluster 3 is simply labeled different, with no sense of by how much.

BG/NBD + Gamma-Gamma gives continuous, probabilistic, forward-looking predictions:

  • Continuous: every customer gets a real-valued expected purchase count and CLV, not a categorical tier
  • Probabilistic: outputs carry implied uncertainty (via the Gamma/Beta priors), so seemingly similar customers can differ in P(alive) and CLV depending on how much evidence we have about each
  • Forward-looking: it forecasts over a chosen horizon, while RFM is purely a snapshot of the past

For decisions like “who do we send this win-back email to?” or “which customers should our retention team call?”, the predicted P(alive) and CLV are directly actionable in a way RFM cluster IDs are not.
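
As a sketch of that triage, a hypothetical win-back shortlist: customers the model still values but believes are slipping away. The P(alive) band and CLV cutoff below are illustrative, not tuned:

Code
# Illustrative thresholds: "probably lapsing" and "above-median forecast value"
winback = returning[
    returning["p_alive"].between(0.10, 0.40)
    & (returning["clv_12m"] > returning["clv_12m"].median())
].sort_values("clv_12m", ascending=False)
print(f"{len(winback)} win-back candidates")
print(winback[["frequency", "recency", "p_alive", "clv_12m"]].head())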

Limitations

  • One-time buyers: BG/NBD can score them (population-baseline P(alive) and expected purchases) but Gamma-Gamma cannot — they have no within-customer monetary spread. We treated them separately above.
  • Synthetic-data caveat: this dataset was generated under BG/NBD-like assumptions, so the model fits cleanly. Real retail data has structure the model doesn’t capture (seasonality, marketing pushes, life events). The output is still useful but holdout validation matters more.
  • Furniture is unusual: customers buy infrequently. Models and intuitions tuned on FMCG categories (groceries, toiletries) often need adjustment for durable goods.