Association Rules — Market Basket Analysis

Given a pile of receipts, association rule mining discovers patterns of the form “customers who buy A and B also tend to buy C”. The textbook algorithm is Apriori — fast, deterministic, and the standard starting point.

A rule \(A \Rightarrow B\) is characterized by three numbers:

Support: fraction of baskets containing every item of \(A\) and \(B\), i.e. the itemset \(A \cup B\). As a probability over baskets this is \(P(A \cap B)\), the event that \(A\) and \(B\) occur together (not “\(A\) or \(B\)”). High support = the rule covers a meaningful share of customers.
Confidence: given \(A\) in the basket, how likely is \(B\), i.e. \(P(B \mid A)\). High confidence = the rule is reliable.
Lift: how much more often \(B\) shows up when \(A\) is present compared to its baseline rate, i.e. \(\frac{P(B \mid A)}{P(B)}\). Lift > 1 = positive association.
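As a quick check on these definitions, the sketch below computes all three measures for one rule on five hand-made baskets. The baskets and the helper name `rule_metrics` are illustrative assumptions, not part of the chapter's data or code.

```python
# Toy baskets, made up for illustration (not the chapter's data).
baskets = [
    {"bed", "mattress"},
    {"bed", "mattress", "headboard"},
    {"bed"},
    {"mattress"},
    {"sofa"},
]

def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent => consequent.

    Assumes both sides occur in at least one basket.
    """
    n = len(baskets)
    n_a = sum(1 for b in baskets if antecedent <= b)                   # baskets with A
    n_b = sum(1 for b in baskets if consequent <= b)                   # baskets with B
    n_ab = sum(1 for b in baskets if (antecedent | consequent) <= b)   # baskets with both
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift

support, confidence, lift = rule_metrics({"bed"}, {"mattress"}, baskets)
print(f"support={support:.3f} confidence={confidence:.3f} lift={lift:.3f}")
```

Here two of five baskets contain both items (support 0.4), two of the three bed baskets also contain a mattress (confidence about 0.667), and the mattress baseline is 0.6, so the lift is about 1.11.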

We mine rules at minimum support 0.001 and minimum confidence 0.5, then sort by confidence. Two implementations run side by side: arules in R and mlxtend in Python.

Data

For market basket analysis we only need two columns: the basket identifier (transaction_id) and the item identifier. We use article_name, not article_id — so a “sofa” rule isn’t fragmented across three SKUs of the same product.

Code

library(dplyr)
library(readr)

.data_path <- if (file.exists("data/raw/transactions.csv")) "data/raw/transactions.csv" else "data/synthetic/transactions.csv"
raw <- read_delim(.data_path, delim = ";", show_col_types = FALSE)
cat("rows:", nrow(raw), "  baskets:", n_distinct(raw$transaction_id), "  items:", n_distinct(raw$article_name), "\n")

rows: 6392   baskets: 3515   items: 40

R — `arules`

Build the transaction object

Code

library(arules)
# C locale needed for stable sort of items across baskets; without it, real
# data with German characters (Kopfstütze, Eßtisch, …) hits a sparse-matrix
# invariant violation when arules constructs its internal indexing.
Sys.setlocale("LC_COLLATE", "C")

[1] "C"

Code

# Defensive: drop missing / empty article names before grouping. Real data
# typically has a long tail of unparseable rows (returns, miscoded items, ...);
# arules can't construct its sparse matrix if any items are NA.
clean <- raw[!is.na(raw$article_name) & nchar(trimws(raw$article_name)) > 0, ]
clean$article_name <- trimws(clean$article_name)

baskets <- split(clean$article_name, clean$transaction_id)
baskets <- lapply(baskets, function(b) sort(unique(b)))  # sort + dedupe
trans   <- as(baskets, "transactions")
summary(trans)

transactions as itemMatrix in sparse format with
 3515 rows (elements/itemsets/transactions) and
 40 columns (items) and a density of 0.0454623 

most frequent items:
dining_chair     mattress         sofa coffee_table          bed      (Other) 
         580          398          337          322          285         4470 

element (itemset/transaction) length distribution:
sizes
   1    2    3    4    5    6 
1861  844  509  206   78   17 

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.000   1.000   1.000   1.818   2.000   6.000 

includes extended item information - examples:
     labels
1  armchair
2 bar_stool
3       bed

includes extended transaction information - examples:
  transactionID
1        T00001
2        T00002
3        T00003

Item frequency

The 15 most-purchased products:

Code

itemFrequencyPlot(trans, topN = 15, type = "absolute",
                  col = "steelblue", main = "")

Figure 1: Top 15 articles by basket appearance. On real data the top items are often *generic* names (‘esstisch’, ‘stuhl’) rather than model-specific labels — see the data-quality note below.

The leading items reflect what the catalog stocks at scale — typically chairs (bought in sets of 2/4/6 around tables), table-extension components, and core upholstered/storage items.

Data-quality note for the real-data render: many of the dominant items in the chart are generic names (esstisch, stuhl, eckgarnitur; i.e. dining table, chair, corner seating set) rather than model-specific labels (esstisch ca 180 x 90 cm, esstisch cartago). That is how the source system was filled in: salespeople often entered the basic Artikelbezeichnung (article description) and put model and size information into the separate Modell / Artikelnummer fields. The apparent concentration on a handful of generic items therefore reflects how the catalogue was recorded, not how customers actually shopped: many distinct Cartago / Olivia / etc. variants sit underneath each generic label.

Co-purchase rules at this level are still meaningful (chairs do go with tables, regardless of model), but model-level cross-sell (“a Cartago dining table pulls in a Cartago chair”) is not what this view surfaces. For that you would need to either pre-concatenate article_name + model (the family_model granularity from docs/GRANULARITY_ANALYSIS.md, ~5 300 items) or work at SKU level (article_id, ~3 500 items, used by chapter 08).

Mining rules

Code

rules <- apriori(trans,
                 parameter = list(support = 0.001, confidence = 0.5),
                 control   = list(verbose = FALSE))
cat("Total rules found:", length(rules), "\n")

Total rules found: 214

Apriori returns hundreds of rules at this threshold. Sorting by confidence puts the most reliable ones first — but the very top is dominated by complex rules of the form {A, B, C, D} ⇒ E. These are technically high-confidence (often 100%) but cover so few baskets that they’re noisy.

A second source of clutter is subsumption: many rules are redundant restatements of stronger rules. arules::is.redundant() is the canonical filter. It removes a rule whenever a more general rule (a strict subset of its antecedent leading to the same consequent) has equal-or-better confidence. This kind of pruning has been standard Apriori postprocessing since the early 2000s.

Code

non_redundant <- rules[!is.redundant(rules)]
cat("After arules::is.redundant() filter:",
    length(non_redundant), "of", length(rules), "rules remain\n")

After arules::is.redundant() filter: 156 of 214 rules remain
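The redundancy criterion described above can be sketched in a few lines of Python. This is a simplified illustration, not a port of arules::is.redundant(): it compares only against rules present in the list and ignores the function's further options. The three rules and their confidence values are made up.

```python
def is_redundant(rule, rules):
    """True if a more general rule predicts the same consequent at least as confidently."""
    return any(
        other["consequent"] == rule["consequent"]
        and other["antecedent"] < rule["antecedent"]   # strict subset of the antecedent
        and other["confidence"] >= rule["confidence"]
        for other in rules
    )

# Hypothetical rules for illustration.
rules = [
    {"antecedent": frozenset({"bed"}), "consequent": "mattress", "confidence": 0.70},
    {"antecedent": frozenset({"bed", "headboard"}), "consequent": "mattress", "confidence": 0.68},
    {"antecedent": frozenset({"bed", "sofa"}), "consequent": "mattress", "confidence": 0.90},
]

kept = [r for r in rules if not is_redundant(r, rules)]
for r in kept:
    print(sorted(r["antecedent"]), "=>", r["consequent"], r["confidence"])
```

The {bed, headboard} rule is dropped because {bed} alone already predicts the mattress with higher confidence; the {bed, sofa} rule survives because adding sofa raises confidence above the general rule.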

The headline patterns are the simple rules: one item on the left, one on the right.

Code

simple <- subset(non_redundant, size(non_redundant) == 2)
simple_sorted <- sort(simple, by = "confidence", decreasing = TRUE)
inspect(head(simple_sorted, 10))

    lhs                rhs             support    confidence coverage  
[1] {dining_table}  => {dining_chair}  0.06230441 0.8081181  0.07709815
[2] {kitchen_table} => {kitchen_chair} 0.02873400 0.7372263  0.03897582
[3] {bed}           => {mattress}      0.05604552 0.6912281  0.08108108
[4] {garden_table}  => {garden_chair}  0.01365576 0.6666667  0.02048364
[5] {headboard}     => {bed}           0.02702703 0.6089744  0.04438122
[6] {sideboard}     => {dining_table}  0.02105263 0.6016260  0.03499289
[7] {desk}          => {office_chair}  0.02759602 0.5914634  0.04665718
[8] {sideboard}     => {dining_chair}  0.01849218 0.5284553  0.03499289
    lift      count
[1]  4.897474 219  
[2]  9.492126 101  
[3]  6.104690 197  
[4] 14.202020  48  
[5]  7.510684  95  
[6]  7.803378  74  
[7]  8.315976  97  
[8]  3.202621  65

These are all the simple rules, sorted by confidence, including definitional ones that aren’t really insights. The Python section below adds further interest measures (conviction and leverage, alongside lift), substring-relation annotation, and three separate views on the same rule set (bundle composition, cross-sell, and top insights), which is what stakeholders actually consume.
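The measures in the table can be re-derived from raw counts. The sketch below does so for rule [1], {dining_table} => {dining_chair}, using only numbers printed above. The antecedent count is not printed anywhere in the chapter, so it is recovered here from the coverage column, which is an assumption of this sketch.

```python
n_baskets = 3515         # from summary(trans)
count_both = 219         # "count" column of rule [1]
count_consequent = 580   # dining_chair in the item-frequency summary
coverage = 0.07709815    # "coverage" column of rule [1], i.e. P(antecedent)

# Derived value: number of baskets containing dining_table.
count_antecedent = round(coverage * n_baskets)

support = count_both / n_baskets
confidence = count_both / count_antecedent
lift = confidence / (count_consequent / n_baskets)

print(count_antecedent, round(support, 8), round(confidence, 7), round(lift, 6))
```

The results agree with the support, confidence and lift columns of the arules output for that rule.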

What about the complex rules?

Multi-item rules can hit 100% confidence because they’re highly specific: {bed, mattress, table_extension} ⇒ dining_table triggers only on the rare baskets where someone happens to buy bedroom and dining furniture together. They may reflect real behaviour, but with so few supporting baskets they are more useful as patterns to investigate than as rules to act on.
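To make "so few baskets" concrete: with 3515 baskets, a minimum support of 0.001 corresponds to about 3.5 baskets, so a rule can qualify on as few as four. The sketch below computes that floor and applies a stricter evidence threshold. The helper `enough_evidence`, the threshold of 30 baskets and the second rule are hypothetical choices for illustration.

```python
import math

n_baskets = 3515
min_support = 0.001

# Smallest basket count that can satisfy the support threshold.
min_count = math.ceil(min_support * n_baskets)
print(min_count)

def enough_evidence(rule, n_baskets, floor):
    """Keep a rule only if at least `floor` baskets contain all of its items."""
    return round(rule["support"] * n_baskets) >= floor

rules = [
    {"rule": "{dining_table} => {dining_chair}", "support": 0.06230441, "confidence": 0.808},
    {"rule": "hypothetical 4-item rule", "support": 4 / 3515, "confidence": 1.0},
]

# floor=30 is an arbitrary illustrative threshold, not a value from the chapter.
solid = [r["rule"] for r in rules if enough_evidence(r, n_baskets, floor=30)]
print(solid)
```

A count-based floor like this is one way to separate rules worth acting on from rules worth investigating, without rerunning the mining step.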

Visualizing rules

Code

library(arulesViz)
plot(rules, method = "scatterplot",
     measure = c("support", "confidence"), shading = "lift",
     engine = "ggplot2")

Figure 2: Rule landscape: each point is one rule. Top-right corner = high support **and** high confidence (the most actionable rules).

Python — `mlxtend`

The same analysis with the standard Python stack:

Code

import pandas as pd
# Pandas display: render full DataFrame width in chapter outputs.
pd.options.display.max_columns = None
pd.options.display.width = 200

from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

from pathlib import Path
_data_path = "data/raw/transactions.csv" if Path("data/raw/transactions.csv").exists() else "data/synthetic/transactions.csv"
df = pd.read_csv(_data_path, sep=";")
# Drop rows whose article_name didn't survive the family-mapping (NaN or empty
# string). Mixing NaN floats with string items would crash apriori's internal
# sort with TypeError on real data.
df = df[df["article_name"].notna() & (df["article_name"].astype(str).str.strip() != "")].copy()
df["article_name"] = df["article_name"].astype(str).str.strip()
baskets = df.groupby("transaction_id")["article_name"].apply(lambda s: list(set(s))).tolist()

te = TransactionEncoder()
basket_matrix = pd.DataFrame(te.fit_transform(baskets), columns=te.columns_)

freq_items = apriori(basket_matrix, min_support=0.001, use_colnames=True)
rules = association_rules(
    freq_items, num_itemsets=len(basket_matrix),
    metric="confidence", min_threshold=0.5,
)
print(f"Total rules found: {len(rules)}")

Total rules found: 234

Annotating each rule for downstream views

Different stakeholders want different cuts of the same Apriori output. Cross-sell teams want pairs across product systems. Bundle-/set-merchandising wants pairs within the same product system. A management report wants the statistically strongest patterns regardless of category. Instead of one “actionable” filter we annotate every rule with the signals each view needs, then pick.

Signals computed:

Reverse confidence + symmetry score: min(conf(A→B), conf(B→A)) / max(conf(A→B), conf(B→A)). Symmetric pairs (≥ 0.7) are “you can’t really have one without the other” (bett ↔︎ matratze, i.e. bed and mattress). Asymmetric pairs can be real cross-sell, but on real data they often turn out to be rare-component → main-product (e.g. topper → boxspringbett), so asymmetry alone isn’t enough.
Within-bundle flag: the antecedent’s bundle_group equals the consequent’s. Within-bundle = same product system.
Substring relation: the antecedent name is a substring of the consequent name (or vice versa). Catches definitional component-vs-whole pairs like auszug and auszugselement that bundle-tagging alone misses on real-data variants.
Component-token flag: the item name contains a generic component/accessory token (aufpreis, ablage, kissen, topper, aufsatz, element, …). It is computed for the antecedent and the consequent separately and then combined into an either-side flag. These items exist only as paired components of a main product, so surfacing them as “cross-sell” would mislead. The token list is curated from real data.
Conviction + leverage: additional interest metrics already provided by mlxtend.frequent_patterns.association_rules. Lift ≥ 2 is the baseline for “non-trivial”.
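For readers who have not met these measures, the sketch below writes out the usual textbook formulas for leverage, conviction and the symmetry score, and evaluates them on numbers from the arules output above. The function names are illustrative; the formulas are the standard definitions rather than a copy of mlxtend's source.

```python
def leverage(support, p_a, p_b):
    """Observed joint frequency minus what independence would predict."""
    return support - p_a * p_b

def conviction(confidence, p_b):
    """(1 - P(B)) / (1 - confidence); infinite when confidence is 1."""
    if confidence >= 1.0:
        return float("inf")
    return (1.0 - p_b) / (1.0 - confidence)

def symmetry(conf_ab, conf_ba):
    """min / max of the two directional confidences, 0 if both are 0."""
    hi = max(conf_ab, conf_ba)
    return min(conf_ab, conf_ba) / hi if hi > 0 else 0.0

# garden_table => garden_chair: support, confidence, lift, coverage from arules.
support, confidence, lift, p_a = 0.01365576, 0.6666667, 14.202020, 0.02048364
p_b = confidence / lift   # because lift = confidence / P(B)
print(round(conviction(confidence, p_b), 2))
print(round(leverage(support, p_a, p_b), 4))

# bed => mattress: forward confidence from arules, reverse from counts
# (197 joint baskets, 398 mattress baskets).
print(round(symmetry(0.6912281, 197 / 398), 2))
```

The bed and mattress pair lands just above the 0.7 symmetry cut-off, which is why it is the only simple rule classed as symmetric on this dataset.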

Code

is_simple = (rules["antecedents"].apply(len) == 1) & (rules["consequents"].apply(len) == 1)
simple = (
    rules.loc[is_simple]
    .assign(
        antecedent=lambda d: d["antecedents"].apply(lambda s: next(iter(s))),
        consequent=lambda d: d["consequents"].apply(lambda s: next(iter(s))),
    )
    [["antecedent", "consequent", "support", "confidence", "lift",
      "leverage", "conviction"]]
    .reset_index(drop=True)
)

bundle_lookup = (
    df.drop_duplicates("article_name").set_index("article_name")["bundle_group"]
      .fillna("").to_dict()
)
basket_sets = df.groupby("transaction_id")["article_name"].apply(set).tolist()

def cond_prob(a, b):
    """P(a | b) — fraction of baskets containing b that also contain a."""
    n_b = sum(1 for s in basket_sets if b in s)
    if n_b == 0:
        return 0.0
    n_both = sum(1 for s in basket_sets if a in s and b in s)
    return n_both / n_b

simple["reverse_conf"] = simple.apply(
    lambda r: cond_prob(r["antecedent"], r["consequent"]), axis=1
)
simple["symmetry"] = simple.apply(
    lambda r: min(r["confidence"], r["reverse_conf"]) / max(r["confidence"], r["reverse_conf"])
        if max(r["confidence"], r["reverse_conf"]) > 0 else 0.0,
    axis=1,
)
simple["within_bundle"] = simple.apply(
    lambda r: bundle_lookup.get(r["antecedent"], "") == bundle_lookup.get(r["consequent"], "")
              and bundle_lookup.get(r["antecedent"], "") != "",
    axis=1,
)
simple["substring_pair"] = simple.apply(
    lambda r: r["antecedent"] in r["consequent"] or r["consequent"] in r["antecedent"],
    axis=1,
)

# Component / accessory tokens, curated from real-data inspection. An item
# whose name contains one of these tokens is a paired component of some main
# product, not a stand-alone item. Rules with such a component on either side
# (e.g. "topper -> boxspringbett") are definitional, not behavioural, and get
# filtered from the cross-sell view.
COMPONENT_TOKENS = {
    "aufpreis", "ablage", "aufsatz", "schublade", "schubkasten", "schubladenmodul",
    "kissen", "steckkissen", "ruckenkissen", "armlehnkissen", "armlehnenkissen",
    "nierenkissen", "klemmkissen",
    "topper", "matratze",  # mattress is a component of bed_system
    "element", "elementaufnahme", "polsterelement",
    "auszug", "auszugselement", "ansteckplatte", "tischverlangerung",
    "fusshohe", "fussteil", "sitzauszug", "querschlafer",
    "beleuchtungsset", "beleuchtung",
    "panel", "boden", "einlegeboden",
    "hakenleiste", "knopf",
    "rollcontainer",
    "sockel", "verlangerung",
    "kopfteil", "kopfstutze",
    "lattenrost",
    "ruckenelement",
    "ersatz", "ersatzteil", "zubehor", "zubehoer", "nachbestellung",
    "deko",  # "deko element" -> bed
}

def has_component_token(name):
    return bool(set(name.split()) & COMPONENT_TOKENS)

simple["component_antecedent"] = simple["antecedent"].apply(has_component_token)
simple["component_consequent"] = simple["consequent"].apply(has_component_token)
simple["component_either"] = simple["component_antecedent"] | simple["component_consequent"]

print(f"Total simple rules:          {len(simple)}")

Total simple rules:          8

Code

print(f"  within-bundle:             {int(simple['within_bundle'].sum())}")

  within-bundle:             8

Code

print(f"  substring-pair:            {int(simple['substring_pair'].sum())}")

  substring-pair:            0

Code

print(f"  component on either side:  {int(simple['component_either'].sum())}")

  component on either side:  0

Code

print(f"  symmetric (≥ 0.7):         {int((simple['symmetry'] >= 0.7).sum())}")

  symmetric (≥ 0.7):         1

Code

print(f"  lift ≥ 2 (non-trivial):    {int((simple['lift'] >= 2).sum())}")

  lift ≥ 2 (non-trivial):    8
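The zero count for component tokens follows from how the matching works: names are split on whitespace and each piece must equal a token exactly. The sketch below reproduces that logic on a three-token subset of the list and contrasts it with a looser, hypothetical variant that also splits on underscores and hyphens. The names "eckelement" and "sofa_element" are invented examples.

```python
TOKENS = {"element", "topper", "kissen"}   # small subset of COMPONENT_TOKENS

def has_component_token(name):
    # Same logic as the chapter: whitespace split, exact token match.
    return bool(set(name.split()) & TOKENS)

def has_component_token_loose(name):
    # Hypothetical variant: treat underscores and hyphens as separators too.
    parts = name.replace("_", " ").replace("-", " ").split()
    return bool(set(parts) & TOKENS)

results = {}
for name in ["deko element", "eckelement", "sofa_element", "dining_chair"]:
    results[name] = (has_component_token(name), has_component_token_loose(name))
    print(f"{name:15s} strict={results[name][0]}  loose={results[name][1]}")
```

Exact matching explains why the curated list spells out compounds such as polsterelement, and why underscore-style names like those in the synthetic data would need a different split rule before any of them could be flagged.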

View 1 — Bundle composition

“Which items belong to the same product system?” Useful for set-merchandising, “complete your set” UI, and inventory of complementary parts. We surface pairs that are within-bundle, substring-related, or carry a component token on either side, sorted by symmetry × lift so the strongest semantic pairings come out on top.

Code

bundle_view = (
    simple[
        (simple["within_bundle"])
        | (simple["substring_pair"])
        | (simple["component_either"])
    ]
    .assign(score=lambda d: d["symmetry"] * d["lift"])
    .sort_values("score", ascending=False)
)
bundle_view.head(15)[["antecedent", "consequent", "support", "confidence",
                      "reverse_conf", "symmetry", "lift",
                      "within_bundle", "substring_pair", "component_either"]]\
    .round({"support": 4, "confidence": 3, "reverse_conf": 3,
            "symmetry": 2, "lift": 2})

      antecedent     consequent  support  confidence  reverse_conf  symmetry   lift  within_bundle  substring_pair  component_either
6   garden_table   garden_chair   0.0137       0.667         0.291      0.44  14.20           True           False             False
2           desk   office_chair   0.0276       0.591         0.388      0.66   8.32           True           False             False
7  kitchen_table  kitchen_chair   0.0287       0.737         0.370      0.50   9.49           True           False             False
1            bed       mattress   0.0560       0.691         0.495      0.72   6.10           True           False             False
0      headboard            bed   0.0270       0.609         0.333      0.55   7.51           True           False             False
5      sideboard   dining_table   0.0211       0.602         0.273      0.45   7.80           True           False             False
3   dining_table   dining_chair   0.0623       0.808         0.378      0.47   4.90           True           False             False
4      sideboard   dining_chair   0.0185       0.528         0.112      0.21   3.20           True           False             False

View 2 — Cross-sell

“Which items pull in another item from a different system?” This is the deployable cross-sell-trigger list. Filter: cross-bundle (or no-bundle), no substring relation, no generic component/accessory token on either side, asymmetric (symmetry < 0.7), lift ≥ 2.

Code

cross_sell = (
    simple[
        ~simple["within_bundle"]
        & ~simple["substring_pair"]
        & ~simple["component_either"]
        & (simple["symmetry"] < 0.7)
        & (simple["lift"] >= 2)
    ]
    .sort_values("confidence", ascending=False)
)
cross_sell.head(15)[["antecedent", "consequent", "support", "confidence",
                     "lift", "leverage", "conviction"]]\
    .round({"support": 4, "confidence": 3, "lift": 2,
            "leverage": 4, "conviction": 2})

Empty DataFrame
Columns: [antecedent, consequent, support, confidence, lift, leverage, conviction]
Index: []

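On the synthetic data this view comes back empty: all eight simple rules are within-bundle pairs, so none passes the cross-bundle filter. On real data, the rules that do land here are the ones to wire into “you might also need” prompts at checkout / page-view / POS scan, where antecedent demand pulls in a paired product from a different category.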
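When a view returns nothing, a quick diagnostic is to count how many rules pass each filter condition on its own. The sketch below does that for the five cross-sell conditions, using flag and score values transcribed from the bundle-view table above. The tuple layout and the names `rows`, `conditions` and `passing` are illustrative.

```python
# (within_bundle, substring_pair, component_either, symmetry, lift) per simple
# rule, transcribed from the bundle-view table.
rows = [
    (True, False, False, 0.44, 14.20),
    (True, False, False, 0.66, 8.32),
    (True, False, False, 0.50, 9.49),
    (True, False, False, 0.72, 6.10),
    (True, False, False, 0.55, 7.51),
    (True, False, False, 0.45, 7.80),
    (True, False, False, 0.47, 4.90),
    (True, False, False, 0.21, 3.20),
]

conditions = {
    "cross-bundle": lambda r: not r[0],
    "no substring relation": lambda r: not r[1],
    "no component token": lambda r: not r[2],
    "symmetry < 0.7": lambda r: r[3] < 0.7,
    "lift >= 2": lambda r: r[4] >= 2,
}

# Count rules passing each condition independently of the others.
passing = {name: sum(1 for r in rows if test(r)) for name, test in conditions.items()}
for name, n in passing.items():
    print(f"{name:24s} {n} of {len(rows)}")
```

Only the cross-bundle condition rejects every rule, so the empty result reflects the structure of the synthetic catalogue rather than thresholds that are too strict.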

View 3 — Top insights, stratified by class

“What are the strongest patterns in the data, period?” — for stakeholder reports. A single global “top-N by lift” list isn’t useful: definitional pairs (component → main, substring) have extreme lifts (often 100×+) and would dominate the head of the list, hiding cross-sell and within-bundle signal. We stratify instead — top-5 per class sorted by lift — so the reader sees the strongest representative of each relationship type.

Code

def classify(r):
    if r["component_either"]:
        return "component ↔ main (definitional)"
    if r["substring_pair"]:
        return "substring (definitional)"
    if r["within_bundle"]:
        if r["symmetry"] >= 0.7:
            return "within-bundle (symmetric)"
        return "within-bundle (asymmetric)"
    if r["symmetry"] >= 0.7:
        return "symmetric (cross-bundle)"
    return "cross-sell (asymmetric)"

simple["class"] = simple.apply(classify, axis=1)
top_per_class = (
    simple.sort_values("lift", ascending=False)
          .groupby("class").head(5)
          .sort_values(["class", "lift"], ascending=[True, False])
          .reset_index(drop=True)
)
top_per_class[["class", "antecedent", "consequent", "support", "confidence",
               "lift", "conviction"]]\
    .round({"support": 4, "confidence": 3, "lift": 2, "conviction": 2})

                        class     antecedent     consequent  support  confidence   lift  conviction
0  within-bundle (asymmetric)   garden_table   garden_chair   0.0137       0.667  14.20        2.86
1  within-bundle (asymmetric)  kitchen_table  kitchen_chair   0.0287       0.737   9.49        3.51
2  within-bundle (asymmetric)           desk   office_chair   0.0276       0.591   8.32        2.27
3  within-bundle (asymmetric)      sideboard   dining_table   0.0211       0.602   7.80        2.32
4  within-bundle (asymmetric)      headboard            bed   0.0270       0.609   7.51        2.35
5   within-bundle (symmetric)            bed       mattress   0.0560       0.691   6.10        2.87

Code

print("Distribution of all simple rules across classes:")

Distribution of all simple rules across classes:

Code

for cls, n in simple["class"].value_counts().items():
    print(f"  {cls:35s} {n}")

  within-bundle (asymmetric)          7
  within-bundle (symmetric)           1

The class distribution is itself diagnostic: a healthy retail dataset has a substantial cross-sell tail (true behavioural signal), but also legitimate bundle/substring pairs (catalog plumbing). The three views above let you pick the right cut for the question at hand instead of forcing one filter to fit all three jobs.

What each view means operationally

The methods above produce three rule lists. Each answers a different business question and feeds a different operational system. Here is what to do with each:

View 1 — Bundle composition

Question answered: “Which items are part of the same product system?”

What you read off it: rules where antecedent + consequent are functional companions in one decision — bed + mattress + frame, sofa + matching cushion + element pieces, dining-table + chairs + extension. Lift here is large because the relationship is structural, not behavioural.

Operational use cases:

Set-bundling: package the strongest within-system pairs as ready-made bundles (“bed + slatted frame + mattress as a set”). The high-lift pairs are the obvious bundle candidates.
“Complete your set” prompts: at checkout, if the basket has only the primary item from a set, surface the missing components.
Inventory coupling: when stock of a primary item runs low, check the paired components, since a stockout on the main product reduces demand for components, and vice versa.
Catalog hygiene: rules with extreme symmetry (≥ 0.9) are products you should consider merging into a single SKU or bundle, since customers don’t really choose between them.
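A “complete your set” prompt can be sketched as a lookup over the bundle rules: for every rule whose antecedent is in the basket and whose consequent is not, suggest the consequent. The function name `missing_components` and the 0.6 confidence cut-off are assumptions for illustration; the three rules are taken from the View 1 table.

```python
# (antecedent, consequent, confidence) for three within-bundle rules from View 1.
bundle_rules = [
    ("bed", "mattress", 0.691),
    ("headboard", "bed", 0.609),
    ("dining_table", "dining_chair", 0.808),
]

def missing_components(basket, rules, min_confidence=0.6):
    """Suggest consequents whose antecedent is in the basket but which are absent."""
    suggestions = [
        (consequent, confidence)
        for antecedent, consequent, confidence in rules
        if antecedent in basket
        and consequent not in basket
        and confidence >= min_confidence
    ]
    # Highest-confidence suggestion first.
    return sorted(suggestions, key=lambda s: s[1], reverse=True)

print(missing_components({"bed", "headboard"}, bundle_rules))
print(missing_components({"dining_table", "dining_chair"}, bundle_rules))
```

A basket with a bed and headboard gets the mattress suggested; a basket that already holds both dining items gets no prompt.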

View 2 — Cross-sell

Question answered: “Which item from one product system pulls in an item from another?”

What you read off it: rules where the antecedent and consequent live in different bundles, are not substring-related, and neither side carries a generic component token. These are real cross-sell triggers: the antecedent is a stand-alone product whose buyer also pulls in a different stand-alone product.

Operational use cases:

Page-view recommendations: when a customer views the antecedent product, surface the consequent in “you might also like”. Use only the cross-sell view here, not the bundle view; otherwise the suggestion looks redundant.
POS / checkout prompts: the cashier sees “TV cabinet in basket → offer the highboard on promotion”.
Targeted email: 14–30 days after the antecedent purchase, send a follow-up offering the consequent at a small discount.
Floor placement: place cross-sell pairs visually adjacent in the showroom (armchair and footstool within the same line of sight).
Cross-sell incentive plan: if you compensate sales staff per cross-sell, this is the list of pairs that count, not the bundle pairs (those are baseline expectation, not extra effort).

View 3 — Top insights (statistical strength)

Question answered: “What are the strongest co-purchase patterns in the dataset, regardless of class?”

What you read off it: rules sorted by lift, each annotated with its class — component ↔︎ main, substring, within-bundle (sym/asym), symmetric (cross-bundle), cross-sell (asymmetric). The reader sees the strongest signal first, the class column tells them whether to act on it (cross-sell), package it (bundle), or note it as catalog structure (component / substring).

Operational use cases:

Stakeholder report / management presentation: “These 20 pairs explain most of the co-purchase structure in the assortment, sorted here by strength and classified.”
Methodology audit: an analyst auditing the cross-sell list (View 2) can use View 3 as a sanity check. If a high-lift rule is missing from View 2, it is because the class column says it is bundle/substring/component, which is correct triage.
Catalog documentation: the View 3 output is a snapshot of which products are tightly coupled in customer behaviour, useful for onboarding new merchandisers.

What this chapter does not answer

Apriori on baskets only tells us what is bought together. It says nothing about:

Who the buyers are (chapter 04 CLV, 06 Survival, 09 Uplift)
Which products dominate revenue or are growing (chapter 02 BCG, 03 RFM)
What demand looks like next quarter (chapter 07 Forecasting)
Which products are functionally substitutable when one is out of stock (chapter 08 Embeddings)

The full operational picture — which products to push to which customers, when, with which offer — comes from joining the View 2 cross-sell rules with the customer-side analyses (CLV decile, win-back zone, hazard ratios) and the product-side rankings (BCG quadrants, RFM tier). That synthesis lives in chapter 05 Insights.

Code

import matplotlib.pyplot as plt
import seaborn as sns

# Stable class order so the legend reads top-to-bottom from "actionable" to
# "definitional plumbing".
class_order = [
    "cross-sell (asymmetric)",
    "symmetric (cross-bundle)",
    "within-bundle (asymmetric)",
    "within-bundle (symmetric)",
    "substring (definitional)",
    "component ↔ main (definitional)",
]
class_palette = {
    "cross-sell (asymmetric)":          "#27ae60",
    "symmetric (cross-bundle)":         "#3498db",
    "within-bundle (asymmetric)":       "#9b59b6",
    "within-bundle (symmetric)":        "#7f8c8d",
    "substring (definitional)":         "#f39c12",
    "component ↔ main (definitional)":  "#c0392b",
}

fig, ax = plt.subplots(figsize=(9, 5.5))
sns.scatterplot(
    data=simple, x="support", y="confidence",
    hue="class", hue_order=class_order, palette=class_palette,
    size="lift", sizes=(20, 220), alpha=0.8, ax=ax,
)
ax.set_xscale("log")
ax.set_xlabel("Support (log scale)")
ax.set_ylabel("Confidence")
ax.legend(bbox_to_anchor=(1.02, 1), loc="upper left", fontsize=8, frameon=False)
plt.tight_layout()
plt.show()

Figure 3: Simple rules in support × confidence space, classified. Cross-sell rules (the actionable subset) sit visually apart from within-bundle and component-pair plumbing — the same Apriori output, shown with the triage class as colour.

R vs. Python — same answers?

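Both implementations recover the same eight simple-rule headline patterns at matching support and confidence values, as expected from a deterministic algorithm. The total rule counts do not match, though: arules reports 214 rules and mlxtend 234. arules::apriori only generates rules with a single item in the consequent, while mlxtend’s association_rules also emits rules with multi-item consequents, which is the most likely source of the extra rules. The exact ordering of complex rules can also differ when many tie on confidence (1.0), but that is a tiebreaker artifact, not a difference in what the algorithms find.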
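How much the shape of the consequent can matter is easy to see on a single frequent itemset. The sketch below counts candidate rules under two conventions: exactly one item in the consequent (the convention arules::apriori uses) and every non-empty split (what mlxtend's association_rules considers). It only enumerates candidates before any confidence filter, and it has not been checked against the actual rule sets from the two runs above. The function names and the example itemset are illustrative.

```python
from itertools import combinations

def single_consequent_rules(itemset):
    """Candidate rules with exactly one item on the right-hand side."""
    items = sorted(itemset)
    return [(frozenset(items) - {c}, frozenset({c})) for c in items]

def all_partition_rules(itemset):
    """Every split of the itemset into a non-empty antecedent and consequent."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for consequent in combinations(items, k):
            rules.append((frozenset(items) - set(consequent), frozenset(consequent)))
    return rules

itemset = {"bed", "mattress", "headboard"}
n_single = len(single_consequent_rules(itemset))
n_all = len(all_partition_rules(itemset))
print(n_single, n_all)
```

For a three-item set that is 3 candidates against 6, and the gap widens with itemset size. For a like-for-like comparison, one option is to filter the mlxtend rules frame to rows whose consequents have length one.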

Implementation differences worth knowing

	`arules` (R)	`mlxtend` (Python)
Internal representation	sparse C-level transaction matrix	one-hot DataFrame
Performance on large data	very fast (10× ahead of mlxtend on 100k+ baskets)	acceptable up to ~10k baskets
Visualization	rich (`arulesViz`: scatter, graph, parallel coordinates, matrix)	DIY with matplotlib / networkx
Subsetting / pruning rules	first-class (`subset(rules, ...)`)	pandas filtering on the rules frame
Ecosystem fit	preferred when downstream is R / shiny	preferred when downstream is FastAPI / serving

For exploratory work and visualization the R toolkit wins; for embedding into a Python service mlxtend is the easier deploy.

Takeaway

Apriori finds raw co-purchase patterns; the work that turns those into decisions is in the postprocessing — redundancy filtering (is.redundant), interest measures beyond confidence (lift, conviction, leverage), and splitting the output into the views that map onto actual stakeholder questions: bundle composition, cross-sell triggers, and statistically strongest patterns. The same Apriori-run answers all three; the difference is in which rules each view surfaces, not in re-running the algorithm with different thresholds.
