From 49948a50afdc29a25d07b0e7edfc5b02862cc583 Mon Sep 17 00:00:00 2001
From: atiwary <atiwary@marqeta.com>
Date: Wed, 20 May 2026 16:48:30 -0500
Subject: [PATCH 1/2] fix: [DATA-123] correct order_fact revenue computation

Root cause: order_fact had two independent bugs that combined to
understate non-test Q1 revenue by $2,467,627.97 vs Stripe captures
($10,522,258.04 reported vs $12,989,886.01 truth).

1. QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY shipped_at)
   = 1 kept only the first shipment per order and silently dropped
   revenue from 2nd+ shipments on 847 multi-shipment orders
   (-$747,890.76).
2. Revenue was sum(quantity_shipped * unit_price) at shipment-line
   grain. Partially-shipped, partially-cancelled, and not-yet-shipped
   orders lost the un-shipped portion (-$1,719,737.21).

Fix: aggregate stg_line_items at order grain first (revenue =
sum(quantity * unit_price), no shipment fan-out, no QUALIFY needed),
attach shipment metadata as a sidecar (shipment_count, first_shipped_at).
Also fixes the incremental watermark column mismatch (filter and
watermark now both on ordered_at instead of ordered_at vs updated_at_dwh).

Validated: make full + make test both clean (17/17 + 12/12). Non-test
sum(revenue) ties to $12,989,886.01 to the penny; rows == distinct
order_ids == 9699 confirms one-row-per-order grain.

Also adds:
- CLAUDE.md documenting repo layering, incremental pattern, and the
  prior bug zones so future work doesn't re-discover them.
- docs/designs/2026-Q2-refunds-modeling.md: draft design for the
  refunds/net-revenue follow-on (DATA-145 territory), out-of-scope
  for this fix per the Q3-2024 design doc.
---
 CLAUDE.md                                |  80 +++++++++
 docs/designs/2026-Q2-refunds-modeling.md | 205 +++++++++++++++++++++++
 models/orders/dw/order_fact.sql          | 119 ++++---------
 3 files changed, 315 insertions(+), 89 deletions(-)
 create mode 100644 CLAUDE.md
 create mode 100644 docs/designs/2026-Q2-refunds-modeling.md
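Reviewer note (not part of the commit message): the gap figures quoted in the commit message can be re-checked with a minimal Python sketch. It only re-adds the numbers stated above; the variable names are illustrative and nothing here queries the warehouse.

```python
from decimal import Decimal

# Figures copied from the commit message; names are illustrative only.
reported = Decimal("10522258.04")           # order_fact non-test Q1 revenue before the fix
stripe_truth = Decimal("12989886.01")       # Stripe captures for the same scope
dropped_shipments = Decimal("747890.76")    # bug 1: revenue on 2nd+ shipments
unshipped_portion = Decimal("1719737.21")   # bug 2: un-shipped quantity

gap = stripe_truth - reported
explained = dropped_shipments + unshipped_portion

# The two bugs should account for the whole gap, to the penny.
assert gap == explained
print(gap)
```

Decimal is used instead of float so the comparison is exact at cent precision.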

diff --git a/CLAUDE.md b/CLAUDE.md
new file mode 100644
index 0000000..ddbbb44
--- /dev/null
+++ b/CLAUDE.md
@@ -0,0 +1,80 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What this repo is
+
+A Principal Data Engineer interview exercise: a small dbt + DuckDB warehouse inherited from a contractor. The active task is in `DATA-123.md` (Q1 revenue reconciliation against Stripe captures is off). Read it first.
+
+`DATA-123.md` is **rendered from `DATA-123.md.tmpl` by `setup/generate.py`** — re-running `make setup` overwrites it. Don't edit it directly if you want notes to persist; put notes elsewhere.
+
+## Commands
+
+All commands go through the Makefile, which wraps `uv run`. Don't invoke `dbt` directly unless you have a reason — the Make targets pin the right working directory and env.
+
+```bash
+make setup      # rebuild warehouse.duckdb from raw/*.csv, then dbt full-refresh
+make seed       # regenerate raw/*.csv from setup/generate.py, then full-refresh
+make run        # dbt run (incremental)
+make full       # dbt run --full-refresh
+make test       # dbt test (singular tests in tests/ + schema tests in *.yml)
+make lint       # sqlfluff lint models/
+make sql Q="select count(*) from main_orders_dw.order_fact"   # read-only one-shot query
+make clean      # rm warehouse.duckdb (next `make setup` rebuilds)
+```
+
+Run a single dbt model or test:
+
+```bash
+uv run dbt run --select order_fact
+uv run dbt run --select +order_fact          # model + upstream
+uv run dbt test --select order_fact          # all tests on a model
+uv run dbt test --select test_name:order_fact_revenue_reconciliation
+```
+
+`profiles.yml` lives in the repo root (not `~/.dbt/`); dbt picks it up via project-local config. Target is `dbt_duckdb` → `./warehouse.duckdb`.
+
+## Architecture
+
+### Layering (enforced by convention, not tooling)
+
+`base/` → `staging/` → (`lookup/` | `dw/`) → `reporting/`
+
+Materializations are set in `dbt_project.yml` per layer:
+- `base/`, `staging/` → views
+- `lookup/` → tables
+- `dw/` → **incremental** (the only persisted heavy layer)
+- `reporting/` → views
+
+Each subject area (`orders/`, `merchants/`) gets its own schema per layer (e.g. `main_orders_dw`, `main_merchants_lookup`). When writing `make sql Q=...` queries, use the fully-qualified `<schema>.<table>` name.
+
+### Incremental pattern
+
+`models/orders/dw/order_fact.sql` and `order_line_fact.sql` are incremental on `ordered_at`, gated by the `get_incremental_value(incr_col)` macro in `macros/get_incremental_value.sql`. The macro is a DuckDB-flavored shim of Extend's internal macro of the same name — it returns `max(incr_col)` from the existing relation, or `'1900-01-01'` on first build. The incremental `WHERE` clause uses this to filter new rows.
+
+**Watch-out:** before DATA-123, `order_fact.sql` filtered incrementally on `ordered_at >= get_incremental_value('updated_at_dwh')`, a mismatch between the filter column and the watermark column. The fix puts both on `ordered_at`. If you change incremental logic, verify the watermark column matches the filter column.
+
+### Order grain & shipment fan-out
+
+`order_fact` is **one row per order**. Before DATA-123 the join went through shipments → shipment_line_items → line_items, which fans out, and the model deduplicated with `QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY shipped_at) = 1`. The model now aggregates `stg_line_items` to order grain first and attaches shipment metadata (`shipment_count`, `first_shipped_at`) from `stg_shipments`, so no `QUALIFY` is needed. Any change touching the join keys or grain needs to keep the one-row-per-order invariant.
+
+`revenue` on `order_fact` is `sum(quantity * unit_price)` over an order's line items. The previous definition, `sum(quantity_shipped * unit_price)` aggregated to `(order, shipment)` and collapsed to the first shipment by `shipped_at`, was the source of the DATA-123 discrepancy: it dropped revenue from non-first shipments and from un-shipped quantity.
+
+### Out-of-scope by prior design
+
+Per `docs/designs/2024-Q3-orders-redesign.md`:
+- **Refunds are intentionally not modeled.** `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`.
+- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.
+
+Read the design doc before proposing structural changes — its "out of scope" section reflects deliberate decisions, not gaps.
+
+## Conventions (from CONTRIBUTING.md)
+
+- SQL: 4-space indent, **leading commas**, lowercase identifiers, uppercase keywords. `make lint` enforces this via sqlfluff (config in `.sqlfluff`, DuckDB dialect, dbt templater).
+- Every fact gets `unique` + `not_null` on its PK.
+- Prefer row-level invariants over aggregate reconciliation tests — aggregates tell you something is wrong, not where.
+- `severity: warn` is for noisy upstream conditions, **not** for known-broken state. The singular test `tests/order_fact_revenue_reconciliation.sql` is currently `severity='warn'` — that's a smell to investigate, not a precedent to copy.
+
+## Seed data
+
+`raw/*.csv` is **committed**. `setup/generate.py` is deterministic (`SEED = 20260517`) so regenerating produces identical files. Date range is 2024-11-01 → 2026-05-01 (18 months); 5k merchants, 500 products, 10k orders. Refund CSVs exist in `raw/` but are unused (see above).
diff --git a/docs/designs/2026-Q2-refunds-modeling.md b/docs/designs/2026-Q2-refunds-modeling.md
new file mode 100644
index 0000000..9389ab8
--- /dev/null
+++ b/docs/designs/2026-Q2-refunds-modeling.md
@@ -0,0 +1,205 @@
+# Refunds modeling — net revenue for `order_fact` / `order_line_fact`
+
+**Author:** Data Engineering
+**Status:** Draft — for review with Finance Analytics
+**Date:** 2026-05-20
+**Related:** [`2024-Q3-orders-redesign.md`](2024-Q3-orders-redesign.md) (out-of-scope item: refunds), DATA-145 (refund sources, not yet scoped), DATA-123 (Q1 revenue reconciliation — gross-revenue bug, separate issue)
+
+---
+
+## 1. What Finance asked for
+
+> Surface refund totals on `order_fact` and per-line refund amounts on `order_line_fact` so we can report **net revenue** (gross − refunds).
+
+This document captures the modeling decisions we recommend. We want sign-off on the decisions in §3 and §5 before we touch any SQL.
+
+### Two items to highlight up front
+
+These are covered in detail below, but they're the two decisions most likely to matter to Finance, so we want them on the table before the walkthrough:
+
+1. **Shopify/Stripe dedup is a heuristic (§3.3).** The same logical refund appears in both Shopify and Stripe in our sample data — Shopify carries the line + amount, Stripe carries the tender split. Summing them double-counts. We propose matching on `(order_id, refunded_at_minute)` and treating Shopify as the source of truth for occurrence/amount. **If Finance has a real `gateway_refund_id` cross-walk available in production, we should use that instead** — it's exact, the heuristic is not. See open question E.
+
+2. **Late-arriving refunds are the operational gotcha (§4).** `order_fact` today filters incremental loads on `ordered_at` only. A refund processed today against an order from three months ago would not refresh that order's row, so its `net_revenue` would silently stay wrong. The fix is an `OR` clause on "orders with new refund activity since last run." **This directly affects the question "when do my numbers reflect today's refunds?"** With the fix, incremental runs pick up new refunds against historical orders within one cycle; without it, they never do.
+
+## 2. What's in raw today
+
+Three refund feeds are landed in `raw/` but **not wired into any model**:
+
+| Source                    | Grain          | Key fields                                                        | Notes                                                                |
+|---------------------------|----------------|-------------------------------------------------------------------|----------------------------------------------------------------------|
+| `refunds_stripe.csv`      | order × tender | `refund_id`, `order_id`, `tender_type`, `amount_in_cents`, `processed_at` | Multiple rows per refund event when a refund is split across tenders (e.g., card + store_credit). Per-line detail not available. |
+| `refunds_shopify.csv`     | order × line   | `refund_id`, `order_id`, `line_item_id`, `qty_refunded`, `amount_in_cents`, `refunded_at` | Has line-level detail and quantity refunded. No tender breakdown.    |
+| `refunds_internal_pos.csv`| order          | `refund_id`, `order_id`, `amount_in_cents`, `refunded_at`         | In-store register; standalone, no Shopify/Stripe pairing.            |
+
+**Observation (important):** the same refund event sometimes appears in **both** Stripe and Shopify. Example in current sample data, order `O005064` at `2026-04-18T23:12:12`:
+
+- Shopify: 1 row, `178,905¢` against `L0009590`, `qty_refunded = 5`.
+- Stripe: 2 rows totaling `178,905¢` (`89,452¢` card + `89,453¢` store_credit).
+
+These describe the same money moving — Shopify is the system of record for **what was refunded** (line, quantity), Stripe is the system of record for **how it was paid back** (tender). Summing across sources would double-count. Our model must pick a precedence rule (see §3).
+
+POS appears to be standalone (separate register, no e-commerce pairing). Treat it as additive.
+
+## 3. Modeling decisions
+
+### 3.1 Build a separate `refund_fact` (one row per refund event)
+
+We keep the Q3-2024 design principle intact: **don't tangle refund logic with order status/revenue logic on `order_fact`**. Refund details live on their own fact; `order_fact` and `order_line_fact` carry **aggregates** of that fact.
+
+```
+base/  refunds_stripe, refunds_shopify, refunds_internal_pos
+staging/  stg_refunds      ← unified, deduped, line-aware where possible
+dw/       refund_fact      ← one row per logical refund × line
+          order_fact       ← gains refund_total, refund_count, last_refunded_at, net_revenue
+          order_line_fact  ← gains refund_amount, qty_refunded
+reporting/  rpt_net_revenue (or similar, per Finance preference)
+```
+
+### 3.2 Grain of `refund_fact`: **order × line × refund_event**
+
+One row per (order_id, line_item_id, refund_id). For order-grain sources (POS, Stripe-only), `line_item_id` is `NULL` and the amount is carried on a single row per refund_id. The line-grain `refund_amount` on `order_line_fact` then uses allocation (§3.4) to fill in unallocated amounts.
+
+This is more granular than strictly necessary for the finance asks, but it's the grain that lets us answer:
+
+- "What was refunded?" (line + quantity) — from Shopify directly.
+- "How was it paid back?" (tender) — joinable to Stripe rows by (order_id, refunded_at).
+- "Was this in-store or online?" — derivable from `source`.
+
+### 3.3 Source precedence to dedupe Shopify ↔ Stripe overlap
+
+We treat **Shopify as the system of record for refund occurrence + line + amount**, and **Stripe as the system of record for tender mix only**. Concretely, in `stg_refunds`:
+
+1. Start with all Shopify rows (line-grain).
+2. Add Internal POS rows (line_item_id NULL).
+3. Add Stripe rows **only if `(order_id, refunded_at_minute)` does not appear in Shopify** — these are Stripe-direct refunds (e.g., issued from the Stripe dashboard, no Shopify counterpart).
+4. Tender breakdown becomes a sidecar: `stg_refund_tenders` (order_id, refund_event_key, tender_type, amount) for downstream reporting that needs it. Not joined into `refund_fact`.
+
+This is a heuristic. We'd like Finance to confirm whether they have a stronger key from upstream (e.g., a `gateway_refund_id` cross-walk) — see §8.
+
+### 3.4 Allocation of order-grain refunds to lines
+
+For refunds where `line_item_id IS NULL` (POS, Stripe-direct), we cannot know which line was refunded. To populate `order_line_fact.refund_amount` we allocate **pro-rata by line revenue**:
+
+```
+line_refund_share = line_revenue / order_revenue
+allocated_refund   = order_refund_amount * line_refund_share
+```
+
+We carry an explicit flag `refund_allocation_method ∈ {'direct', 'pro_rata', 'none'}` on `order_line_fact` so analysts can see when a line's refund came from a real Shopify attribution vs. a derived split (`'none'` marks lines with no refund). The sum across lines still ties to `order_fact.refund_total` to the penny: each line's share is rounded to the cent, and the largest line absorbs the leftover rounding drift.
+
+**Alternative we considered and rejected:** leave un-allocated. That's cleaner but breaks Finance's "per-line net revenue" report — every POS order would have NULL refund attribution. Pro-rata with a method flag preserves both: rollups stay accurate and analysts can filter to `direct`-only lines when investigating returns by SKU.
+
+### 3.5 What columns go on `order_fact`
+
+| Column                  | Definition                                                              |
+|-------------------------|-------------------------------------------------------------------------|
+| `refund_total`          | `sum(refund_fact.amount)` per order, in dollars (consistent with `revenue`) |
+| `refund_count`          | `count(distinct refund_id)` per order                                   |
+| `last_refunded_at`      | `max(refunded_at)` per order                                            |
+| `net_revenue`           | `revenue - coalesce(refund_total, 0)`                                   |
+| `is_fully_refunded`     | `refund_total >= revenue` (handles goodwill/over-refund as TRUE)        |
+
+Nullability: `refund_total` is `coalesce(..., 0)` — every order has a value, even if zero. This makes downstream `net_revenue` filters and aggregations straightforward (no `IS NULL` traps).
+
+### 3.6 What columns go on `order_line_fact`
+
+| Column                     | Definition                                                  |
+|----------------------------|-------------------------------------------------------------|
+| `qty_refunded`             | From Shopify when available; else `NULL` (we don't fabricate qty for allocated rows) |
+| `refund_amount`            | Direct from Shopify OR pro-rata allocated (see §3.4)        |
+| `refund_allocation_method` | `'direct' \| 'pro_rata' \| 'none'`                          |
+| `net_line_revenue`         | `line_revenue - coalesce(refund_amount, 0)`                 |
+
+## 4. Incremental + late-arriving refunds (the operational gotcha)
+
+This is the single most important production decision. Worth slowing down on.
+
+The current incremental pattern on `order_fact` filters on `ordered_at >= get_incremental_value('ordered_at')`. **Refunds arrive days, weeks, or months after the order.** A refund processed today against an order from three months ago will not be picked up by an `ordered_at`-filtered incremental load, so the `order_fact` row for that order would never refresh.
+
+We resolve this in two pieces:
+
+1. **`refund_fact` is incremental on `refunded_at`** (the refund event's own date). Standard pattern.
+2. **`order_fact` and `order_line_fact` switch from a single-watermark filter to a union of "new orders" and "orders with new refund activity":**
+
+```sql
+WHERE o.ordered_at >= {{ get_incremental_value('ordered_at') }}
+   OR o.order_id IN (
+       SELECT order_id FROM {{ ref('refund_fact') }}
+       WHERE refunded_at >= {{ get_incremental_value('last_refunded_at') }}
+   )
+```
+
+This re-processes any order whose refund picture changed since the last run, without re-processing the entire history. The refund-side watermark is `last_refunded_at` on `order_fact` (§3.5), so the macro reads a column that exists on the target relation. `updated_at_dwh` on `order_fact` should also bump whenever refund aggregates change, which is handy for downstream consumers.
+
+**Backfill on first deploy:** one full-refresh (`make full`) to pick up historical refunds against all historical orders. Incremental thereafter.
+
+## 5. Scope decisions for sign-off
+
+These are choices we'd like Finance to explicitly confirm or correct:
+
+| # | Question                                                                                | Our recommendation                                                                                                                                                  |
+|---|-----------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| A | Does "net revenue" = `revenue − refund_total`?                                          | **Yes.** Cancellations are already handled by `order_status` (pre-payment voids never hit gross revenue); refunds are the only post-payment reversal we model here. |
+| B | If `refund_total > revenue` (goodwill, overpayment), should `net_revenue` go negative?  | **Yes, surface the truth.** Don't clamp at zero. Add a data-quality test at `severity='warn'` so Finance is notified, not silenced. |
+| C | Should refunds from test orders (`is_test = true`) be excluded?                         | **Yes, same as gross revenue.** Reporting layer applies `is_test = false` consistently.                                                                              |
+| D | Is per-tender breakdown (card vs. store_credit) needed on `order_fact`?                 | **No.** Sidecar `stg_refund_tenders` covers anyone who needs it; keeping `order_fact` narrow.                                                                       |
+| E | Shopify-vs-Stripe dedup rule (§3.3) — does Finance have a better key than `(order_id, minute)`? | **Need input.** If a `gateway_refund_id` mapping exists in source, use it; otherwise we go with the heuristic and add a monitoring test.                            |
+| F | How should refunds for **shipped** vs **not-yet-shipped** lines be treated?             | **Same treatment.** Both reduce net revenue. (Worth noting because some businesses differentiate; we don't think we should.)                                        |
+| G | Tax / shipping refunds — included in `amount_in_cents`?                                 | **Assumed yes (gross-of-tax).** Mirrors how `revenue` is currently computed (`quantity * unit_price`, no tax breakout). If Finance reports net-of-tax separately, that's a follow-up. |
+
+## 6. Tests we'll add
+
+Following CONTRIBUTING.md (row-level invariants > aggregate reconciliation, no `severity='warn'` on known-broken state):
+
+**`refund_fact`**
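+- `unique` on the row grain `(order_id, line_item_id, refund_id)` from §3.2; `not_null` on `refund_id`. `refund_id` alone can repeat when one refund covers several lines.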
+- `not_null` on `order_id`, `amount`, `refunded_at`, `source`.
+- `amount > 0` (singular test).
+
+**`order_fact`**
+- Singular row-level: per order, `refund_total = sum(refund_fact.amount where order_id = ...)`.
+- Singular: `net_revenue = revenue - coalesce(refund_total, 0)` (tautology guard against future drift).
+- `severity='warn'` test: count of orders where `refund_total > revenue` (Finance signal, not bug).
+
+**`order_line_fact`**
+- Singular row-level: per order, `sum(order_line_fact.refund_amount) = order_fact.refund_total` (within ±$0.01 to account for pro-rata rounding).
+- `not_null` on `refund_allocation_method`.
+
+**`stg_refunds`**
+- Singular: dedup invariant — no `(order_id, refunded_at_minute)` appears across both Shopify and Stripe in the unified output.
+
+## 7. Reporting / consumer impact
+
+We checked the `reporting/` layer for anything that names `revenue` and would silently change meaning. Current state: nothing in `reporting/` consumes refunds (because they don't exist yet), so we're additive. New reporting models we expect Finance will want:
+
+- `rpt_net_revenue_by_merchant_month`
+- `rpt_refunds_by_reason` (if/when reason codes become available — not in current sources)
+- `rpt_refunds_by_tender` (joins to `stg_refund_tenders`)
+
+We'd rather Finance own those report shapes than guess at column lists. Spec them; we build them.
+
+## 8. Open questions
+
+1. **Cross-source key:** does upstream Shopify carry the Stripe `refund_id` (or vice versa)? If yes, we drop the time-window heuristic in §3.3.
+2. **Reason codes:** none of the three sources carry refund reasons today. Should we ask the upstream loaders to bring them through? Useful for `rpt_refunds_by_reason`.
+3. **Partial-refund cadence:** are multi-event refunds against the same line a real scenario (e.g., partial refund today, second partial next week)? Our model handles it, but it changes test expectations.
+4. **Currency:** present data is single-currency cents. If multi-currency is on the roadmap, we should add `currency` to `refund_fact` now rather than retrofit.
+5. **The DATA-123 reconciliation:** gross revenue ties to Stripe captures of $12,989,886.01 (Q1, non-test) after the recent fix. Once refunds land, Finance's net-revenue number for Q1 will be **lower** by `sum(refund_fact.amount where refunded_at in Q1)`. We should align on what Finance expects that delta to be before deploy, so we have a sanity check.
+
+## 9. Rollout plan (after sign-off)
+
+1. `base/` + `staging/` for the three refund sources, with the dedup logic.
+2. `refund_fact` in `dw/`, incremental on `refunded_at`.
+3. Update `order_fact` + `order_line_fact` per §3.5–§3.6 and §4.
+4. Tests per §6.
+5. Spot-check: total Q1 refunds, top-10 refunded orders, allocation method mix.
+6. Hand to Finance for parallel validation against their current manual net-revenue calc.
+7. Flip the relevant reporting models / dashboards.
+
+Backfill = one `make full`. Daily incremental thereafter.
+
+## 10. What this doc is *not*
+
+- Not a Type-2 history proposal for refund state changes. If a refund gets voided or amended, today we'd see the latest row only. If that becomes a real scenario, it's a follow-up snapshot model — same answer as the Q3-2024 Type-2 question on merchants.
+- Not a chargeback model. Stripe disputes / chargebacks are a different upstream and have a different finance treatment (cost of goods sold side, not gross revenue contra). Out of scope here.
+- Not a fix to gross-revenue computation. That's DATA-123, separate.
diff --git a/models/orders/dw/order_fact.sql b/models/orders/dw/order_fact.sql
index ecd392f..6b7e574 100644
--- a/models/orders/dw/order_fact.sql
+++ b/models/orders/dw/order_fact.sql
@@ -3,108 +3,49 @@
     unique_key='order_id'
 ) }}
 
-WITH shipment_lines AS (
-    SELECT
-        sl.shipment_id
-        , sl.line_item_id
-        , sl.quantity_shipped
-        , li.unit_price
-    FROM {{ ref('stg_shipment_line_items') }} AS sl
-    INNER JOIN {{ ref('stg_line_items') }} AS li
-        ON sl.line_item_id = li.line_item_id
-)
-
-, joined AS (
-    SELECT
-        o.order_id
-        , o.merchant_id
-        , o.customer_id
-        , o.order_status
-        , o.is_test
-        , o.ordered_at
-        , o.paid_at
-        , s.shipment_id
-        , s.shipped_at
-        , sl.line_item_id
-        , sl.quantity_shipped
-        , sl.unit_price
-    FROM {{ ref('stg_orders') }} AS o
-    LEFT JOIN {{ ref('stg_shipments') }} AS s
-        ON o.order_id = s.order_id
-    LEFT JOIN shipment_lines AS sl
-        ON s.shipment_id = sl.shipment_id
-)
-
-, shipment_totals AS (
-    -- aggregated to one row per (order, shipment)
+WITH order_revenue AS (
     SELECT
         order_id
-        , merchant_id
-        , customer_id
-        , order_status
-        , is_test
-        , ordered_at
-        , paid_at
-        , shipment_id
-        , shipped_at
+        , sum(quantity * unit_price) AS revenue
+        , sum(quantity) AS quantity_ordered
         , count(DISTINCT line_item_id) AS line_count
-        , sum(quantity_shipped) AS total_quantity
-        , sum(quantity_shipped * unit_price) AS shipment_revenue
-    FROM joined
-    GROUP BY order_id, merchant_id, customer_id, order_status, is_test, ordered_at, paid_at, shipment_id, shipped_at
+    FROM {{ ref('stg_line_items') }}
+    GROUP BY order_id
 )
 
-, shipment_counts AS (
+, order_shipments AS (
     SELECT
         order_id
         , count(DISTINCT shipment_id) AS shipment_count
-    FROM shipment_totals
+        , min(shipped_at) AS first_shipped_at
+    FROM {{ ref('stg_shipments') }}
     GROUP BY order_id
 )
 
-, enriched AS (
-    SELECT
-        st.order_id
-        , st.merchant_id
-        , m.merchant_name
-        , st.customer_id
-        , m.customer_type
-        , st.order_status
-        , st.is_test
-        , st.ordered_at
-        , st.paid_at
-        , st.shipped_at
-        , sc.shipment_count
-        , st.line_count
-        , st.total_quantity
-        , st.shipment_revenue AS revenue
-    FROM shipment_totals AS st
-    LEFT JOIN {{ ref('lkp_merchants') }} AS m
-        ON st.merchant_id = m.merchant_id
-    LEFT JOIN shipment_counts AS sc
-        ON st.order_id = sc.order_id
-)
-
 SELECT
-    order_id
-    , merchant_id
-    , merchant_name
-    , customer_id
-    , customer_type
-    , order_status
-    , is_test
-    , ordered_at
-    , paid_at
-    , shipped_at
-    , shipment_count
-    , line_count
-    , total_quantity
-    , revenue
+    o.order_id
+    , o.merchant_id
+    , m.merchant_name
+    , o.customer_id
+    , m.customer_type
+    , o.order_status
+    , o.is_test
+    , o.ordered_at
+    , o.paid_at
+    , os.first_shipped_at AS shipped_at
+    , coalesce(os.shipment_count, 0) AS shipment_count
+    , orev.line_count
+    , orev.quantity_ordered AS total_quantity
+    , orev.revenue
     , current_timestamp AS created_at_dwh
     , current_timestamp AS updated_at_dwh
-FROM enriched
+FROM {{ ref('stg_orders') }} AS o
+LEFT JOIN order_revenue AS orev
+    ON o.order_id = orev.order_id
+LEFT JOIN order_shipments AS os
+    ON o.order_id = os.order_id
+LEFT JOIN {{ ref('lkp_merchants') }} AS m
+    ON o.merchant_id = m.merchant_id
 {% if is_incremental() %}
-    WHERE ordered_at >= {{ get_incremental_value('updated_at_dwh') }}
+    WHERE o.ordered_at >= {{ get_incremental_value('ordered_at') }}
 {% endif %}
--- dedupe to one row per order (orders can have multiple shipments)
-QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY shipped_at) = 1

From f02c459cdf9a3025cea857fde1132cc6686f800d Mon Sep 17 00:00:00 2001
From: atiwary <atiwary@marqeta.com>
Date: Wed, 20 May 2026 17:00:05 -0500
Subject: [PATCH 2/2] feat: [DATA-145] add refunds modeling + net_revenue on
 order_fact

Implements the design proposal in docs/designs/2026-Q2-refunds-modeling.md
that was added in the previous DATA-123 commit. Wires the three refund
sources (Stripe, Shopify, Internal POS) into the warehouse, builds a
refund_fact, and surfaces refund_total / net_revenue / is_fully_refunded
on order_fact plus refund_amount + qty_refunded on order_line_fact.

Key decisions (see design doc for full rationale):
- refund_fact grain: order x line x refund_event.
- Shopify-Stripe dedup via (order_id, refunded_at_minute) heuristic;
  Shopify wins on occurrence + line + amount, Stripe is tender sidecar.
- Pro-rata allocation of order-grain refunds (POS, Stripe-direct) to
  lines, with explicit refund_allocation_method flag.
- Late-arriving refund handling: order_fact incremental filter unions
  ordered_at >= watermark with orders touched by refunds since the
  last refund-side watermark.

Tests added (row-level invariants per CONTRIBUTING.md):
- refund_fact: unique/not_null on refund_event_id, amount > 0.
- order_fact: per-order refund_total ties to sum(refund_fact.amount);
  net_revenue tautology guard; warn-only signal when refund > revenue.
- order_line_fact: per-order sum(refund_amount) ties to order_fact
  refund_total within rounding.
- stg_refunds: no (order_id, minute) collision across Shopify+Stripe.

Validated: make full clean (23/23); make test clean (41/41).

Open items pending Finance sign-off per design doc Sec.5 (e.g., gateway
cross-walk vs heuristic, tax/shipping treatment). Tracked there, not
blocking this commit.
---
 .../orders/base/base_refunds_internal_pos.sql |   4 +
 models/orders/base/base_refunds_shopify.sql   |   4 +
 models/orders/base/base_refunds_stripe.sql    |   4 +
 models/orders/base/orders_base.yml            |   3 +
 models/orders/dw/order_fact.sql               |  32 ++++
 models/orders/dw/order_line_fact.sql          | 173 +++++++++++++++++-
 models/orders/dw/orders_dw.yml                |  71 ++++++-
 models/orders/dw/refund_fact.sql              |  23 +++
 models/orders/staging/orders_staging.yml      |  41 +++++
 models/orders/staging/stg_refund_tenders.sql  |  14 ++
 models/orders/staging/stg_refunds.sql         | 121 ++++++++++++
 tests/order_fact_net_revenue_definition.sql   |  11 ++
 tests/order_fact_refund_over_revenue_warn.sql |  14 ++
 ..._fact_refund_total_matches_refund_fact.sql |  20 ++
 .../order_line_fact_refund_sums_to_order.sql  |  20 ++
 tests/refund_fact_amount_positive.sql         |  11 ++
 ...tg_refunds_no_shopify_stripe_collision.sql |  22 +++
 17 files changed, 575 insertions(+), 13 deletions(-)
 create mode 100644 models/orders/base/base_refunds_internal_pos.sql
 create mode 100644 models/orders/base/base_refunds_shopify.sql
 create mode 100644 models/orders/base/base_refunds_stripe.sql
 create mode 100644 models/orders/dw/refund_fact.sql
 create mode 100644 models/orders/staging/stg_refund_tenders.sql
 create mode 100644 models/orders/staging/stg_refunds.sql
 create mode 100644 tests/order_fact_net_revenue_definition.sql
 create mode 100644 tests/order_fact_refund_over_revenue_warn.sql
 create mode 100644 tests/order_fact_refund_total_matches_refund_fact.sql
 create mode 100644 tests/order_line_fact_refund_sums_to_order.sql
 create mode 100644 tests/refund_fact_amount_positive.sql
 create mode 100644 tests/stg_refunds_no_shopify_stripe_collision.sql
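Reviewer note (not part of the commit message): the pro-rata allocation of order-grain refunds described above (design doc §3.4) can be sketched in Python as follows. This is an illustration under stated assumptions, not the SQL in this patch: it works in integer cents, rounds each share half-up, and gives the leftover drift to the largest line. The function name `allocate_pro_rata` and the line IDs are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def allocate_pro_rata(refund_cents: int, line_revenue_cents: dict[str, int]) -> dict[str, int]:
    """Split an order-grain refund across lines in proportion to line revenue."""
    order_revenue = sum(line_revenue_cents.values())
    if order_revenue <= 0:
        raise ValueError("order revenue must be positive to allocate")

    allocated = {}
    for line_id, revenue in line_revenue_cents.items():
        # line_refund_share = line_revenue / order_revenue, rounded to a whole cent
        share = Decimal(refund_cents) * revenue / order_revenue
        allocated[line_id] = int(share.quantize(Decimal("1"), rounding=ROUND_HALF_UP))

    # Assumption: the largest line (first one on ties) absorbs rounding drift,
    # so the per-line amounts always sum back to the order-grain refund.
    drift = refund_cents - sum(allocated.values())
    largest = max(line_revenue_cents, key=line_revenue_cents.get)
    allocated[largest] += drift
    return allocated


# Hypothetical order: a 1000-cent POS refund against three lines.
result = allocate_pro_rata(1000, {"L1": 3333, "L2": 3333, "L3": 3334})
print(result)
```

In this example each line first rounds to 333 cents, and the one-cent remainder goes to `L3`, so the line amounts sum to exactly 1000 cents.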

diff --git a/models/orders/base/base_refunds_internal_pos.sql b/models/orders/base/base_refunds_internal_pos.sql
new file mode 100644
index 0000000..e890b9a
--- /dev/null
+++ b/models/orders/base/base_refunds_internal_pos.sql
@@ -0,0 +1,4 @@
+{{ config(materialized='view') }}
+
+SELECT *
+FROM {{ source('raw', 'refunds_internal_pos') }}
diff --git a/models/orders/base/base_refunds_shopify.sql b/models/orders/base/base_refunds_shopify.sql
new file mode 100644
index 0000000..8e819c1
--- /dev/null
+++ b/models/orders/base/base_refunds_shopify.sql
@@ -0,0 +1,4 @@
+{{ config(materialized='view') }}
+
+SELECT *
+FROM {{ source('raw', 'refunds_shopify') }}
diff --git a/models/orders/base/base_refunds_stripe.sql b/models/orders/base/base_refunds_stripe.sql
new file mode 100644
index 0000000..b4b2482
--- /dev/null
+++ b/models/orders/base/base_refunds_stripe.sql
@@ -0,0 +1,4 @@
+{{ config(materialized='view') }}
+
+SELECT *
+FROM {{ source('raw', 'refunds_stripe') }}
diff --git a/models/orders/base/orders_base.yml b/models/orders/base/orders_base.yml
index c03d25c..8169b89 100644
--- a/models/orders/base/orders_base.yml
+++ b/models/orders/base/orders_base.yml
@@ -15,3 +15,6 @@ sources:
       - name: shipment_line_items
       - name: merchants
       - name: products
+      - name: refunds_shopify
+      - name: refunds_stripe
+      - name: refunds_internal_pos
diff --git a/models/orders/dw/order_fact.sql b/models/orders/dw/order_fact.sql
index 6b7e574..0b5a711 100644
--- a/models/orders/dw/order_fact.sql
+++ b/models/orders/dw/order_fact.sql
@@ -22,6 +22,28 @@ WITH order_revenue AS (
     GROUP BY order_id
 )
 
+, order_refunds AS (
+    SELECT
+        order_id
+        , sum(refund_amount) AS refund_total
+        , count(DISTINCT source_refund_id) AS refund_count
+        , max(refunded_at) AS last_refunded_at
+    FROM {{ ref('refund_fact') }}
+    GROUP BY order_id
+)
+
+-- Orders with refund activity since the last incremental run.
+-- Watermark column is `last_refunded_at` on this model (order_fact) — any refund
+-- with refunded_at strictly greater than the previous max means the order needs
+-- a refresh of its refund aggregates. See docs/designs/2026-Q2-refunds-modeling.md §4.
+, orders_with_new_refunds AS (
+    SELECT DISTINCT order_id
+    FROM {{ ref('refund_fact') }}
+    {% if is_incremental() %}
+        WHERE refunded_at > {{ get_incremental_value('last_refunded_at', relation=this) }}
+    {% endif %}
+)
+
 SELECT
     o.order_id
     , o.merchant_id
@@ -37,6 +59,11 @@ SELECT
     , orev.line_count
     , orev.quantity_ordered AS total_quantity
     , orev.revenue
+    , coalesce(orf.refund_total, 0) AS refund_total
+    , coalesce(orf.refund_count, 0) AS refund_count
+    , orf.last_refunded_at
+    , orev.revenue - coalesce(orf.refund_total, 0) AS net_revenue
+    , (coalesce(orf.refund_total, 0) >= orev.revenue AND orev.revenue > 0) AS is_fully_refunded
     , current_timestamp AS created_at_dwh
     , current_timestamp AS updated_at_dwh
 FROM {{ ref('stg_orders') }} AS o
@@ -44,8 +71,13 @@ LEFT JOIN order_revenue AS orev
     ON o.order_id = orev.order_id
 LEFT JOIN order_shipments AS os
     ON o.order_id = os.order_id
+LEFT JOIN order_refunds AS orf
+    ON o.order_id = orf.order_id
 LEFT JOIN {{ ref('lkp_merchants') }} AS m
     ON o.merchant_id = m.merchant_id
 {% if is_incremental() %}
+    LEFT JOIN orders_with_new_refunds AS nrf
+        ON o.order_id = nrf.order_id
     WHERE o.ordered_at >= {{ get_incremental_value('ordered_at') }}
+        OR nrf.order_id IS NOT NULL
 {% endif %}
diff --git a/models/orders/dw/order_line_fact.sql b/models/orders/dw/order_line_fact.sql
index 18b7858..b14772d 100644
--- a/models/orders/dw/order_line_fact.sql
+++ b/models/orders/dw/order_line_fact.sql
@@ -3,17 +3,172 @@
     unique_key='line_item_id'
 ) }}
 
+-- One row per order line.
+-- Refund attribution (see docs/designs/2026-Q2-refunds-modeling.md §3.4, §3.6):
+--   * 'direct'   — Shopify-attributed refund, exact line_item_id match.
+--   * 'pro_rata' — POS / Stripe-direct order-grain refund, allocated by line_revenue share.
+--   * 'mixed'    — line has both direct and allocated components (rare).
+--   * 'none'     — order has no refunds.
+-- Rounding-drift trick: pro-rata amounts are rounded to cents and the residual
+-- is pushed to the largest line per order so sum(line.refund_amount) ties to
+-- order_fact.refund_total to the penny.
+
+WITH refunds_direct AS (
+    SELECT
+        line_item_id
+        , sum(refund_amount) AS direct_refund_amount
+        , sum(qty_refunded) AS qty_refunded
+    FROM {{ ref('refund_fact') }}
+    WHERE line_item_id IS NOT NULL
+    GROUP BY line_item_id
+)
+
+, refunds_order_grain AS (
+    SELECT
+        order_id
+        , sum(refund_amount) AS order_grain_refund_total
+    FROM {{ ref('refund_fact') }}
+    WHERE line_item_id IS NULL
+    GROUP BY order_id
+)
+
+, order_revenue AS (
+    SELECT
+        order_id
+        , sum(quantity * unit_price) AS order_revenue_total
+    FROM {{ ref('stg_line_items') }}
+    GROUP BY order_id
+)
+
+, order_last_refunded AS (
+    SELECT
+        order_id
+        , max(refunded_at) AS last_refunded_at
+    FROM {{ ref('refund_fact') }}
+    GROUP BY order_id
+)
+
+, line_with_allocation AS (
+    SELECT
+        li.line_item_id
+        , li.order_id
+        , li.product_id
+        , li.quantity
+        , li.unit_price
+        , li.quantity * li.unit_price AS line_revenue
+        , orv.order_revenue_total
+        , rd.direct_refund_amount
+        , rd.qty_refunded
+        , rog.order_grain_refund_total
+        , olr.last_refunded_at
+        , CASE
+            WHEN rog.order_grain_refund_total IS NOT NULL AND orv.order_revenue_total > 0
+                THEN round(
+                        (li.quantity * li.unit_price) / orv.order_revenue_total * rog.order_grain_refund_total
+                        , 2
+                    )
+            ELSE 0
+        END AS allocated_refund_raw
+        , row_number() OVER (
+            PARTITION BY li.order_id
+            ORDER BY li.quantity * li.unit_price DESC, li.line_item_id
+        ) AS line_rank
+    FROM {{ ref('stg_line_items') }} AS li
+    LEFT JOIN order_revenue AS orv
+        ON li.order_id = orv.order_id
+    LEFT JOIN refunds_direct AS rd
+        ON li.line_item_id = rd.line_item_id
+    LEFT JOIN refunds_order_grain AS rog
+        ON li.order_id = rog.order_id
+    LEFT JOIN order_last_refunded AS olr
+        ON li.order_id = olr.order_id
+)
+
+, allocation_drift AS (
+    SELECT
+        order_id
+        , max(order_grain_refund_total) - sum(allocated_refund_raw) AS drift
+    FROM line_with_allocation
+    WHERE order_grain_refund_total IS NOT NULL
+    GROUP BY order_id
+)
+
+, lines_with_final_refund AS (
+    SELECT
+        lwa.line_item_id
+        , lwa.order_id
+        , lwa.product_id
+        , lwa.quantity
+        , lwa.unit_price
+        , lwa.line_revenue
+        , lwa.qty_refunded
+        , lwa.last_refunded_at
+        , coalesce(lwa.direct_refund_amount, 0)
+        + lwa.allocated_refund_raw
+        + CASE
+            WHEN lwa.line_rank = 1 AND lwa.order_grain_refund_total IS NOT NULL
+                THEN coalesce(ad.drift, 0)
+            ELSE 0
+        END AS refund_amount_raw
+        , lwa.direct_refund_amount
+        , lwa.order_grain_refund_total
+    FROM line_with_allocation AS lwa
+    LEFT JOIN allocation_drift AS ad
+        ON lwa.order_id = ad.order_id
+)
+
+{% if is_incremental() %}
+    , orders_with_new_refunds AS (
+        SELECT DISTINCT order_id
+        FROM {{ ref('refund_fact') }}
+        WHERE refunded_at > {{ get_incremental_value('last_refunded_at', relation=this) }}
+    )
+
+    , existing_lines AS (
+        SELECT line_item_id
+        FROM {{ this }}
+    )
+{% endif %}
+
 SELECT
-    li.line_item_id
-    , li.order_id
-    , li.product_id
-    , li.quantity
-    , li.unit_price
-    , li.quantity * li.unit_price AS line_revenue
+    lwf.line_item_id
+    , lwf.order_id
+    , lwf.product_id
+    , lwf.quantity
+    , lwf.unit_price
+    , lwf.line_revenue
+    , lwf.qty_refunded
+    , CASE
+        WHEN lwf.direct_refund_amount IS NULL AND lwf.order_grain_refund_total IS NULL
+            THEN cast(NULL AS double)
+        ELSE lwf.refund_amount_raw
+    END AS refund_amount
+    , CASE
+        WHEN lwf.direct_refund_amount IS NOT NULL AND lwf.order_grain_refund_total IS NOT NULL
+            THEN 'mixed'
+        WHEN lwf.direct_refund_amount IS NOT NULL
+            THEN 'direct'
+        WHEN lwf.order_grain_refund_total IS NOT NULL
+            THEN 'pro_rata'
+        ELSE 'none'
+    END AS refund_allocation_method
+    , lwf.line_revenue - coalesce(
+        CASE
+            WHEN lwf.direct_refund_amount IS NULL AND lwf.order_grain_refund_total IS NULL
+                THEN 0
+            ELSE lwf.refund_amount_raw
+        END
+        , 0
+    ) AS net_line_revenue
+    , lwf.last_refunded_at
     , current_timestamp AS created_at_dwh
     , current_timestamp AS updated_at_dwh
-FROM {{ ref('stg_line_items') }} AS li
-
+FROM lines_with_final_refund AS lwf
 {% if is_incremental() %}
-    WHERE li.line_item_id NOT IN (SELECT t.line_item_id FROM {{ this }} AS t)
+    LEFT JOIN existing_lines AS el
+        ON lwf.line_item_id = el.line_item_id
+    LEFT JOIN orders_with_new_refunds AS nrf
+        ON lwf.order_id = nrf.order_id
+    WHERE el.line_item_id IS NULL
+        OR nrf.order_id IS NOT NULL
 {% endif %}
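Reviewer note on the pro-rata allocation in order_line_fact.sql: the rounding-drift handling can be restated outside SQL as a minimal sketch. The function name `allocate_refund`, the dict-based inputs, and the use of `Decimal` with half-up rounding are assumptions for illustration only; the warehouse's `round()` may round differently, and the model itself works on doubles.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def allocate_refund(line_revenues, order_refund):
    """Split an order-grain refund across lines by line_revenue share.

    line_revenues: dict of line_item_id -> Decimal line revenue (hypothetical shape).
    order_refund:  Decimal refund total carried at order grain.
    Returns a dict of line_item_id -> Decimal refund that sums to order_refund.
    """
    total = sum(line_revenues.values(), Decimal("0"))

    # allocated_refund_raw: share of the refund rounded to cents, or 0 when
    # the order has no positive revenue to allocate against.
    allocated = {}
    for line_id, revenue in line_revenues.items():
        if total > 0:
            share = revenue / total * order_refund
            allocated[line_id] = share.quantize(CENT, rounding=ROUND_HALF_UP)
        else:
            allocated[line_id] = Decimal("0.00")

    # allocation_drift: what rounding left unallocated (may be negative).
    drift = order_refund - sum(allocated.values(), Decimal("0"))

    # line_rank = 1: largest line_revenue, ties broken by line_item_id ascending.
    top_line = min(line_revenues, key=lambda k: (-line_revenues[k], k))
    allocated[top_line] += drift
    return allocated
```

Under these assumptions, three equal lines sharing a 10.00 refund come out as 3.34, 3.33, 3.33: the residual cent lands on the first-ranked line. When the order's revenue total is zero, the whole refund lands on the rank-1 line as drift, which matches how the CTEs above behave.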
diff --git a/models/orders/dw/orders_dw.yml b/models/orders/dw/orders_dw.yml
index 72d5811..69e1fa6 100644
--- a/models/orders/dw/orders_dw.yml
+++ b/models/orders/dw/orders_dw.yml
@@ -2,7 +2,7 @@ version: 2
 
 models:
   - name: order_fact
-    description: One row per order with revenue and shipment metadata.
+    description: One row per order with revenue, refund aggregates, and shipment metadata.
     columns:
       - name: order_id
         description: Primary key.
@@ -25,13 +25,25 @@ models:
       - name: total_quantity
       - name: revenue
         description: '{{ doc("order_fact_revenue") }}'
+      - name: refund_total
+        description: Sum of refund_fact.refund_amount for this order, in dollars. Zero when no refunds.
+        tests:
+          - not_null
+      - name: refund_count
+        description: Distinct count of source_refund_id (logical refund events) for this order.
+        tests:
+          - not_null
+      - name: last_refunded_at
+        description: Max refunded_at across refunds for this order. NULL when no refunds.
+      - name: net_revenue
+        description: revenue minus refund_total. Can be negative for goodwill refunds or over-refunds.
+      - name: is_fully_refunded
+        description: TRUE when refund_total >= revenue and revenue > 0.
       - name: created_at_dwh
       - name: updated_at_dwh
 
   - name: order_line_fact
-    description: |
-      Skeleton — one row per order line. Will be extended with refund allocations
-      in DATA-456 (Problem 2 / refunds work).
+    description: One row per order line. Carries line-level refund attribution (direct or pro-rata allocated).
     columns:
       - name: line_item_id
         tests:
@@ -42,3 +54,54 @@ models:
       - name: quantity
       - name: unit_price
       - name: line_revenue
+      - name: qty_refunded
+        description: Quantity refunded (Shopify line-grain only; NULL for pro-rata allocated lines).
+      - name: refund_amount
+        description: Dollar refund attributed to this line — direct from Shopify or pro-rata allocated.
+      - name: refund_allocation_method
+        description: One of `direct`, `pro_rata`, `mixed`, `none`. See refunds design doc §3.4.
+        tests:
+          - not_null
+          - accepted_values:
+              values: ['direct', 'pro_rata', 'mixed', 'none']
+      - name: net_line_revenue
+        description: line_revenue minus refund_amount.
+      - name: last_refunded_at
+        description: Max refunded_at on the line's order (denormalized for incremental watermark).
+      - name: created_at_dwh
+      - name: updated_at_dwh
+
+  - name: refund_fact
+    description: One row per refund event at order × line × source grain. See docs/designs/2026-Q2-refunds-modeling.md.
+    columns:
+      - name: refund_event_id
+        description: Surrogate PK computed as md5(source || '-' || source_refund_id || '-' || coalesce(line_item_id, 'ORDER')).
+        tests:
+          - unique
+          - not_null
+      - name: source
+        description: One of `shopify`, `stripe`, `internal_pos`.
+        tests:
+          - not_null
+          - accepted_values:
+              values: ['shopify', 'stripe', 'internal_pos']
+      - name: source_refund_id
+        description: The source system's own refund_id (kept for traceability).
+        tests:
+          - not_null
+      - name: order_id
+        tests:
+          - not_null
+      - name: line_item_id
+        description: NULL for POS and Stripe-direct refunds (order-grain only).
+      - name: qty_refunded
+        description: Shopify-only; NULL otherwise.
+      - name: refund_amount
+        description: Dollar amount of the refund event.
+        tests:
+          - not_null
+      - name: refunded_at
+        tests:
+          - not_null
+      - name: created_at_dwh
+      - name: updated_at_dwh
diff --git a/models/orders/dw/refund_fact.sql b/models/orders/dw/refund_fact.sql
new file mode 100644
index 0000000..6c2cd60
--- /dev/null
+++ b/models/orders/dw/refund_fact.sql
@@ -0,0 +1,23 @@
+{{ config(
+    materialized='incremental',
+    unique_key='refund_event_id'
+) }}
+
+-- One row per refund event at order × line × source grain.
+-- See docs/designs/2026-Q2-refunds-modeling.md for modeling decisions.
+
+SELECT
+    r.refund_event_id
+    , r.source
+    , r.source_refund_id
+    , r.order_id
+    , r.line_item_id
+    , r.qty_refunded
+    , r.refund_amount
+    , r.refunded_at
+    , current_timestamp AS created_at_dwh
+    , current_timestamp AS updated_at_dwh
+FROM {{ ref('stg_refunds') }} AS r
+{% if is_incremental() %}
+    WHERE r.refunded_at >= {{ get_incremental_value('refunded_at') }}
+{% endif %}
diff --git a/models/orders/staging/orders_staging.yml b/models/orders/staging/orders_staging.yml
index 2ac43f6..feac85b 100644
--- a/models/orders/staging/orders_staging.yml
+++ b/models/orders/staging/orders_staging.yml
@@ -49,3 +49,44 @@ models:
 
   # stg_shipment_line_items — bare (undocumented)
   - name: stg_shipment_line_items
+
+  - name: stg_refunds
+    description: |
+      Unified, deduped refund staging. See docs/designs/2026-Q2-refunds-modeling.md §3.3
+      for the Shopify ↔ Stripe dedup heuristic.
+    columns:
+      - name: refund_event_id
+        description: Surrogate PK derived from (source, source_refund_id, line_item_id).
+        tests:
+          - unique
+          - not_null
+      - name: source
+        tests:
+          - not_null
+          - accepted_values:
+              values: ['shopify', 'stripe', 'internal_pos']
+      - name: order_id
+        tests:
+          - not_null
+      - name: refund_amount
+        tests:
+          - not_null
+      - name: refunded_at
+        tests:
+          - not_null
+
+  - name: stg_refund_tenders
+    description: Stripe tender-split sidecar. Joins to stg_refunds on (order_id, refunded_at_minute).
+    columns:
+      - name: source_refund_id
+        tests:
+          - not_null
+      - name: order_id
+        tests:
+          - not_null
+      - name: tender_type
+        tests:
+          - not_null
+      - name: tender_amount
+        tests:
+          - not_null
diff --git a/models/orders/staging/stg_refund_tenders.sql b/models/orders/staging/stg_refund_tenders.sql
new file mode 100644
index 0000000..1730ffb
--- /dev/null
+++ b/models/orders/staging/stg_refund_tenders.sql
@@ -0,0 +1,14 @@
+{{ config(materialized='view') }}
+
+-- Stripe tender-split sidecar. One row per (refund event, tender_type).
+-- Joins back to stg_refunds on (order_id, refunded_at_minute) — see §3.3 of the
+-- refunds design doc for why this is the chosen join key.
+
+SELECT
+    refund_id AS source_refund_id
+    , order_id
+    , tender_type
+    , amount_in_cents / 100.0 AS tender_amount
+    , CAST(processed_at AS timestamp) AS refunded_at
+    , DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
+FROM {{ ref('base_refunds_stripe') }}
diff --git a/models/orders/staging/stg_refunds.sql b/models/orders/staging/stg_refunds.sql
new file mode 100644
index 0000000..5a47772
--- /dev/null
+++ b/models/orders/staging/stg_refunds.sql
@@ -0,0 +1,121 @@
+{{ config(materialized='view') }}
+
+-- Unified refund staging across the three source systems.
+--
+-- Dedup precedence (see docs/designs/2026-Q2-refunds-modeling.md §3.3):
+--   1. Shopify wins on (order_id, refunded_at_minute) — line-grain, system of record for amount.
+--   2. Internal POS is additive (standalone register, no e-commerce pairing).
+--   3. Stripe rows are kept only when no Shopify pairing exists at the same (order_id, minute).
+--      Stripe tender breakdown moves to stg_refund_tenders.
+--
+-- Heuristic limitation: the dedup join uses minute-truncated timestamps because
+-- the source systems carry no shared refund key. If/when upstream provides a
+-- gateway_refund_id cross-walk, swap that in here.
+
+WITH shopify AS (
+    SELECT
+        refund_id AS source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_shopify') }}
+)
+
+, stripe_raw AS (
+    SELECT
+        refund_id AS source_refund_id
+        , order_id
+        , tender_type
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(processed_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_stripe') }}
+)
+
+-- Collapse Stripe tender splits to one row per refund event for dedup against Shopify.
+, stripe_events AS (
+    SELECT
+        order_id
+        , refunded_at_minute
+        , MIN(refunded_at) AS refunded_at
+        , MIN(source_refund_id) AS source_refund_id
+        , SUM(refund_amount) AS refund_amount
+    FROM stripe_raw
+    GROUP BY order_id, refunded_at_minute
+)
+
+, shopify_keys AS (
+    SELECT DISTINCT
+        order_id
+        , refunded_at_minute
+    FROM shopify
+)
+
+, pos AS (
+    SELECT
+        refund_id AS source_refund_id
+        , order_id
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_internal_pos') }}
+)
+
+, unified AS (
+    SELECT
+        'shopify' AS source
+        , source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
+    FROM shopify
+
+    UNION ALL
+
+    SELECT
+        'internal_pos' AS source
+        , source_refund_id
+        , order_id
+        , CAST(NULL AS varchar) AS line_item_id
+        , CAST(NULL AS bigint) AS qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
+    FROM pos
+
+    UNION ALL
+
+    -- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
+    SELECT
+        'stripe' AS source
+        , se.source_refund_id
+        , se.order_id
+        , CAST(NULL AS varchar) AS line_item_id
+        , CAST(NULL AS bigint) AS qty_refunded
+        , se.refund_amount
+        , se.refunded_at
+        , se.refunded_at_minute
+    FROM stripe_events AS se
+    LEFT JOIN shopify_keys AS sk
+        ON se.order_id = sk.order_id
+            AND se.refunded_at_minute = sk.refunded_at_minute
+    WHERE sk.order_id IS NULL
+)
+
+SELECT
+    MD5(source || '-' || source_refund_id || '-' || COALESCE(line_item_id, 'ORDER')) AS refund_event_id
+    , source
+    , source_refund_id
+    , order_id
+    , line_item_id
+    , qty_refunded
+    , refund_amount
+    , refunded_at
+    , refunded_at_minute
+FROM unified
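Reviewer note on the Shopify/Stripe dedup in stg_refunds.sql: the precedence rule (collapse Stripe tender splits per order and minute, then keep only events with no Shopify row at the same key) can be sketched as below. The function names, the dict-shaped rows, and the choice to keep amounts in integer cents are assumptions for illustration; the model itself converts to dollars and runs as SQL.

```python
from datetime import datetime


def minute_key(order_id, ts):
    """(order_id, timestamp truncated to the minute), like DATE_TRUNC('minute', ...)."""
    return (order_id, ts.replace(second=0, microsecond=0))


def stripe_direct_events(shopify_rows, stripe_rows):
    """Return collapsed Stripe events that have no Shopify pairing.

    Rows are hypothetical dicts with order_id and refunded_at; Stripe rows
    also carry source_refund_id and amount_in_cents.
    """
    shopify_keys = {
        minute_key(r["order_id"], r["refunded_at"]) for r in shopify_rows
    }

    # stripe_events: one row per (order_id, minute), summing tender splits
    # and keeping the MIN timestamp and MIN refund id.
    events = {}
    for r in stripe_rows:
        key = minute_key(r["order_id"], r["refunded_at"])
        if key not in events:
            events[key] = {
                "order_id": r["order_id"],
                "refunded_at_minute": key[1],
                "refunded_at": r["refunded_at"],
                "source_refund_id": r["source_refund_id"],
                "amount_in_cents": 0,
            }
        ev = events[key]
        ev["refunded_at"] = min(ev["refunded_at"], r["refunded_at"])
        ev["source_refund_id"] = min(ev["source_refund_id"], r["source_refund_id"])
        ev["amount_in_cents"] += r["amount_in_cents"]

    # Anti-join: drop events whose key also appears on the Shopify side.
    return [ev for key, ev in events.items() if key not in shopify_keys]
```

One property worth noting from this restatement: a Shopify refund at 10:00:59 and its Stripe counterpart at 10:01:01 fall into different minute buckets, so the Stripe row is kept. That is the heuristic limitation the model's header comment already calls out, and the collision test in this patch cannot detect it, since the two rows do not share a key.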
diff --git a/tests/order_fact_net_revenue_definition.sql b/tests/order_fact_net_revenue_definition.sql
new file mode 100644
index 0000000..aa70909
--- /dev/null
+++ b/tests/order_fact_net_revenue_definition.sql
@@ -0,0 +1,11 @@
+-- Tautology guard: net_revenue must equal revenue - refund_total at row level.
+-- Catches future drift if someone changes one column's formula and not the other.
+
+SELECT
+    order_id
+    , revenue
+    , refund_total
+    , net_revenue
+    , revenue - refund_total AS expected_net_revenue
+FROM {{ ref('order_fact') }}
+WHERE abs(net_revenue - (revenue - refund_total)) > 0.01
diff --git a/tests/order_fact_refund_over_revenue_warn.sql b/tests/order_fact_refund_over_revenue_warn.sql
new file mode 100644
index 0000000..d9eef5f
--- /dev/null
+++ b/tests/order_fact_refund_over_revenue_warn.sql
@@ -0,0 +1,14 @@
+{{ config(severity='warn') }}
+
+-- Finance signal (not a bug): orders where refund_total exceeds revenue
+-- (goodwill, over-refund, or data quality issue at the source). Surfaces as a
+-- dbt test warning so Finance is notified but the build doesn't fail.
+
+SELECT
+    order_id
+    , revenue
+    , refund_total
+    , refund_total - revenue AS over_refund
+FROM {{ ref('order_fact') }}
+WHERE refund_total > revenue + 0.01
+    AND coalesce(lower(cast(is_test AS varchar)), 'false') != 'true'
diff --git a/tests/order_fact_refund_total_matches_refund_fact.sql b/tests/order_fact_refund_total_matches_refund_fact.sql
new file mode 100644
index 0000000..0cfb727
--- /dev/null
+++ b/tests/order_fact_refund_total_matches_refund_fact.sql
@@ -0,0 +1,20 @@
+-- Per-order: order_fact.refund_total must equal sum(refund_fact.refund_amount).
+-- Row-level invariant (per CONTRIBUTING.md: row-level > aggregate reconciliation).
+
+WITH fact_refunds AS (
+    SELECT
+        order_id
+        , sum(refund_amount) AS expected_refund_total
+    FROM {{ ref('refund_fact') }}
+    GROUP BY order_id
+)
+
+SELECT
+    f.order_id
+    , f.refund_total
+    , coalesce(fr.expected_refund_total, 0) AS expected_refund_total
+    , f.refund_total - coalesce(fr.expected_refund_total, 0) AS diff
+FROM {{ ref('order_fact') }} AS f
+LEFT JOIN fact_refunds AS fr
+    ON f.order_id = fr.order_id
+WHERE abs(f.refund_total - coalesce(fr.expected_refund_total, 0)) > 0.01
diff --git a/tests/order_line_fact_refund_sums_to_order.sql b/tests/order_line_fact_refund_sums_to_order.sql
new file mode 100644
index 0000000..911f7dc
--- /dev/null
+++ b/tests/order_line_fact_refund_sums_to_order.sql
@@ -0,0 +1,20 @@
+-- Per-order: sum(order_line_fact.refund_amount) must equal order_fact.refund_total
+-- within ±$0.01 to allow for pro-rata rounding drift handled in order_line_fact.
+
+WITH line_totals AS (
+    SELECT
+        order_id
+        , sum(coalesce(refund_amount, 0)) AS line_refund_sum
+    FROM {{ ref('order_line_fact') }}
+    GROUP BY order_id
+)
+
+SELECT
+    f.order_id
+    , f.refund_total
+    , coalesce(lt.line_refund_sum, 0) AS line_refund_sum
+    , f.refund_total - coalesce(lt.line_refund_sum, 0) AS diff
+FROM {{ ref('order_fact') }} AS f
+LEFT JOIN line_totals AS lt
+    ON f.order_id = lt.order_id
+WHERE abs(f.refund_total - coalesce(lt.line_refund_sum, 0)) > 0.01
diff --git a/tests/refund_fact_amount_positive.sql b/tests/refund_fact_amount_positive.sql
new file mode 100644
index 0000000..a0d7460
--- /dev/null
+++ b/tests/refund_fact_amount_positive.sql
@@ -0,0 +1,11 @@
+-- All refund amounts must be strictly positive. A non-positive refund_amount
+-- indicates either a sign-flip bug at ingest or a chargeback masquerading as a
+-- refund (chargebacks are explicitly out of scope per §10 of the design doc).
+
+SELECT
+    refund_event_id
+    , source
+    , source_refund_id
+    , refund_amount
+FROM {{ ref('refund_fact') }}
+WHERE refund_amount <= 0
diff --git a/tests/stg_refunds_no_shopify_stripe_collision.sql b/tests/stg_refunds_no_shopify_stripe_collision.sql
new file mode 100644
index 0000000..9ba5d5b
--- /dev/null
+++ b/tests/stg_refunds_no_shopify_stripe_collision.sql
@@ -0,0 +1,22 @@
+-- Dedup invariant for §3.3: no (order_id, refunded_at_minute) appears in
+-- stg_refunds from BOTH the shopify and stripe sources. If this fires, the
+-- minute-truncation heuristic missed a collision and the same logical refund
+-- is being double-counted.
+
+WITH per_source AS (
+    SELECT
+        order_id
+        , refunded_at_minute
+        , source
+    FROM {{ ref('stg_refunds') }}
+    WHERE source IN ('shopify', 'stripe')
+    GROUP BY order_id, refunded_at_minute, source
+)
+
+SELECT
+    order_id
+    , refunded_at_minute
+    , count(DISTINCT source) AS source_count
+FROM per_source
+GROUP BY order_id, refunded_at_minute
+HAVING count(DISTINCT source) > 1