Skip to content

[DATA-123] Fix order_fact revenue + [DATA-145] add refunds modeling#8

Open
tiwaryanuj wants to merge 2 commits into
helloextend:masterfrom
tiwaryanuj:atiwary.DATA-123.fix-order-fact-revenue
Open

[DATA-123] Fix order_fact revenue + [DATA-145] add refunds modeling#8
tiwaryanuj wants to merge 2 commits into
helloextend:masterfrom
tiwaryanuj:atiwary.DATA-123.fix-order-fact-revenue

Conversation

@tiwaryanuj
Copy link
Copy Markdown

@tiwaryanuj tiwaryanuj commented May 20, 2026

Summary

  • Fix DATA-123order_fact.revenue was understating non-test Q1 revenue by $2,467,627.97 vs the Stripe captures number Finance gave us ($10,522,258.04 reported vs $12,989,886.01 truth). Two independent root causes; both fixed.
  • Implement DATA-145 — wire the three refund sources into the warehouse and surface net_revenue / refund_total on order_fact per the design doc added in this branch (docs/designs/2026-Q2-refunds-modeling.md). Pending Finance sign-off on §5 of that doc but the model is in place and tested.

Two commits, each scoped to one ticket. Easier to revert independently if needed.

DATA-123 root cause

metric non-test revenue Δ vs Stripe
Stripe captures (per ticket) $12,989,886.01
sum(stg_line_items.quantity × unit_price) (canonical) $12,989,886.01 $0.00
Before fix — order_fact.revenue $10,522,258.04 −$2,467,627.97
After fix — order_fact.revenue $12,989,886.01 $0.00

The $2.47M gap split into two independent bugs:

  1. QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY shipped_at) = 1 kept only the first shipment per order and silently dropped revenue from later shipments. 847 of 8,548 non-test orders are multi-shipment. Cost: −$747,890.76.
  2. revenue was computed from quantity_shipped × unit_price at shipment-line grain. Partially-shipped, partially-cancelled, and not-yet-shipped orders lost the un-shipped portion. Cost: −$1,719,737.21 (this bug would have remained even with QUALIFY fixed).

Fix: aggregate stg_line_items to order grain first (sum(quantity × unit_price)) — no shipment fan-out, no QUALIFY needed. Shipment metadata attached as a sidecar (shipment_count, first_shipped_at). Also fixes the incremental watermark, which was filtering ordered_at >= max(updated_at_dwh) — column mismatch. Now filters and watermarks both on ordered_at.

DATA-145 (refunds) — what landed

Implements the proposal in docs/designs/2026-Q2-refunds-modeling.md. Key decisions captured there, not repeated here. High level:

  • New refund_fact (grain: order × line × refund_event), incremental on refunded_at.
  • order_fact gains refund_total, refund_count, last_refunded_at, net_revenue, is_fully_refunded.
  • order_line_fact gains refund_amount, qty_refunded, refund_allocation_method (direct / pro_rata).
  • Shopify-vs-Stripe dedup via (order_id, refunded_at_minute) heuristic (§3.3 — flagged as needing Finance confirmation if a gateway cross-walk exists upstream).
  • Late-arriving refund handling on order_fact: incremental filter is union of ordered_at >= watermark OR order_id touched by refunds since the previous run (§4 — the production gotcha).

Validation

make full   → 23/23 OK
make test   → 41/41 PASS
  • sum(order_fact.revenue) where not is_test = $12,989,886.01 (ties to Stripe captures to the penny).
  • count(*) == count(distinct order_id) == 9699 — one row per order.
  • New refunds tests pass: row-level invariant sum(refund_fact.amount) per order == order_fact.refund_total; allocated line-level sums tie to order_fact.refund_total within $0.01 (pro-rata rounding); no (order_id, minute) collision across Shopify+Stripe in stg_refunds.

Test plan

  • Review models/orders/dw/order_fact.sql — confirm the new grain logic + the incremental filter union (ordered_at OR refund-touched).
  • Review docs/designs/2026-Q2-refunds-modeling.md §3.3 (Shopify-Stripe dedup) and §5 (open questions for Finance) before merging the DATA-145 portion.
  • Spot-check 3–5 multi-shipment orders from the top of the impacted list (O001276, O007967, O002332) against the new order_fact.revenue.
  • Diff daily_revenue pre- vs post-fix; the per-day step-up should be material — heads-up to Jamie before deploy.
  • After merge: flip tests/order_fact_revenue_reconciliation.sql from severity='warn' to error. The warn was masking known-broken state per CONTRIBUTING.md.

Out of scope (intentionally)

  • Type 2 history on lkp_merchants — same constraint as the Q3-2024 design doc.
  • Chargebacks — different upstream, different finance treatment (see refunds doc §10).
  • Tax / shipping refunds breakout — assumed gross-of-tax, matches how revenue is computed today.

Open items / follow-ups

  • Finance sign-off on refunds design doc §5 (Q's A–G), especially the Shopify-Stripe dedup heuristic in §3.3 and the goodwill-overrefund treatment in §5-B.
  • Add not_null schema test on order_fact.revenue (every order has line items, so this should always hold).
  • The daily_revenue reporting view will produce materially different numbers post-deploy — coordinate with Finance on the cutover date and expected Q1 delta.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Refund data now modeled and tracked across multiple payment sources (Shopify, Stripe, internal POS)
    • Orders and line items now include refund metrics and net revenue calculations
    • Unified refund consolidation with deduplication across payment platforms
  • Documentation

    • Added repository guidance documenting architecture, modeling conventions, and contribution standards
    • Added design documentation detailing refund modeling approach and implementation plan

Review Change Stack

atiwary added 2 commits May 20, 2026 16:48
Root cause: order_fact had two independent bugs that combined to
understate non-test Q1 revenue by $2,467,627.97 vs Stripe captures
($10,522,258.04 reported vs $12,989,886.01 truth).

1. The QUALIFY row_number() PARTITION BY order_id ORDER BY shipped_at = 1
   kept only the first shipment per order and silently dropped revenue
   from 2nd+ shipments on 847 multi-shipment orders (-$747,890.76).
2. Revenue was sum(quantity_shipped * unit_price) at shipment-line
   grain. Partially-shipped, partially-cancelled, and not-yet-shipped
   orders lost the un-shipped portion (-$1,719,737.21).

Fix: aggregate stg_line_items at order grain first (revenue =
sum(quantity * unit_price), no shipment fan-out, no QUALIFY needed),
attach shipment metadata as a sidecar (shipment_count, first_shipped_at).
Also fixes the incremental watermark column mismatch (filter and
watermark now both on ordered_at instead of ordered_at vs updated_at_dwh).

Validated: make full + make test both clean (17/17 + 12/12). Non-test
sum(revenue) ties to $12,989,886.01 to the penny; rows == distinct
order_ids == 9699 confirms one-row-per-order grain.

Also adds:
- CLAUDE.md documenting repo layering, incremental pattern, and the
  prior bug zones so future work doesn't re-discover them.
- docs/designs/2026-Q2-refunds-modeling.md: draft design for the
  refunds/net-revenue follow-on (DATA-145 territory), out-of-scope
  for this fix per the Q3-2024 design doc.
Implements the design proposal in docs/designs/2026-Q2-refunds-modeling.md
that was added in the previous DATA-123 commit. Wires the three refund
sources (Stripe, Shopify, Internal POS) into the warehouse, builds a
refund_fact, and surfaces refund_total / net_revenue / is_fully_refunded
on order_fact plus refund_amount + qty_refunded on order_line_fact.

Key decisions (see design doc for full rationale):
- refund_fact grain: order x line x refund_event.
- Shopify-Stripe dedup via (order_id, refunded_at_minute) heuristic;
  Shopify wins on occurrence + line + amount, Stripe is tender sidecar.
- Pro-rata allocation of order-grain refunds (POS, Stripe-direct) to
  lines, with explicit refund_allocation_method flag.
- Late-arriving refund handling: order_fact incremental filter unions
  ordered_at >= watermark with orders touched by refunds since the
  last refund-side watermark.

Tests added (row-level invariants per CONTRIBUTING.md):
- refund_fact: unique/not_null on refund_event_id, amount > 0.
- order_fact: per-order refund_total ties to sum(refund_fact.amount);
  net_revenue tautology guard; warn-only signal when refund > revenue.
- order_line_fact: per-order sum(refund_amount) ties to order_fact
  refund_total within rounding.
- stg_refunds: no (order_id, minute) collision across Shopify+Stripe.

Validated: make full clean (23/23); make test clean (41/41).

Open items pending Finance sign-off per design doc Sec.5 (e.g., gateway
cross-walk vs heuristic, tax/shipping treatment). Tracked there, not
blocking this commit.
@tiwaryanuj tiwaryanuj requested a review from phil510 as a code owner May 20, 2026 22:25
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Walkthrough

This PR implements end-to-end refund modeling for net revenue computation. It adds three base models extracting refunds from Shopify, Stripe, and internal POS sources; a unified staging layer that deduplicates refunds across sources using Shopify precedence and minute-granularity timestamps; a refund_fact incremental table emitting one refund event per order × line × source; refund aggregates to order_fact with dual incremental logic on order date and refund recency; per-line refund attribution in order_line_fact with pro-rata allocation and rounding reconciliation; and nine tests validating accounting invariants and dedup logic. The design document specifies Finance requirements, dedup heuristics, allocation methods, and test expectations.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 SQLFluff (4.2.1)
models/orders/dw/order_fact.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

models/orders/base/base_refunds_internal_pos.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

models/orders/base/base_refunds_shopify.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

  • 11 others

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (3)
models/orders/dw/order_fact.sql (1)

58-61: 💤 Low value

Inconsistent NULL handling for orders with no line items.

shipment_count uses COALESCE (line 58), but line_count, total_quantity, and revenue do not. If an order exists in stg_orders without matching line items, these columns will be NULL while shipment_count defaults to 0.

Consider applying consistent COALESCE handling:

Proposed fix
     , coalesce(os.shipment_count, 0) AS shipment_count
-    , orev.line_count
-    , orev.quantity_ordered AS total_quantity
-    , orev.revenue
+    , coalesce(orev.line_count, 0) AS line_count
+    , coalesce(orev.quantity_ordered, 0) AS total_quantity
+    , coalesce(orev.revenue, 0) AS revenue
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/order_fact.sql` around lines 58 - 61, The SELECT mixes
COALESCE for os.shipment_count but leaves orev.line_count, orev.quantity_ordered
(total_quantity), and orev.revenue nullable; make NULL handling consistent by
wrapping orev.line_count, orev.quantity_ordered and orev.revenue with COALESCE
(e.g., COALESCE(orev.line_count, 0), COALESCE(orev.quantity_ordered, 0),
COALESCE(orev.revenue, 0.0)) so orders without line items return zeros like
os.shipment_count; update the expressions that reference orev.line_count,
orev.quantity_ordered, and orev.revenue accordingly.
models/orders/dw/orders_dw.yml (1)

26-27: ⚡ Quick win

Missing not_null test on revenue (open item from PR objectives).

The PR objectives note this as an open item: "add not_null test on order_fact.revenue". If revenue can legitimately be NULL (orders with no line items), document the behavior; otherwise, add the test before merge or create a tracking issue.

Proposed fix if revenue should never be NULL
       - name: revenue
         description: '{{ doc("order_fact_revenue") }}'
+        tests:
+          - not_null
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/orders_dw.yml` around lines 26 - 27, The review notes that
the orders_dw model's revenue column (symbol: revenue, doc: order_fact_revenue)
lacks a not_null test; either add a not_null test under the revenue column in
models/orders/dw/orders_dw.yml (columns: revenue -> tests: - not_null) so dbt
enforces non-null revenue, or if revenue can be legitimately NULL, update the
revenue description (order_fact_revenue) to document that nulls are expected and
create a tracking issue referencing this decision instead of merging without a
test.
tests/stg_refunds_no_shopify_stripe_collision.sql (1)

6-22: ⚡ Quick win

Consider simplifying by removing the redundant CTE.

The per_source CTE groups by all selected columns without aggregation, making it equivalent to SELECT DISTINCT. Since the outer query still needs to aggregate by (order_id, refunded_at_minute), the CTE doesn't reduce data volume or isolate complex logic.

♻️ Simpler direct query
-WITH per_source AS (
-    SELECT
-        order_id
-        , refunded_at_minute
-        , source
-    FROM {{ ref('stg_refunds') }}
-    WHERE source IN ('shopify', 'stripe')
-    GROUP BY order_id, refunded_at_minute, source
-)
-
 SELECT
     order_id
     , refunded_at_minute
     , count(DISTINCT source) AS source_count
-FROM per_source
+FROM {{ ref('stg_refunds') }}
+WHERE source IN ('shopify', 'stripe')
 GROUP BY order_id, refunded_at_minute
 HAVING count(DISTINCT source) > 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/stg_refunds_no_shopify_stripe_collision.sql` around lines 6 - 22, The
per_source CTE is redundant because it simply SELECTs and GROUPs the same
columns; remove the CTE and query directly from {{ ref('stg_refunds') }}
filtering source IN ('shopify','stripe') — i.e. in the main query use FROM {{
ref('stg_refunds') }} WHERE source IN (...) and then GROUP BY order_id,
refunded_at_minute and HAVING count(DISTINCT source) > 1; keep references to
order_id and refunded_at_minute and the count(DISTINCT source) aggregation as in
the current outer query.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Around line 66-67: The docs currently state "Refunds are intentionally not
modeled" but the PR adds comprehensive refund modeling; update CLAUDE.md to
remove or replace that bullet and explicitly document the new refund design:
state that refunds are modeled via refund_fact, that order-level refund
aggregates are stored on order_fact, and that order_line_fact contains refund
attribution; also note which raw/refunds_*.csv inputs are consumed and any
relevant transformation or join keys used for refund_fact so future developers
aren't misled.

In `@docs/designs/2026-Q2-refunds-modeling.md`:
- Around line 1-14: The draft design "Refunds modeling — net revenue for
`order_fact` / `order_line_fact`" is marked Draft and requests Finance sign-off
in §5, but this PR already contains the implementation (base models, staging,
facts, tests) which risks rework if Finance changes the dedup heuristic (§3.3),
allocation method (§3.4), or incremental strategy (§4); either (A) finalize the
design by updating the header status to "Final" and attach explicit Finance
sign-off in the document before merging the implementation, or (B) remove the
implementation artifacts from this PR and split into two PRs — one that contains
the design-only for review/approval and a second PR with the implementation once
sign-off is obtained — or (C) clearly mark the implementation files as WIP and
block merging until sign-off is attached. Ensure references to §3.3, §3.4, §4,
and §5 are included in the approval note so reviewers can validate the specific
decisions that must be accepted before applying the SQL changes.

In `@models/orders/dw/refund_fact.sql`:
- Around line 1-4: The refund_fact model is missing schema tests for its primary
key refund_event_id; add a YAML table entry for the refund_fact model in the
models' schema file that defines the column refund_event_id with tests: - unique
and - not_null so the primary key is validated; ensure the entry includes name:
refund_fact, a short description, and a columns block listing refund_event_id
with the two tests.
- Around line 9-20: Convert the SELECT list to use leading commas (place the
comma at the start of each subsequent column line) and make SQL keywords
uppercase; specifically update the SELECT block that references {{
ref('stg_refunds') }} AS r (columns like refund_event_id, source,
source_refund_id, order_id, line_item_id, qty_refunded, refund_amount,
refunded_at, created_at_dwh, updated_at_dwh) so commas lead each column line and
replace current_timestamp with CURRENT_TIMESTAMP for created_at_dwh and
updated_at_dwh.
- Around line 18-19: The created_at_dwh column is being reset to
current_timestamp on every run; change its expression so incremental runs
preserve the existing value by using dbt's is_incremental() and selecting the
prior created_at_dwh from the target table ({{ this }}) by the model's unique
key, falling back to current_timestamp if none exists; keep updated_at_dwh as
current_timestamp. Use the created_at_dwh and updated_at_dwh column names and
the model's unique key (the same unique_key used for incremental logic) to
implement a conditional/subquery lookup that returns the prior created_at_dwh
when is_incremental() is true.

In `@models/orders/staging/stg_refunds.sql`:
- Around line 59-64: Convert the SELECT list to use leading commas and make SQL
type/keyword names uppercase: move commas to the start of lines before order_id,
amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS
refunded_at, and DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS
refunded_at_minute; also change type/keyword casing such as CAST(... AS
timestamp) → CAST(... AS TIMESTAMP) and DATE_TRUNC → DATE_TRUNC and ensure AS is
uppercase; the clause referencing ref('base_refunds_internal_pos') should remain
unchanged.
- Around line 28-35: Convert the SELECT block to use leading commas and
uppercase SQL keywords: change the column list so each column line begins with a
leading comma (e.g., ", order_id") and make keywords/types uppercase (SELECT,
AS, CAST, DATE_TRUNC, TIMESTAMP) and ensure CAST(processed_at AS TIMESTAMP) and
DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) follow that casing; keep
the same source reference {{ ref('base_refunds_stripe') }} and column aliases
(source_refund_id, refund_amount, refunded_at, refunded_at_minute) unchanged.
- Around line 68-108: The SQL uses trailing commas and lowercase type names;
update the SELECT lists in the shopify, pos, and stripe (alias se) blocks to use
leading commas before each column and change type names in CASTs to uppercase
(e.g., CAST(NULL AS VARCHAR), CAST(NULL AS BIGINT)); ensure the UNION ALL
sections and JOIN clause remain functionally identical while applying the
leading-comma formatting to every SELECT projection (including the 'source'
literal) and capitalizing SQL keywords where needed.
- Around line 16-24: The SELECT uses trailing commas and lowercase SQL type
names; switch to leading commas for each column in the SELECT list (e.g., for
refund_id AS source_refund_id, order_id, line_item_id, qty_refunded,
amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS
refunded_at, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS
refunded_at_minute) and change the CAST type identifier to uppercase
(CAST(refunded_at AS TIMESTAMP)) so both CAST(...) occurrences use TIMESTAMP and
all commas are leading per style guidelines.

---

Nitpick comments:
In `@models/orders/dw/order_fact.sql`:
- Around line 58-61: The SELECT mixes COALESCE for os.shipment_count but leaves
orev.line_count, orev.quantity_ordered (total_quantity), and orev.revenue
nullable; make NULL handling consistent by wrapping orev.line_count,
orev.quantity_ordered and orev.revenue with COALESCE (e.g.,
COALESCE(orev.line_count, 0), COALESCE(orev.quantity_ordered, 0),
COALESCE(orev.revenue, 0.0)) so orders without line items return zeros like
os.shipment_count; update the expressions that reference orev.line_count,
orev.quantity_ordered, and orev.revenue accordingly.

In `@models/orders/dw/orders_dw.yml`:
- Around line 26-27: The review notes that the orders_dw model's revenue column
(symbol: revenue, doc: order_fact_revenue) lacks a not_null test; either add a
not_null test under the revenue column in models/orders/dw/orders_dw.yml
(columns: revenue -> tests: - not_null) so dbt enforces non-null revenue, or if
revenue can be legitimately NULL, update the revenue description
(order_fact_revenue) to document that nulls are expected and create a tracking
issue referencing this decision instead of merging without a test.

In `@tests/stg_refunds_no_shopify_stripe_collision.sql`:
- Around line 6-22: The per_source CTE is redundant because it simply SELECTs
and GROUPs the same columns; remove the CTE and query directly from {{
ref('stg_refunds') }} filtering source IN ('shopify','stripe') — i.e. in the
main query use FROM {{ ref('stg_refunds') }} WHERE source IN (...) and then
GROUP BY order_id, refunded_at_minute and HAVING count(DISTINCT source) > 1;
keep references to order_id and refunded_at_minute and the count(DISTINCT
source) aggregation as in the current outer query.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4100f806-8d91-40ba-9153-a26333a3d732

📥 Commits

Reviewing files that changed from the base of the PR and between 496c141 and f02c459.

📒 Files selected for processing (19)
  • CLAUDE.md
  • docs/designs/2026-Q2-refunds-modeling.md
  • models/orders/base/base_refunds_internal_pos.sql
  • models/orders/base/base_refunds_shopify.sql
  • models/orders/base/base_refunds_stripe.sql
  • models/orders/base/orders_base.yml
  • models/orders/dw/order_fact.sql
  • models/orders/dw/order_line_fact.sql
  • models/orders/dw/orders_dw.yml
  • models/orders/dw/refund_fact.sql
  • models/orders/staging/orders_staging.yml
  • models/orders/staging/stg_refund_tenders.sql
  • models/orders/staging/stg_refunds.sql
  • tests/order_fact_net_revenue_definition.sql
  • tests/order_fact_refund_over_revenue_warn.sql
  • tests/order_fact_refund_total_matches_refund_fact.sql
  • tests/order_line_fact_refund_sums_to_order.sql
  • tests/refund_fact_amount_positive.sql
  • tests/stg_refunds_no_shopify_stripe_collision.sql

Comment thread CLAUDE.md
Comment on lines +66 to +67
- **Refunds are intentionally not modeled.** `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`.
- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the refunds documentation to reflect new modeling.

Lines 66-67 state that "Refunds are intentionally not modeled" and cite the 2024-Q3 design as justification. However, this PR introduces comprehensive refunds modeling via refund_fact, order_fact refund aggregates, and order_line_fact refund attribution. This documentation is now contradictory and will mislead future developers.

📝 Proposed fix to update the documentation
-Per `docs/designs/2024-Q3-orders-redesign.md`:
-- **Refunds are intentionally not modeled.** `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`.
-- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.
+Per design decisions:
+- **Refunds are modeled on `refund_fact`** (see `docs/designs/2026-Q2-refunds-modeling.md`). Refund aggregates (`refund_total`, `net_revenue`) appear on `order_fact` and `order_line_fact`, but refund event details live on the separate `refund_fact` table.
+- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@CLAUDE.md` around lines 66 - 67, The docs currently state "Refunds are
intentionally not modeled" but the PR adds comprehensive refund modeling; update
CLAUDE.md to remove or replace that bullet and explicitly document the new
refund design: state that refunds are modeled via refund_fact, that order-level
refund aggregates are stored on order_fact, and that order_line_fact contains
refund attribution; also note which raw/refunds_*.csv inputs are consumed and
any relevant transformation or join keys used for refund_fact so future
developers aren't misled.

Comment on lines +1 to +14
# Refunds modeling — net revenue for `order_fact` / `order_line_fact`

**Author:** Data Engineering
**Status:** Draft — for review with Finance Analytics
**Date:** 2026-05-20
**Related:** [`2024-Q3-orders-redesign.md`](2024-Q3-orders-redesign.md) (out-of-scope item: refunds), DATA-145 (refund sources, not yet scoped), DATA-123 (Q1 revenue reconciliation — gross-revenue bug, separate issue)

---

## 1. What Finance asked for

> Surface refund totals on `order_fact` and per-line refund amounts on `order_line_fact` so we can report **net revenue** (gross − refunds).

This document captures the modeling decisions we recommend. We want sign-off on the decisions in §3 and §5 before we touch any SQL.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Resolve design approval status before merging implementation.

The document is marked as "Draft — for review with Finance Analytics" (line 4) and Section 5 explicitly requests Finance sign-off on critical decisions "before we touch any SQL" (line 14). However, this PR includes the full implementation (base models, staging, facts, tests) alongside the draft design.

Consider either:

  • Finalizing the design and obtaining Finance sign-off before merging, or
  • Splitting this PR into (1) design-only for review/approval, then (2) implementation after approval.

The current approach creates risk: if Finance requests changes to the dedup heuristic (§3.3), allocation method (§3.4), or incremental strategy (§4), the implemented SQL would need rework.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@docs/designs/2026-Q2-refunds-modeling.md` around lines 1 - 14, The draft
design "Refunds modeling — net revenue for `order_fact` / `order_line_fact`" is
marked Draft and requests Finance sign-off in §5, but this PR already contains
the implementation (base models, staging, facts, tests) which risks rework if
Finance changes the dedup heuristic (§3.3), allocation method (§3.4), or
incremental strategy (§4); either (A) finalize the design by updating the header
status to "Final" and attach explicit Finance sign-off in the document before
merging the implementation, or (B) remove the implementation artifacts from this
PR and split into two PRs — one that contains the design-only for
review/approval and a second PR with the implementation once sign-off is
obtained — or (C) clearly mark the implementation files as WIP and block merging
until sign-off is attached. Ensure references to §3.3, §3.4, §4, and §5 are
included in the approval note so reviewers can validate the specific decisions
that must be accepted before applying the SQL changes.

Comment on lines +1 to +4
{{ config(
materialized='incremental',
unique_key='refund_event_id'
) }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add unique and not_null tests on the primary key.

The guideline requires every fact table to have unique and not_null tests on its primary key. The primary key refund_event_id needs these tests defined in a corresponding YAML schema file (e.g., models/orders/dw/orders_dw.yml).

📋 Required YAML configuration

Add to models/orders/dw/orders_dw.yml:

- name: refund_fact
  description: One row per refund event at order × line × source grain.
  columns:
    - name: refund_event_id
      description: Surrogate PK derived from (source, source_refund_id, line_item_id).
      tests:
        - unique
        - not_null

As per coding guidelines: Every fact table must have unique and not_null tests on its primary key.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/refund_fact.sql` around lines 1 - 4, The refund_fact model
is missing schema tests for its primary key refund_event_id; add a YAML table
entry for the refund_fact model in the models' schema file that defines the
column refund_event_id with tests: - unique and - not_null so the primary key is
validated; ensure the entry includes name: refund_fact, a short description, and
a columns block listing refund_event_id with the two tests.

Comment on lines +9 to +20
SELECT
r.refund_event_id
, r.source
, r.source_refund_id
, r.order_id
, r.line_item_id
, r.qty_refunded
, r.refund_amount
, r.refunded_at
, current_timestamp AS created_at_dwh
, current_timestamp AS updated_at_dwh
FROM {{ ref('stg_refunds') }} AS r
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase keywords per SQL style guideline.

The SELECT clause uses trailing commas and lowercase current_timestamp keyword.

✏️ Proposed fix
 SELECT
-    r.refund_event_id
-    , r.source
-    , r.source_refund_id
-    , r.order_id
-    , r.line_item_id
-    , r.qty_refunded
-    , r.refund_amount
-    , r.refunded_at
-    , current_timestamp AS created_at_dwh
-    , current_timestamp AS updated_at_dwh
+    r.refund_event_id
+    , r.source
+    , r.source_refund_id
+    , r.order_id
+    , r.line_item_id
+    , r.qty_refunded
+    , r.refund_amount
+    , r.refunded_at
+    , CURRENT_TIMESTAMP AS created_at_dwh
+    , CURRENT_TIMESTAMP AS updated_at_dwh
 FROM {{ ref('stg_refunds') }} AS r

As per coding guidelines: use leading commas, lowercase identifiers, and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
r.refund_event_id
, r.source
, r.source_refund_id
, r.order_id
, r.line_item_id
, r.qty_refunded
, r.refund_amount
, r.refunded_at
, current_timestamp AS created_at_dwh
, current_timestamp AS updated_at_dwh
FROM {{ ref('stg_refunds') }} AS r
SELECT
r.refund_event_id
, r.source
, r.source_refund_id
, r.order_id
, r.line_item_id
, r.qty_refunded
, r.refund_amount
, r.refunded_at
, CURRENT_TIMESTAMP AS created_at_dwh
, CURRENT_TIMESTAMP AS updated_at_dwh
FROM {{ ref('stg_refunds') }} AS r
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/refund_fact.sql` around lines 9 - 20, Convert the SELECT
list to use leading commas (place the comma at the start of each subsequent
column line) and make SQL keywords uppercase; specifically update the SELECT
block that references {{ ref('stg_refunds') }} AS r (columns like
refund_event_id, source, source_refund_id, order_id, line_item_id, qty_refunded,
refund_amount, refunded_at, created_at_dwh, updated_at_dwh) so commas lead each
column line and replace current_timestamp with CURRENT_TIMESTAMP for
created_at_dwh and updated_at_dwh.

Comment on lines +18 to +19
, current_timestamp AS created_at_dwh
, current_timestamp AS updated_at_dwh
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve created_at_dwh on incremental updates.

In the current implementation, created_at_dwh is set to CURRENT_TIMESTAMP on every run. For an incremental model with unique_key, this means created_at_dwh will be overwritten when existing rows are updated (e.g., when the same refund event appears in the incremental window). The audit timestamp should preserve the original creation time.

🔧 Proposed fix to preserve created_at_dwh
 SELECT
     r.refund_event_id
     , r.source
     , r.source_refund_id
     , r.order_id
     , r.line_item_id
     , r.qty_refunded
     , r.refund_amount
     , r.refunded_at
-    , CURRENT_TIMESTAMP AS created_at_dwh
+    {% if is_incremental() %}
+    , COALESCE(target.created_at_dwh, CURRENT_TIMESTAMP) AS created_at_dwh
+    {% else %}
+    , CURRENT_TIMESTAMP AS created_at_dwh
+    {% endif %}
     , CURRENT_TIMESTAMP AS updated_at_dwh
 FROM {{ ref('stg_refunds') }} AS r
+{% if is_incremental() %}
+LEFT JOIN {{ this }} AS target
+    ON r.refund_event_id = target.refund_event_id
+WHERE r.refunded_at >= {{ get_incremental_value('refunded_at') }}
+{% endif %}
-{% if is_incremental() %}
-    WHERE r.refunded_at >= {{ get_incremental_value('refunded_at') }}
-{% endif %}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/refund_fact.sql` around lines 18 - 19, The created_at_dwh
column is being reset to current_timestamp on every run; change its expression
so incremental runs preserve the existing value by using dbt's is_incremental()
and selecting the prior created_at_dwh from the target table ({{ this }}) by the
model's unique key, falling back to current_timestamp if none exists; keep
updated_at_dwh as current_timestamp. Use the created_at_dwh and updated_at_dwh
column names and the model's unique key (the same unique_key used for
incremental logic) to implement a conditional/subquery lookup that returns the
prior created_at_dwh when is_incremental() is true.

Comment on lines +16 to +24
SELECT
refund_id AS source_refund_id
, order_id
, line_item_id
, qty_refunded
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_shopify') }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

This SELECT clause uses trailing commas. Additionally, ensure SQL type names in CAST expressions are uppercase.

✏️ Proposed fix
     SELECT
-        refund_id AS source_refund_id
-        , order_id
-        , line_item_id
-        , qty_refunded
-        , amount_in_cents / 100.0 AS refund_amount
-        , CAST(refunded_at AS timestamp) AS refunded_at
-        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+        refund_id AS source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS TIMESTAMP) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute
     FROM {{ ref('base_refunds_shopify') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
refund_id AS source_refund_id
, order_id
, line_item_id
, qty_refunded
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_shopify') }}
SELECT
refund_id AS source_refund_id
, order_id
, line_item_id
, qty_refunded
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS TIMESTAMP) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute
FROM {{ ref('base_refunds_shopify') }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/staging/stg_refunds.sql` around lines 16 - 24, The SELECT uses
trailing commas and lowercase SQL type names; switch to leading commas for each
column in the SELECT list (e.g., for refund_id AS source_refund_id, order_id,
line_item_id, qty_refunded, amount_in_cents / 100.0 AS refund_amount,
CAST(refunded_at AS timestamp) AS refunded_at, DATE_TRUNC('minute',
CAST(refunded_at AS timestamp)) AS refunded_at_minute) and change the CAST type
identifier to uppercase (CAST(refunded_at AS TIMESTAMP)) so both CAST(...)
occurrences use TIMESTAMP and all commas are leading per style guidelines.

Comment on lines +28 to +35
SELECT
refund_id AS source_refund_id
, order_id
, tender_type
, amount_in_cents / 100.0 AS refund_amount
, CAST(processed_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_stripe') }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix
     SELECT
-        refund_id AS source_refund_id
-        , order_id
-        , tender_type
-        , amount_in_cents / 100.0 AS refund_amount
-        , CAST(processed_at AS timestamp) AS refunded_at
-        , DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
+        refund_id AS source_refund_id
+        , order_id
+        , tender_type
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(processed_at AS TIMESTAMP) AS refunded_at
+        , DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) AS refunded_at_minute
     FROM {{ ref('base_refunds_stripe') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
refund_id AS source_refund_id
, order_id
, tender_type
, amount_in_cents / 100.0 AS refund_amount
, CAST(processed_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_stripe') }}
SELECT
refund_id AS source_refund_id
, order_id
, tender_type
, amount_in_cents / 100.0 AS refund_amount
, CAST(processed_at AS TIMESTAMP) AS refunded_at
, DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) AS refunded_at_minute
FROM {{ ref('base_refunds_stripe') }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/staging/stg_refunds.sql` around lines 28 - 35, Convert the
SELECT block to use leading commas and uppercase SQL keywords: change the column
list so each column line begins with a leading comma (e.g., ", order_id") and
make keywords/types uppercase (SELECT, AS, CAST, DATE_TRUNC, TIMESTAMP) and
ensure CAST(processed_at AS TIMESTAMP) and DATE_TRUNC('minute',
CAST(processed_at AS TIMESTAMP)) follow that casing; keep the same source
reference {{ ref('base_refunds_stripe') }} and column aliases (source_refund_id,
refund_amount, refunded_at, refunded_at_minute) unchanged.

Comment on lines +59 to +64
refund_id AS source_refund_id
, order_id
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_internal_pos') }}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix
     SELECT
-        refund_id AS source_refund_id
-        , order_id
-        , amount_in_cents / 100.0 AS refund_amount
-        , CAST(refunded_at AS timestamp) AS refunded_at
-        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+        refund_id AS source_refund_id
+        , order_id
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS TIMESTAMP) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute
     FROM {{ ref('base_refunds_internal_pos') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
refund_id AS source_refund_id
, order_id
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS timestamp) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
FROM {{ ref('base_refunds_internal_pos') }}
refund_id AS source_refund_id
, order_id
, amount_in_cents / 100.0 AS refund_amount
, CAST(refunded_at AS TIMESTAMP) AS refunded_at
, DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute
FROM {{ ref('base_refunds_internal_pos') }}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/staging/stg_refunds.sql` around lines 59 - 64, Convert the
SELECT list to use leading commas and make SQL type/keyword names uppercase:
move commas to the start of lines before order_id, amount_in_cents / 100.0 AS
refund_amount, CAST(refunded_at AS timestamp) AS refunded_at, and
DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute; also
change type/keyword casing such as CAST(... AS timestamp) → CAST(... AS
TIMESTAMP) and DATE_TRUNC → DATE_TRUNC and ensure AS is uppercase; the clause
referencing ref('base_refunds_internal_pos') should remain unchanged.

Comment on lines +68 to +108
SELECT
'shopify' AS source
, source_refund_id
, order_id
, line_item_id
, qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM shopify

UNION ALL

SELECT
'internal_pos' AS source
, source_refund_id
, order_id
, CAST(NULL AS varchar) AS line_item_id
, CAST(NULL AS bigint) AS qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM pos

UNION ALL

-- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
SELECT
'stripe' AS source
, se.source_refund_id
, se.order_id
, CAST(NULL AS varchar) AS line_item_id
, CAST(NULL AS bigint) AS qty_refunded
, se.refund_amount
, se.refunded_at
, se.refunded_at_minute
FROM stripe_events AS se
LEFT JOIN shopify_keys AS sk
ON se.order_id = sk.order_id
AND se.refunded_at_minute = sk.refunded_at_minute
WHERE sk.order_id IS NULL
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix
     SELECT
-        'shopify' AS source
-        , source_refund_id
-        , order_id
-        , line_item_id
-        , qty_refunded
-        , refund_amount
-        , refunded_at
-        , refunded_at_minute
+        'shopify' AS source
+        , source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
     FROM shopify
 
     UNION ALL
 
     SELECT
-        'internal_pos' AS source
-        , source_refund_id
-        , order_id
-        , CAST(NULL AS varchar) AS line_item_id
-        , CAST(NULL AS bigint) AS qty_refunded
-        , refund_amount
-        , refunded_at
-        , refunded_at_minute
+        'internal_pos' AS source
+        , source_refund_id
+        , order_id
+        , CAST(NULL AS VARCHAR) AS line_item_id
+        , CAST(NULL AS BIGINT) AS qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
     FROM pos
 
     UNION ALL
 
     -- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
     SELECT
-        'stripe' AS source
-        , se.source_refund_id
-        , se.order_id
-        , CAST(NULL AS varchar) AS line_item_id
-        , CAST(NULL AS bigint) AS qty_refunded
-        , se.refund_amount
-        , se.refunded_at
-        , se.refunded_at_minute
+        'stripe' AS source
+        , se.source_refund_id
+        , se.order_id
+        , CAST(NULL AS VARCHAR) AS line_item_id
+        , CAST(NULL AS BIGINT) AS qty_refunded
+        , se.refund_amount
+        , se.refunded_at
+        , se.refunded_at_minute
     FROM stripe_events AS se

As per coding guidelines: use leading commas and uppercase keywords (including type names) in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
SELECT
'shopify' AS source
, source_refund_id
, order_id
, line_item_id
, qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM shopify
UNION ALL
SELECT
'internal_pos' AS source
, source_refund_id
, order_id
, CAST(NULL AS varchar) AS line_item_id
, CAST(NULL AS bigint) AS qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM pos
UNION ALL
-- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
SELECT
'stripe' AS source
, se.source_refund_id
, se.order_id
, CAST(NULL AS varchar) AS line_item_id
, CAST(NULL AS bigint) AS qty_refunded
, se.refund_amount
, se.refunded_at
, se.refunded_at_minute
FROM stripe_events AS se
LEFT JOIN shopify_keys AS sk
ON se.order_id = sk.order_id
AND se.refunded_at_minute = sk.refunded_at_minute
WHERE sk.order_id IS NULL
SELECT
'shopify' AS source
, source_refund_id
, order_id
, line_item_id
, qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM shopify
UNION ALL
SELECT
'internal_pos' AS source
, source_refund_id
, order_id
, CAST(NULL AS VARCHAR) AS line_item_id
, CAST(NULL AS BIGINT) AS qty_refunded
, refund_amount
, refunded_at
, refunded_at_minute
FROM pos
UNION ALL
-- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
SELECT
'stripe' AS source
, se.source_refund_id
, se.order_id
, CAST(NULL AS VARCHAR) AS line_item_id
, CAST(NULL AS BIGINT) AS qty_refunded
, se.refund_amount
, se.refunded_at
, se.refunded_at_minute
FROM stripe_events AS se
LEFT JOIN shopify_keys AS sk
ON se.order_id = sk.order_id
AND se.refunded_at_minute = sk.refunded_at_minute
WHERE sk.order_id IS NULL
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/staging/stg_refunds.sql` around lines 68 - 108, The SQL uses
trailing commas and lowercase type names; update the SELECT lists in the
shopify, pos, and stripe (alias se) blocks to use leading commas before each
column and change type names in CASTs to uppercase (e.g., CAST(NULL AS VARCHAR),
CAST(NULL AS BIGINT)); ensure the UNION ALL sections and JOIN clause remain
functionally identical while applying the leading-comma formatting to every
SELECT projection (including the 'source' literal) and capitalizing SQL keywords
where needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant