[DATA-123] Fix order_fact revenue + [DATA-145] add refunds modeling by tiwaryanuj · Pull Request #8 · helloextend/poc-data-engineering-coding-challenge

tiwaryanuj · 2026-05-20T22:25:27Z

Summary

Fix DATA-123 — order_fact.revenue was understating non-test Q1 revenue by $2,467,627.97 vs the Stripe captures number Finance gave us ($10,522,258.04 reported vs $12,989,886.01 truth). Two independent root causes; both fixed.
Implement DATA-145 — wire the three refund sources into the warehouse and surface net_revenue / refund_total on order_fact per the design doc added in this branch (docs/designs/2026-Q2-refunds-modeling.md). Pending Finance sign-off on §5 of that doc but the model is in place and tested.

Two commits, each scoped to one ticket. Easier to revert independently if needed.

DATA-123 root cause

metric	non-test revenue	Δ vs Stripe
Stripe captures (per ticket)	$12,989,886.01	—
`sum(stg_line_items.quantity × unit_price)` (canonical)	$12,989,886.01	$0.00
Before fix — `order_fact.revenue`	$10,522,258.04	−$2,467,627.97
After fix — `order_fact.revenue`	$12,989,886.01	$0.00

The $2.47M gap split into two independent bugs:

QUALIFY row_number() OVER (PARTITION BY order_id ORDER BY shipped_at) = 1 kept only the first shipment per order and silently dropped revenue from later shipments. 847 of 8,548 non-test orders are multi-shipment. Cost: −$747,890.76.
revenue was computed from quantity_shipped × unit_price at shipment-line grain. Partially-shipped, partially-cancelled, and not-yet-shipped orders lost the un-shipped portion. Cost: −$1,719,737.21 (this bug would have remained even with QUALIFY fixed).

Fix: aggregate stg_line_items to order grain first (sum(quantity × unit_price)) — no shipment fan-out, no QUALIFY needed. Shipment metadata attached as a sidecar (shipment_count, first_shipped_at). Also fixes the incremental watermark, which was filtering ordered_at >= max(updated_at_dwh) — column mismatch. Now filters and watermarks both on ordered_at.

DATA-145 (refunds) — what landed

Implements the proposal in docs/designs/2026-Q2-refunds-modeling.md. Key decisions captured there, not repeated here. High level:

New refund_fact (grain: order × line × refund_event), incremental on refunded_at.
order_fact gains refund_total, refund_count, last_refunded_at, net_revenue, is_fully_refunded.
order_line_fact gains refund_amount, qty_refunded, refund_allocation_method (direct / pro_rata).
Shopify-vs-Stripe dedup via (order_id, refunded_at_minute) heuristic (§3.3 — flagged as needing Finance confirmation if a gateway cross-walk exists upstream).
Late-arriving refund handling on order_fact: incremental filter is union of ordered_at >= watermark OR order_id touched by refunds since the previous run (§4 — the production gotcha).

Validation

make full   → 23/23 OK
make test   → 41/41 PASS

sum(order_fact.revenue) where not is_test = $12,989,886.01 (ties to Stripe captures to the penny).
count(*) == count(distinct order_id) == 9699 — one row per order.
New refunds tests pass: row-level invariant sum(refund_fact.amount) per order == order_fact.refund_total; allocated line-level sums tie to order_fact.refund_total within $0.01 (pro-rata rounding); no (order_id, minute) collision across Shopify+Stripe in stg_refunds.

Test plan

Review models/orders/dw/order_fact.sql — confirm the new grain logic + the incremental filter union (ordered_at OR refund-touched).
Review docs/designs/2026-Q2-refunds-modeling.md §3.3 (Shopify-Stripe dedup) and §5 (open questions for Finance) before merging the DATA-145 portion.
Spot-check 3–5 multi-shipment orders from the top of the impacted list (O001276, O007967, O002332) against the new order_fact.revenue.
Diff daily_revenue pre- vs post-fix; the per-day step-up should be material — heads-up to Jamie before deploy.
After merge: flip tests/order_fact_revenue_reconciliation.sql from severity='warn' to error. The warn was masking known-broken state per CONTRIBUTING.md.

Out of scope (intentionally)

Type 2 history on lkp_merchants — same constraint as the Q3-2024 design doc.
Chargebacks — different upstream, different finance treatment (see refunds doc §10).
Tax / shipping refunds breakout — assumed gross-of-tax, matches how revenue is computed today.

Open items / follow-ups

Finance sign-off on refunds design doc §5 (Q's A–G), especially the Shopify-Stripe dedup heuristic in §3.3 and the goodwill-overrefund treatment in §5-B.
Add not_null schema test on order_fact.revenue (every order has line items, so this should always hold).
The daily_revenue reporting view will produce materially different numbers post-deploy — coordinate with Finance on the cutover date and expected Q1 delta.

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Refund data now modeled and tracked across multiple payment sources (Shopify, Stripe, internal POS)
- Orders and line items now include refund metrics and net revenue calculations
- Unified refund consolidation with deduplication across payment platforms
Documentation
- Added repository guidance documenting architecture, modeling conventions, and contribution standards
- Added design documentation detailing refund modeling approach and implementation plan

Root cause: order_fact had two independent bugs that combined to understate non-test Q1 revenue by $2,467,627.97 vs Stripe captures ($10,522,258.04 reported vs $12,989,886.01 truth). 1. The QUALIFY row_number() PARTITION BY order_id ORDER BY shipped_at = 1 kept only the first shipment per order and silently dropped revenue from 2nd+ shipments on 847 multi-shipment orders (-$747,890.76). 2. Revenue was sum(quantity_shipped * unit_price) at shipment-line grain. Partially-shipped, partially-cancelled, and not-yet-shipped orders lost the un-shipped portion (-$1,719,737.21). Fix: aggregate stg_line_items at order grain first (revenue = sum(quantity * unit_price), no shipment fan-out, no QUALIFY needed), attach shipment metadata as a sidecar (shipment_count, first_shipped_at). Also fixes the incremental watermark column mismatch (filter and watermark now both on ordered_at instead of ordered_at vs updated_at_dwh). Validated: make full + make test both clean (17/17 + 12/12). Non-test sum(revenue) ties to $12,989,886.01 to the penny; rows == distinct order_ids == 9699 confirms one-row-per-order grain. Also adds: - CLAUDE.md documenting repo layering, incremental pattern, and the prior bug zones so future work doesn't re-discover them. - docs/designs/2026-Q2-refunds-modeling.md: draft design for the refunds/net-revenue follow-on (DATA-145 territory), out-of-scope for this fix per the Q3-2024 design doc.

Implements the design proposal in docs/designs/2026-Q2-refunds-modeling.md that was added in the previous DATA-123 commit. Wires the three refund sources (Stripe, Shopify, Internal POS) into the warehouse, builds a refund_fact, and surfaces refund_total / net_revenue / is_fully_refunded on order_fact plus refund_amount + qty_refunded on order_line_fact. Key decisions (see design doc for full rationale): - refund_fact grain: order x line x refund_event. - Shopify-Stripe dedup via (order_id, refunded_at_minute) heuristic; Shopify wins on occurrence + line + amount, Stripe is tender sidecar. - Pro-rata allocation of order-grain refunds (POS, Stripe-direct) to lines, with explicit refund_allocation_method flag. - Late-arriving refund handling: order_fact incremental filter unions ordered_at >= watermark with orders touched by refunds since the last refund-side watermark. Tests added (row-level invariants per CONTRIBUTING.md): - refund_fact: unique/not_null on refund_event_id, amount > 0. - order_fact: per-order refund_total ties to sum(refund_fact.amount); net_revenue tautology guard; warn-only signal when refund > revenue. - order_line_fact: per-order sum(refund_amount) ties to order_fact refund_total within rounding. - stg_refunds: no (order_id, minute) collision across Shopify+Stripe. Validated: make full clean (23/23); make test clean (41/41). Open items pending Finance sign-off per design doc Sec.5 (e.g., gateway cross-walk vs heuristic, tax/shipping treatment). Tracked there, not blocking this commit.

coderabbitai · 2026-05-20T22:25:50Z

Walkthrough

This PR implements end-to-end refund modeling for net revenue computation. It adds three base models extracting refunds from Shopify, Stripe, and internal POS sources; a unified staging layer that deduplicates refunds across sources using Shopify precedence and minute-granularity timestamps; a refund_fact incremental table emitting one refund event per order × line × source; refund aggregates to order_fact with dual incremental logic on order date and refund recency; per-line refund attribution in order_line_fact with pro-rata allocation and rounding reconciliation; and nine tests validating accounting invariants and dedup logic. The design document specifies Finance requirements, dedup heuristics, allocation methods, and test expectations.

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 SQLFluff (4.2.1)

models/orders/dw/order_fact.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

models/orders/base/base_refunds_internal_pos.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

models/orders/base/base_refunds_shopify.sql

WARNING: Config file /.sqlfluff set a deprecated config value indentation:allow_implicit_indents. This will be removed in a later release. This has been mapped to indentation:implicit_indents set to a value of 'allow' for this run. Please update your configuration to remove this warning.

The allow_implicit_indents config has been replaced by implicit_indents. Use 'forbid' (was false), 'allow' (was true), or 'require' instead.

See https://docs.sqlfluff.com/en/stable/perma/configuration.html for more details.

Starting in sqlfluff version 0.7.0 the dbt templater is distributed as a separate python package. Please pip install sqlfluff-templater-dbt to use it.

11 others

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 9

🧹 Nitpick comments (3)

models/orders/dw/order_fact.sql (1)

58-61: 💤 Low value

Inconsistent NULL handling for orders with no line items.

shipment_count uses COALESCE (line 58), but line_count, total_quantity, and revenue do not. If an order exists in stg_orders without matching line items, these columns will be NULL while shipment_count defaults to 0.

Consider applying consistent COALESCE handling:

Proposed fix

     , coalesce(os.shipment_count, 0) AS shipment_count
-    , orev.line_count
-    , orev.quantity_ordered AS total_quantity
-    , orev.revenue
+    , coalesce(orev.line_count, 0) AS line_count
+    , coalesce(orev.quantity_ordered, 0) AS total_quantity
+    , coalesce(orev.revenue, 0) AS revenue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/order_fact.sql` around lines 58 - 61, The SELECT mixes
COALESCE for os.shipment_count but leaves orev.line_count, orev.quantity_ordered
(total_quantity), and orev.revenue nullable; make NULL handling consistent by
wrapping orev.line_count, orev.quantity_ordered and orev.revenue with COALESCE
(e.g., COALESCE(orev.line_count, 0), COALESCE(orev.quantity_ordered, 0),
COALESCE(orev.revenue, 0.0)) so orders without line items return zeros like
os.shipment_count; update the expressions that reference orev.line_count,
orev.quantity_ordered, and orev.revenue accordingly.

models/orders/dw/orders_dw.yml (1)

26-27: ⚡ Quick win

Missing not_null test on revenue (open item from PR objectives).

The PR objectives note this as an open item: "add not_null test on order_fact.revenue". If revenue can legitimately be NULL (orders with no line items), document the behavior; otherwise, add the test before merge or create a tracking issue.
Proposed fix if revenue should never be NULL
       - name: revenue
         description: '{{ doc("order_fact_revenue") }}'
+        tests:
+          - not_null
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/orders_dw.yml` around lines 26 - 27, The review notes that
the orders_dw model's revenue column (symbol: revenue, doc: order_fact_revenue)
lacks a not_null test; either add a not_null test under the revenue column in
models/orders/dw/orders_dw.yml (columns: revenue -> tests: - not_null) so dbt
enforces non-null revenue, or if revenue can be legitimately NULL, update the
revenue description (order_fact_revenue) to document that nulls are expected and
create a tracking issue referencing this decision instead of merging without a
test.

tests/stg_refunds_no_shopify_stripe_collision.sql (1)

6-22: ⚡ Quick win

Consider simplifying by removing the redundant CTE.

The per_source CTE groups by all selected columns without aggregation, making it equivalent to SELECT DISTINCT. Since the outer query still needs to aggregate by (order_id, refunded_at_minute), the CTE doesn't reduce data volume or isolate complex logic.

♻️ Simpler direct query

-WITH per_source AS (
-    SELECT
-        order_id
-        , refunded_at_minute
-        , source
-    FROM {{ ref('stg_refunds') }}
-    WHERE source IN ('shopify', 'stripe')
-    GROUP BY order_id, refunded_at_minute, source
-)
-
 SELECT
     order_id
     , refunded_at_minute
     , count(DISTINCT source) AS source_count
-FROM per_source
+FROM {{ ref('stg_refunds') }}
+WHERE source IN ('shopify', 'stripe')
 GROUP BY order_id, refunded_at_minute
 HAVING count(DISTINCT source) > 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/stg_refunds_no_shopify_stripe_collision.sql` around lines 6 - 22, The
per_source CTE is redundant because it simply SELECTs and GROUPs the same
columns; remove the CTE and query directly from {{ ref('stg_refunds') }}
filtering source IN ('shopify','stripe') — i.e. in the main query use FROM {{
ref('stg_refunds') }} WHERE source IN (...) and then GROUP BY order_id,
refunded_at_minute and HAVING count(DISTINCT source) > 1; keep references to
order_id and refunded_at_minute and the count(DISTINCT source) aggregation as in
the current outer query.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@CLAUDE.md`:
- Around line 66-67: The docs currently state "Refunds are intentionally not
modeled" but the PR adds comprehensive refund modeling; update CLAUDE.md to
remove or replace that bullet and explicitly document the new refund design:
state that refunds are modeled via refund_fact, that order-level refund
aggregates are stored on order_fact, and that order_line_fact contains refund
attribution; also note which raw/refunds_*.csv inputs are consumed and any
relevant transformation or join keys used for refund_fact so future developers
aren't misled.

In `@docs/designs/2026-Q2-refunds-modeling.md`:
- Around line 1-14: The draft design "Refunds modeling — net revenue for
`order_fact` / `order_line_fact`" is marked Draft and requests Finance sign-off
in §5, but this PR already contains the implementation (base models, staging,
facts, tests) which risks rework if Finance changes the dedup heuristic (§3.3),
allocation method (§3.4), or incremental strategy (§4); either (A) finalize the
design by updating the header status to "Final" and attach explicit Finance
sign-off in the document before merging the implementation, or (B) remove the
implementation artifacts from this PR and split into two PRs — one that contains
the design-only for review/approval and a second PR with the implementation once
sign-off is obtained — or (C) clearly mark the implementation files as WIP and
block merging until sign-off is attached. Ensure references to §3.3, §3.4, §4,
and §5 are included in the approval note so reviewers can validate the specific
decisions that must be accepted before applying the SQL changes.

In `@models/orders/dw/refund_fact.sql`:
- Around line 1-4: The refund_fact model is missing schema tests for its primary
key refund_event_id; add a YAML table entry for the refund_fact model in the
models' schema file that defines the column refund_event_id with tests: - unique
and - not_null so the primary key is validated; ensure the entry includes name:
refund_fact, a short description, and a columns block listing refund_event_id
with the two tests.
- Around line 9-20: Convert the SELECT list to use leading commas (place the
comma at the start of each subsequent column line) and make SQL keywords
uppercase; specifically update the SELECT block that references {{
ref('stg_refunds') }} AS r (columns like refund_event_id, source,
source_refund_id, order_id, line_item_id, qty_refunded, refund_amount,
refunded_at, created_at_dwh, updated_at_dwh) so commas lead each column line and
replace current_timestamp with CURRENT_TIMESTAMP for created_at_dwh and
updated_at_dwh.
- Around line 18-19: The created_at_dwh column is being reset to
current_timestamp on every run; change its expression so incremental runs
preserve the existing value by using dbt's is_incremental() and selecting the
prior created_at_dwh from the target table ({{ this }}) by the model's unique
key, falling back to current_timestamp if none exists; keep updated_at_dwh as
current_timestamp. Use the created_at_dwh and updated_at_dwh column names and
the model's unique key (the same unique_key used for incremental logic) to
implement a conditional/subquery lookup that returns the prior created_at_dwh
when is_incremental() is true.

In `@models/orders/staging/stg_refunds.sql`:
- Around line 59-64: Convert the SELECT list to use leading commas and make SQL
type/keyword names uppercase: move commas to the start of lines before order_id,
amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS
refunded_at, and DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS
refunded_at_minute; also change type/keyword casing such as CAST(... AS
timestamp) → CAST(... AS TIMESTAMP) and DATE_TRUNC → DATE_TRUNC and ensure AS is
uppercase; the clause referencing ref('base_refunds_internal_pos') should remain
unchanged.
- Around line 28-35: Convert the SELECT block to use leading commas and
uppercase SQL keywords: change the column list so each column line begins with a
leading comma (e.g., ", order_id") and make keywords/types uppercase (SELECT,
AS, CAST, DATE_TRUNC, TIMESTAMP) and ensure CAST(processed_at AS TIMESTAMP) and
DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) follow that casing; keep
the same source reference {{ ref('base_refunds_stripe') }} and column aliases
(source_refund_id, refund_amount, refunded_at, refunded_at_minute) unchanged.
- Around line 68-108: The SQL uses trailing commas and lowercase type names;
update the SELECT lists in the shopify, pos, and stripe (alias se) blocks to use
leading commas before each column and change type names in CASTs to uppercase
(e.g., CAST(NULL AS VARCHAR), CAST(NULL AS BIGINT)); ensure the UNION ALL
sections and JOIN clause remain functionally identical while applying the
leading-comma formatting to every SELECT projection (including the 'source'
literal) and capitalizing SQL keywords where needed.
- Around line 16-24: The SELECT uses trailing commas and lowercase SQL type
names; switch to leading commas for each column in the SELECT list (e.g., for
refund_id AS source_refund_id, order_id, line_item_id, qty_refunded,
amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS
refunded_at, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS
refunded_at_minute) and change the CAST type identifier to uppercase
(CAST(refunded_at AS TIMESTAMP)) so both CAST(...) occurrences use TIMESTAMP and
all commas are leading per style guidelines.

---

Nitpick comments:
In `@models/orders/dw/order_fact.sql`:
- Around line 58-61: The SELECT mixes COALESCE for os.shipment_count but leaves
orev.line_count, orev.quantity_ordered (total_quantity), and orev.revenue
nullable; make NULL handling consistent by wrapping orev.line_count,
orev.quantity_ordered and orev.revenue with COALESCE (e.g.,
COALESCE(orev.line_count, 0), COALESCE(orev.quantity_ordered, 0),
COALESCE(orev.revenue, 0.0)) so orders without line items return zeros like
os.shipment_count; update the expressions that reference orev.line_count,
orev.quantity_ordered, and orev.revenue accordingly.

In `@models/orders/dw/orders_dw.yml`:
- Around line 26-27: The review notes that the orders_dw model's revenue column
(symbol: revenue, doc: order_fact_revenue) lacks a not_null test; either add a
not_null test under the revenue column in models/orders/dw/orders_dw.yml
(columns: revenue -> tests: - not_null) so dbt enforces non-null revenue, or if
revenue can be legitimately NULL, update the revenue description
(order_fact_revenue) to document that nulls are expected and create a tracking
issue referencing this decision instead of merging without a test.

In `@tests/stg_refunds_no_shopify_stripe_collision.sql`:
- Around line 6-22: The per_source CTE is redundant because it simply SELECTs
and GROUPs the same columns; remove the CTE and query directly from {{
ref('stg_refunds') }} filtering source IN ('shopify','stripe') — i.e. in the
main query use FROM {{ ref('stg_refunds') }} WHERE source IN (...) and then
GROUP BY order_id, refunded_at_minute and HAVING count(DISTINCT source) > 1;
keep references to order_id and refunded_at_minute and the count(DISTINCT
source) aggregation as in the current outer query.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4100f806-8d91-40ba-9153-a26333a3d732

📥 Commits

Reviewing files that changed from the base of the PR and between 496c141 and f02c459.

📒 Files selected for processing (19)

CLAUDE.md
docs/designs/2026-Q2-refunds-modeling.md
models/orders/base/base_refunds_internal_pos.sql
models/orders/base/base_refunds_shopify.sql
models/orders/base/base_refunds_stripe.sql
models/orders/base/orders_base.yml
models/orders/dw/order_fact.sql
models/orders/dw/order_line_fact.sql
models/orders/dw/orders_dw.yml
models/orders/dw/refund_fact.sql
models/orders/staging/orders_staging.yml
models/orders/staging/stg_refund_tenders.sql
models/orders/staging/stg_refunds.sql
tests/order_fact_net_revenue_definition.sql
tests/order_fact_refund_over_revenue_warn.sql
tests/order_fact_refund_total_matches_refund_fact.sql
tests/order_line_fact_refund_sums_to_order.sql
tests/refund_fact_amount_positive.sql
tests/stg_refunds_no_shopify_stripe_collision.sql

coderabbitai · 2026-05-20T22:31:57Z

+- **Refunds are intentionally not modeled.** `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`.
+- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Update the refunds documentation to reflect new modeling.

Lines 66-67 state that "Refunds are intentionally not modeled" and cite the 2024-Q3 design as justification. However, this PR introduces comprehensive refunds modeling via refund_fact, order_fact refund aggregates, and order_line_fact refund attribution. This documentation is now contradictory and will mislead future developers.

📝 Proposed fix to update the documentation

-Per `docs/designs/2024-Q3-orders-redesign.md`: -- **Refunds are intentionally not modeled.** `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`. -- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot. +Per design decisions: +- **Refunds are modeled on `refund_fact`** (see `docs/designs/2026-Q2-refunds-modeling.md`). Refund aggregates (`refund_total`, `net_revenue`) appear on `order_fact` and `order_line_fact`, but refund event details live on the separate `refund_fact` table. +- **Merchants are current-state only** (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@CLAUDE.md` around lines 66 - 67, The docs currently state "Refunds are intentionally not modeled" but the PR adds comprehensive refund modeling; update CLAUDE.md to remove or replace that bullet and explicitly document the new refund design: state that refunds are modeled via refund_fact, that order-level refund aggregates are stored on order_fact, and that order_line_fact contains refund attribution; also note which raw/refunds_*.csv inputs are consumed and any relevant transformation or join keys used for refund_fact so future developers aren't misled.

coderabbitai · 2026-05-20T22:31:57Z

+# Refunds modeling — net revenue for `order_fact` / `order_line_fact`
+
+**Author:** Data Engineering
+**Status:** Draft — for review with Finance Analytics
+**Date:** 2026-05-20
+**Related:** [`2024-Q3-orders-redesign.md`](2024-Q3-orders-redesign.md) (out-of-scope item: refunds), DATA-145 (refund sources, not yet scoped), DATA-123 (Q1 revenue reconciliation — gross-revenue bug, separate issue)
+
+---
+
+## 1. What Finance asked for
+
+> Surface refund totals on `order_fact` and per-line refund amounts on `order_line_fact` so we can report **net revenue** (gross − refunds).
+
+This document captures the modeling decisions we recommend. We want sign-off on the decisions in §3 and §5 before we touch any SQL.


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Resolve design approval status before merging implementation.

The document is marked as "Draft — for review with Finance Analytics" (line 4) and Section 5 explicitly requests Finance sign-off on critical decisions "before we touch any SQL" (line 14). However, this PR includes the full implementation (base models, staging, facts, tests) alongside the draft design.

Consider either:

Finalizing the design and obtaining Finance sign-off before merging, or

Splitting this PR into (1) design-only for review/approval, then (2) implementation after approval.

The current approach creates risk: if Finance requests changes to the dedup heuristic (§3.3), allocation method (§3.4), or incremental strategy (§4), the implemented SQL would need rework.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/designs/2026-Q2-refunds-modeling.md` around lines 1 - 14, The draft design "Refunds modeling — net revenue for `order_fact` / `order_line_fact`" is marked Draft and requests Finance sign-off in §5, but this PR already contains the implementation (base models, staging, facts, tests) which risks rework if Finance changes the dedup heuristic (§3.3), allocation method (§3.4), or incremental strategy (§4); either (A) finalize the design by updating the header status to "Final" and attach explicit Finance sign-off in the document before merging the implementation, or (B) remove the implementation artifacts from this PR and split into two PRs — one that contains the design-only for review/approval and a second PR with the implementation once sign-off is obtained — or (C) clearly mark the implementation files as WIP and block merging until sign-off is attached. Ensure references to §3.3, §3.4, §4, and §5 are included in the approval note so reviewers can validate the specific decisions that must be accepted before applying the SQL changes.

coderabbitai · 2026-05-20T22:31:57Z

+{{ config(
+    materialized='incremental',
+    unique_key='refund_event_id'
+) }}


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add unique and not_null tests on the primary key.

The guideline requires every fact table to have unique and not_null tests on its primary key. The primary key refund_event_id needs these tests defined in a corresponding YAML schema file (e.g., models/orders/dw/orders_dw.yml).

📋 Required YAML configuration

Add to models/orders/dw/orders_dw.yml:

- name: refund_fact description: One row per refund event at order × line × source grain. columns: - name: refund_event_id description: Surrogate PK derived from (source, source_refund_id, line_item_id). tests: - unique - not_null

As per coding guidelines: Every fact table must have unique and not_null tests on its primary key.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/dw/refund_fact.sql` around lines 1 - 4, The refund_fact model is missing schema tests for its primary key refund_event_id; add a YAML table entry for the refund_fact model in the models' schema file that defines the column refund_event_id with tests: - unique and - not_null so the primary key is validated; ensure the entry includes name: refund_fact, a short description, and a columns block listing refund_event_id with the two tests.

coderabbitai · 2026-05-20T22:31:57Z

+SELECT
+    r.refund_event_id
+    , r.source
+    , r.source_refund_id
+    , r.order_id
+    , r.line_item_id
+    , r.qty_refunded
+    , r.refund_amount
+    , r.refunded_at
+    , current_timestamp AS created_at_dwh
+    , current_timestamp AS updated_at_dwh
+FROM {{ ref('stg_refunds') }} AS r


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase keywords per SQL style guideline.

The SELECT clause uses trailing commas and lowercase current_timestamp keyword.

✏️ Proposed fix

SELECT - r.refund_event_id - , r.source - , r.source_refund_id - , r.order_id - , r.line_item_id - , r.qty_refunded - , r.refund_amount - , r.refunded_at - , current_timestamp AS created_at_dwh - , current_timestamp AS updated_at_dwh + r.refund_event_id + , r.source + , r.source_refund_id + , r.order_id + , r.line_item_id + , r.qty_refunded + , r.refund_amount + , r.refunded_at + , CURRENT_TIMESTAMP AS created_at_dwh + , CURRENT_TIMESTAMP AS updated_at_dwh FROM {{ ref('stg_refunds') }} AS r

As per coding guidelines: use leading commas, lowercase identifiers, and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

SELECT

r.refund_event_id

, r.source

, r.source_refund_id

, r.order_id

, r.line_item_id

, r.qty_refunded

, r.refund_amount

, r.refunded_at

, current_timestamp AS created_at_dwh

, current_timestamp AS updated_at_dwh

FROM {{ ref('stg_refunds') }} AS r

SELECT

r.refund_event_id

, r.source

, r.source_refund_id

, r.order_id

, r.line_item_id

, r.qty_refunded

, r.refund_amount

, r.refunded_at

, CURRENT_TIMESTAMP AS created_at_dwh

, CURRENT_TIMESTAMP AS updated_at_dwh

FROM {{ ref('stg_refunds') }} AS r

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/dw/refund_fact.sql` around lines 9 - 20, Convert the SELECT list to use leading commas (place the comma at the start of each subsequent column line) and make SQL keywords uppercase; specifically update the SELECT block that references {{ ref('stg_refunds') }} AS r (columns like refund_event_id, source, source_refund_id, order_id, line_item_id, qty_refunded, refund_amount, refunded_at, created_at_dwh, updated_at_dwh) so commas lead each column line and replace current_timestamp with CURRENT_TIMESTAMP for created_at_dwh and updated_at_dwh.

coderabbitai · 2026-05-20T22:31:57Z

+    , current_timestamp AS created_at_dwh
+    , current_timestamp AS updated_at_dwh


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Preserve created_at_dwh on incremental updates.

In the current implementation, created_at_dwh is set to CURRENT_TIMESTAMP on every run. For an incremental model with unique_key, this means created_at_dwh will be overwritten when existing rows are updated (e.g., when the same refund event appears in the incremental window). The audit timestamp should preserve the original creation time.

🔧 Proposed fix to preserve created_at_dwh

SELECT r.refund_event_id , r.source , r.source_refund_id , r.order_id , r.line_item_id , r.qty_refunded , r.refund_amount , r.refunded_at - , CURRENT_TIMESTAMP AS created_at_dwh + {% if is_incremental() %} + , COALESCE(target.created_at_dwh, CURRENT_TIMESTAMP) AS created_at_dwh + {% else %} + , CURRENT_TIMESTAMP AS created_at_dwh + {% endif %} , CURRENT_TIMESTAMP AS updated_at_dwh FROM {{ ref('stg_refunds') }} AS r +{% if is_incremental() %} +LEFT JOIN {{ this }} AS target + ON r.refund_event_id = target.refund_event_id +WHERE r.refunded_at >= {{ get_incremental_value('refunded_at') }} +{% endif %} -{% if is_incremental() %} - WHERE r.refunded_at >= {{ get_incremental_value('refunded_at') }} -{% endif %}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/dw/refund_fact.sql` around lines 18 - 19, The created_at_dwh column is being reset to current_timestamp on every run; change its expression so incremental runs preserve the existing value by using dbt's is_incremental() and selecting the prior created_at_dwh from the target table ({{ this }}) by the model's unique key, falling back to current_timestamp if none exists; keep updated_at_dwh as current_timestamp. Use the created_at_dwh and updated_at_dwh column names and the model's unique key (the same unique_key used for incremental logic) to implement a conditional/subquery lookup that returns the prior created_at_dwh when is_incremental() is true.

coderabbitai · 2026-05-20T22:31:57Z

+    SELECT
+        refund_id AS source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_shopify') }}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

This SELECT clause uses trailing commas. Additionally, ensure SQL type names in CAST expressions are uppercase.

✏️ Proposed fix

SELECT - refund_id AS source_refund_id - , order_id - , line_item_id - , qty_refunded - , amount_in_cents / 100.0 AS refund_amount - , CAST(refunded_at AS timestamp) AS refunded_at - , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute + refund_id AS source_refund_id + , order_id + , line_item_id + , qty_refunded + , amount_in_cents / 100.0 AS refund_amount + , CAST(refunded_at AS TIMESTAMP) AS refunded_at + , DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute FROM {{ ref('base_refunds_shopify') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

SELECT

refund_id AS source_refund_id

, order_id

, line_item_id

, qty_refunded

, amount_in_cents / 100.0 AS refund_amount

, CAST(refunded_at AS timestamp) AS refunded_at

, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute

FROM {{ ref('base_refunds_shopify') }}

SELECT

refund_id AS source_refund_id

, order_id

, line_item_id

, qty_refunded

, amount_in_cents / 100.0 AS refund_amount

, CAST(refunded_at AS TIMESTAMP) AS refunded_at

, DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute

FROM {{ ref('base_refunds_shopify') }}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/staging/stg_refunds.sql` around lines 16 - 24, The SELECT uses trailing commas and lowercase SQL type names; switch to leading commas for each column in the SELECT list (e.g., for refund_id AS source_refund_id, order_id, line_item_id, qty_refunded, amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS refunded_at, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute) and change the CAST type identifier to uppercase (CAST(refunded_at AS TIMESTAMP)) so both CAST(...) occurrences use TIMESTAMP and all commas are leading per style guidelines.

coderabbitai · 2026-05-20T22:31:57Z

+    SELECT
+        refund_id AS source_refund_id
+        , order_id
+        , tender_type
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(processed_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_stripe') }}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix

SELECT - refund_id AS source_refund_id - , order_id - , tender_type - , amount_in_cents / 100.0 AS refund_amount - , CAST(processed_at AS timestamp) AS refunded_at - , DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute + refund_id AS source_refund_id + , order_id + , tender_type + , amount_in_cents / 100.0 AS refund_amount + , CAST(processed_at AS TIMESTAMP) AS refunded_at + , DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) AS refunded_at_minute FROM {{ ref('base_refunds_stripe') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

SELECT

refund_id AS source_refund_id

, order_id

, tender_type

, amount_in_cents / 100.0 AS refund_amount

, CAST(processed_at AS timestamp) AS refunded_at

, DATE_TRUNC('minute', CAST(processed_at AS timestamp)) AS refunded_at_minute

FROM {{ ref('base_refunds_stripe') }}

SELECT

refund_id AS source_refund_id

, order_id

, tender_type

, amount_in_cents / 100.0 AS refund_amount

, CAST(processed_at AS TIMESTAMP) AS refunded_at

, DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) AS refunded_at_minute

FROM {{ ref('base_refunds_stripe') }}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/staging/stg_refunds.sql` around lines 28 - 35, Convert the SELECT block to use leading commas and uppercase SQL keywords: change the column list so each column line begins with a leading comma (e.g., ", order_id") and make keywords/types uppercase (SELECT, AS, CAST, DATE_TRUNC, TIMESTAMP) and ensure CAST(processed_at AS TIMESTAMP) and DATE_TRUNC('minute', CAST(processed_at AS TIMESTAMP)) follow that casing; keep the same source reference {{ ref('base_refunds_stripe') }} and column aliases (source_refund_id, refund_amount, refunded_at, refunded_at_minute) unchanged.

coderabbitai · 2026-05-20T22:31:57Z

+        refund_id AS source_refund_id
+        , order_id
+        , amount_in_cents / 100.0 AS refund_amount
+        , CAST(refunded_at AS timestamp) AS refunded_at
+        , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute
+    FROM {{ ref('base_refunds_internal_pos') }}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix

SELECT - refund_id AS source_refund_id - , order_id - , amount_in_cents / 100.0 AS refund_amount - , CAST(refunded_at AS timestamp) AS refunded_at - , DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute + refund_id AS source_refund_id + , order_id + , amount_in_cents / 100.0 AS refund_amount + , CAST(refunded_at AS TIMESTAMP) AS refunded_at + , DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute FROM {{ ref('base_refunds_internal_pos') }}

As per coding guidelines: use leading commas and uppercase keywords in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

refund_id AS source_refund_id

, order_id

, amount_in_cents / 100.0 AS refund_amount

, CAST(refunded_at AS timestamp) AS refunded_at

, DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute

FROM {{ ref('base_refunds_internal_pos') }}

refund_id AS source_refund_id

, order_id

, amount_in_cents / 100.0 AS refund_amount

, CAST(refunded_at AS TIMESTAMP) AS refunded_at

, DATE_TRUNC('minute', CAST(refunded_at AS TIMESTAMP)) AS refunded_at_minute

FROM {{ ref('base_refunds_internal_pos') }}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/staging/stg_refunds.sql` around lines 59 - 64, Convert the SELECT list to use leading commas and make SQL type/keyword names uppercase: move commas to the start of lines before order_id, amount_in_cents / 100.0 AS refund_amount, CAST(refunded_at AS timestamp) AS refunded_at, and DATE_TRUNC('minute', CAST(refunded_at AS timestamp)) AS refunded_at_minute; also change type/keyword casing such as CAST(... AS timestamp) → CAST(... AS TIMESTAMP) and DATE_TRUNC → DATE_TRUNC and ensure AS is uppercase; the clause referencing ref('base_refunds_internal_pos') should remain unchanged.

coderabbitai · 2026-05-20T22:31:57Z

+    SELECT
+        'shopify' AS source
+        , source_refund_id
+        , order_id
+        , line_item_id
+        , qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
+    FROM shopify
+
+    UNION ALL
+
+    SELECT
+        'internal_pos' AS source
+        , source_refund_id
+        , order_id
+        , CAST(NULL AS varchar) AS line_item_id
+        , CAST(NULL AS bigint) AS qty_refunded
+        , refund_amount
+        , refunded_at
+        , refunded_at_minute
+    FROM pos
+
+    UNION ALL
+
+    -- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).
+    SELECT
+        'stripe' AS source
+        , se.source_refund_id
+        , se.order_id
+        , CAST(NULL AS varchar) AS line_item_id
+        , CAST(NULL AS bigint) AS qty_refunded
+        , se.refund_amount
+        , se.refunded_at
+        , se.refunded_at_minute
+    FROM stripe_events AS se
+    LEFT JOIN shopify_keys AS sk
+        ON se.order_id = sk.order_id
+            AND se.refunded_at_minute = sk.refunded_at_minute
+    WHERE sk.order_id IS NULL


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use leading commas and uppercase type names per SQL style guideline.

✏️ Proposed fix

SELECT - 'shopify' AS source - , source_refund_id - , order_id - , line_item_id - , qty_refunded - , refund_amount - , refunded_at - , refunded_at_minute + 'shopify' AS source + , source_refund_id + , order_id + , line_item_id + , qty_refunded + , refund_amount + , refunded_at + , refunded_at_minute FROM shopify UNION ALL SELECT - 'internal_pos' AS source - , source_refund_id - , order_id - , CAST(NULL AS varchar) AS line_item_id - , CAST(NULL AS bigint) AS qty_refunded - , refund_amount - , refunded_at - , refunded_at_minute + 'internal_pos' AS source + , source_refund_id + , order_id + , CAST(NULL AS VARCHAR) AS line_item_id + , CAST(NULL AS BIGINT) AS qty_refunded + , refund_amount + , refunded_at + , refunded_at_minute FROM pos UNION ALL -- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute). SELECT - 'stripe' AS source - , se.source_refund_id - , se.order_id - , CAST(NULL AS varchar) AS line_item_id - , CAST(NULL AS bigint) AS qty_refunded - , se.refund_amount - , se.refunded_at - , se.refunded_at_minute + 'stripe' AS source + , se.source_refund_id + , se.order_id + , CAST(NULL AS VARCHAR) AS line_item_id + , CAST(NULL AS BIGINT) AS qty_refunded + , se.refund_amount + , se.refunded_at + , se.refunded_at_minute FROM stripe_events AS se

As per coding guidelines: use leading commas and uppercase keywords (including type names) in SQL.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

SELECT

'shopify' AS source

, source_refund_id

, order_id

, line_item_id

, qty_refunded

, refund_amount

, refunded_at

, refunded_at_minute

FROM shopify

UNION ALL

SELECT

'internal_pos' AS source

, source_refund_id

, order_id

, CAST(NULL AS varchar) AS line_item_id

, CAST(NULL AS bigint) AS qty_refunded

, refund_amount

, refunded_at

, refunded_at_minute

FROM pos

UNION ALL

-- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).

SELECT

'stripe' AS source

, se.source_refund_id

, se.order_id

, CAST(NULL AS varchar) AS line_item_id

, CAST(NULL AS bigint) AS qty_refunded

, se.refund_amount

, se.refunded_at

, se.refunded_at_minute

FROM stripe_events AS se

LEFT JOIN shopify_keys AS sk

ON se.order_id = sk.order_id

AND se.refunded_at_minute = sk.refunded_at_minute

WHERE sk.order_id IS NULL

SELECT

'shopify' AS source

, source_refund_id

, order_id

, line_item_id

, qty_refunded

, refund_amount

, refunded_at

, refunded_at_minute

FROM shopify

UNION ALL

SELECT

'internal_pos' AS source

, source_refund_id

, order_id

, CAST(NULL AS VARCHAR) AS line_item_id

, CAST(NULL AS BIGINT) AS qty_refunded

, refund_amount

, refunded_at

, refunded_at_minute

FROM pos

UNION ALL

-- Stripe-direct: only refunds with no Shopify counterpart at the same (order_id, minute).

SELECT

'stripe' AS source

, se.source_refund_id

, se.order_id

, CAST(NULL AS VARCHAR) AS line_item_id

, CAST(NULL AS BIGINT) AS qty_refunded

, se.refund_amount

, se.refunded_at

, se.refunded_at_minute

FROM stripe_events AS se

LEFT JOIN shopify_keys AS sk

ON se.order_id = sk.order_id

AND se.refunded_at_minute = sk.refunded_at_minute

WHERE sk.order_id IS NULL

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@models/orders/staging/stg_refunds.sql` around lines 68 - 108, The SQL uses trailing commas and lowercase type names; update the SELECT lists in the shopify, pos, and stripe (alias se) blocks to use leading commas before each column and change type names in CASTs to uppercase (e.g., CAST(NULL AS VARCHAR), CAST(NULL AS BIGINT)); ensure the UNION ALL sections and JOIN clause remain functionally identical while applying the leading-comma formatting to every SELECT projection (including the 'source' literal) and capitalizing SQL keywords where needed.

atiwary added 2 commits May 20, 2026 16:48

tiwaryanuj requested a review from phil510 as a code owner May 20, 2026 22:25

coderabbitai Bot requested changes May 20, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DATA-123] Fix order_fact revenue + [DATA-145] add refunds modeling#8

[DATA-123] Fix order_fact revenue + [DATA-145] add refunds modeling#8
tiwaryanuj wants to merge 2 commits into
helloextend:masterfrom
tiwaryanuj:atiwary.DATA-123.fix-order-fact-revenue

tiwaryanuj commented May 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 20, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

coderabbitai Bot May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		- Refunds are intentionally not modeled. `raw/` contains `refunds_*.csv` files but they're not wired into any model. If refund logic is needed, it belongs on a separate `refund_fact`, not on `order_fact`.
		- Merchants are current-state only (`lkp_merchants`). No Type-2 history. Tier-at-time-of-order needs a separate snapshot.

		, current_timestamp AS created_at_dwh
		, current_timestamp AS updated_at_dwh

Conversation

tiwaryanuj commented May 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

DATA-123 root cause

DATA-145 (refunds) — what landed

Validation

Test plan

Out of scope (intentionally)

Open items / follow-ups

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tiwaryanuj commented May 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 20, 2026 •

edited

Loading