Skip to content

fix(order_fact): compute revenue from line items (DATA-123)#5

Open
aliciatoshima wants to merge 3 commits into
masterfrom
fix/DATA-123-order-fact-revenue-from-line-items
Open

fix(order_fact): compute revenue from line items (DATA-123)#5
aliciatoshima wants to merge 3 commits into
masterfrom
fix/DATA-123-order-fact-revenue-from-line-items

Conversation

@aliciatoshima
Copy link
Copy Markdown

@aliciatoshima aliciatoshima commented May 19, 2026

Summary

Fixes the Q1 revenue reconciliation gap reported in DATA-123. order_fact.revenue was under-reporting by ~$2.47M versus the Stripe captures total.

Root cause was two interacting bugs in models/orders/dw/order_fact.sql:

  • INNER JOIN to stg_line_items inside the shipment CTE meant orders that hadn't shipped yet contributed NULL revenue. ~$1.55M lost across 1,374 orders.
  • QUALIFY row_number() … = 1 kept only the first shipment per order, silently dropping revenue from later shipments. ~$0.92M lost.

Per models/orders/dw/orders_dw.md (the order_fact_revenue doc), revenue is sum(quantity * unit_price) across an order's line items, gross of refunds. Restructured the model to compute revenue / line_count / total_quantity from stg_line_items at the order grain, and shipment_count / shipped_at from stg_shipments separately. The QUALIFY dedup is no longer needed.

Also bumps the existing reconciliation test from severity='warn' to error, per CONTRIBUTING.md: "severity: warn is for noisy upstream conditions, not for 'we know this is broken.'"

Reconciliation

Before After
order_fact non-test revenue $10,522,258.04 $12,989,886.01
Stripe captures (target) $12,989,886.01 $12,989,886.01
Gap $2,467,627.97 $0.00

Test plan

  • make full — clean rebuild succeeds (17/17)
  • make test — all 12 tests pass, including the now-error reconciliation test
  • sqlfluff lint models/orders/dw/order_fact.sql — clean
  • Manual: confirmed sum(revenue) where !is_test in order_fact matches sum(quantity * unit_price) from stg_line_items for non-test orders

Follow-ups (not in this PR)

  • daily_revenue.sql (and any other consumers of order_fact) auto-corrects since it selects from order_fact.
  • Incremental watermark still uses updated_at_dwh = current_timestamp, which means the watermark advances by wall-clock rather than by data time. Worth a separate ticket — out of scope here.
  • Pre-existing sqlfluff issues in tests/order_fact_revenue_reconciliation.sql (lowercase keywords) left alone to keep PR scope tight.

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Updated order metrics calculation methodology for revenue, quantity, and line count reporting.
  • Tests

    • Revenue reconciliation test now enforces stricter validation with error-level reporting.

Review Change Stack

`order_fact.revenue` was summing `quantity_shipped * unit_price` over the
first shipment per order, which under-reported Q1 revenue by ~$2.47M vs
Stripe captures. Two interacting bugs:

- INNER JOIN to `stg_line_items` inside the shipments CTE meant orders
  with no shipments yet contributed NULL revenue (~$1.55M lost).
- `QUALIFY row_number() ... = 1` kept only the first shipment per order,
  silently dropping revenue from later shipments (~$0.92M lost).

Per `orders_dw.md` (`order_fact_revenue` doc), revenue is the sum of
`quantity * unit_price` across an order's line items, gross of refunds.
Restructured the model to compute revenue/line_count/total_quantity from
`stg_line_items` at the order grain, and shipment_count/shipped_at from
`stg_shipments` separately. The dedup `QUALIFY` is no longer needed.

Also bumps `tests/order_fact_revenue_reconciliation.sql` from `warn` to
`error` (per CONTRIBUTING.md: `severity: warn` is for noisy upstream
conditions, not for "we know this is broken").

After fix: `order_fact` non-test revenue = \$12,989,886.01, matches the
Stripe captures total exactly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@aliciatoshima aliciatoshima requested a review from phil510 as a code owner May 19, 2026 21:16
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c1c034d-d27c-4184-8cbe-6224ea5f1b8f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review

Walkthrough

This PR refactors the order_fact model to compute order-level metrics directly from staging tables at order grain, replacing a multi-CTE shipment-centric pipeline. Two new aggregations—order_revenue from line items and order_shipments from shipments—feed into the enriched CTE, eliminating per-shipment deduplication logic. The test configuration for revenue reconciliation is simultaneously escalated from warning to error severity to enforce data integrity after the restructuring.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)
  • DATA-123: Request failed with status code 401

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
models/orders/dw/order_fact.sql (1)

71-72: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Incremental filtering still misses late changes on existing orders.

Line 72 only reloads rows whose ordered_at is newer than the last model run. If a shipment is added later, or line items are corrected for an older order, that order will never be recomputed, so shipment_count, shipped_at, and revenue can drift again after the initial load.

This should key incremental processing off changed order_ids from the contributing sources instead of ordered_at. At minimum, this rollout needs a full refresh or targeted backfill so historical rows pick up the new logic.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/order_fact.sql` around lines 71 - 72, The incremental WHERE
currently uses ordered_at (get_incremental_value('updated_at_dwh')) which misses
later changes to existing orders; change the incremental logic in the
is_incremental() branch so rows are selected by changed order_id from upstream
sources (e.g., compare orders.updated_at, shipments.updated_at,
line_items.updated_at) instead of ordered_at — for example, replace the single
ordered_at filter with a predicate that reloads order_id values whose latest
source updated_at >= get_incremental_value('updated_at_dwh') (or perform a
full_refresh/targeted backfill if you cannot derive changed order_ids), so
computed fields like shipment_count, shipped_at, and revenue are recomputed when
any contributing source changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@models/orders/dw/order_fact.sql`:
- Around line 71-72: The incremental WHERE currently uses ordered_at
(get_incremental_value('updated_at_dwh')) which misses later changes to existing
orders; change the incremental logic in the is_incremental() branch so rows are
selected by changed order_id from upstream sources (e.g., compare
orders.updated_at, shipments.updated_at, line_items.updated_at) instead of
ordered_at — for example, replace the single ordered_at filter with a predicate
that reloads order_id values whose latest source updated_at >=
get_incremental_value('updated_at_dwh') (or perform a full_refresh/targeted
backfill if you cannot derive changed order_ids), so computed fields like
shipment_count, shipped_at, and revenue are recomputed when any contributing
source changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97bdd27c-eeb8-4369-bf7b-e6bb3dadc206

📥 Commits

Reviewing files that changed from the base of the PR and between 1b51f4e and da4ee7c.

📒 Files selected for processing (2)
  • models/orders/dw/order_fact.sql
  • tests/order_fact_revenue_reconciliation.sql

coderabbitai[bot]
coderabbitai Bot previously approved these changes May 19, 2026
Paste-ready Google Slides outline for the upcoming refunds-into-warehouse
work. Covers the DATA-123 fix recap, the 4 grain traps spotted in raw
refund data, the proposed `refund_fact` / `refund_line_fact` structure
plus denormalized columns on `order_fact` / `order_line_fact`, and the
status of the 5 design decisions (3 locked: store-credit-as-deferred-
liability, pro-rata per-line allocation, order-date attribution; 2 open
pending Jamie: cross-source dedup precedence and Stripe settlement scope).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…der facts

Brings the three raw refund sources (Shopify, Stripe, internal POS) into the
warehouse as two new facts plus denormalized columns on order_fact /
order_line_fact, per the finance readout deck.

New models:
- base_refunds_{shopify,stripe,internal_pos}.sql
- stg_refunds.sql (unioned canonical schema; cents -> dollars)
- refund_fact.sql (one row per CANONICAL refund event, identified by
  (order_id, refunded_at); de-dupes cross-source duplicates and Stripe
  tender-type splits in a single pass)
- refund_line_fact.sql (one row per (event, line); direct line_item_id
  from Shopify when present, pro-rata by line_revenue otherwise)

Updated order_fact / order_line_fact:
- order_fact gains refund_amount, cash_refund_amount, store_credit_issued,
  net_revenue. net_revenue = revenue - cash_refund_amount (store credit is
  a deferred liability per Decision #3, not a revenue reversal).
- order_line_fact gains cash_refund_amount and net_line_revenue.

Design decisions baked into the model (per the deck, finance-approved):
- #1 Cross-source dedup precedence: shopify > stripe > internal_pos
  (Shopify wins because it carries line attribution).
- #2 Stripe settlement scope: stripe_settled_amount = sum of Stripe rows
  with tender_type='card' for the event (matches Stripe settlement reports).
- #3 Store credit treatment: deferred liability. Excluded from
  cash_refund_amount and net_revenue; tracked separately as
  store_credit_issued.
- #4 Per-line allocation when source lacks line_item_id: pro-rata by line
  revenue.
- #5 Date attribution for net revenue by date: bucket by ordered_at (the
  refund attributes back to its order's date for a stable accrual view).

Tests added:
- unique + not_null on refund_event_id (refund_fact PK).
- not_null on refund_event_id, order_id, line_item_id (refund_line_fact).
- refund_line_to_order_reconciliation (severity=error): per-order
  sum(order_line_fact.cash_refund_amount) must equal
  order_fact.cash_refund_amount. Same invariant DATA-123 broke for revenue,
  now guarded for cash refunds.

Verification (non-test orders):
- Revenue:           $12,989,886.01  (DATA-123 still ties to Stripe captures)
- Refund total:          $7,196.93
- Cash refunds:          $5,350.97
- Store credit issued:   $1,845.96
- Net revenue:      $12,984,535.04

All 23 models build clean, all 19 tests pass.

Known follow-ups (out of scope, called out in deck):
- Store-credit redemption modeling once we get a redemption feed (today
  store_credit_issued just accumulates as a liability column).
- Refund-aware incremental strategy for order_fact / order_line_fact (the
  refund_* columns on already-loaded rows go stale between full refreshes;
  refund_fact remains canonical).
- Stripe settlement-report ingestion to automate the recon.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant