fix(order_fact): compute revenue from line items (DATA-123) by aliciatoshima · Pull Request #5 · helloextend/poc-data-engineering-coding-challenge

aliciatoshima · 2026-05-19T21:16:47Z

Summary

Fixes the Q1 revenue reconciliation gap reported in DATA-123. order_fact.revenue was under-reporting by ~$2.47M versus the Stripe captures total.

Root cause was two interacting bugs in models/orders/dw/order_fact.sql:

INNER JOIN to stg_line_items inside the shipment CTE meant orders that hadn't shipped yet contributed NULL revenue. ~$1.55M lost across 1,374 orders.
QUALIFY row_number() … = 1 kept only the first shipment per order, silently dropping revenue from later shipments. ~$0.92M lost.

Per models/orders/dw/orders_dw.md (the order_fact_revenue doc), revenue is sum(quantity * unit_price) across an order's line items, gross of refunds. Restructured the model to compute revenue / line_count / total_quantity from stg_line_items at the order grain, and shipment_count / shipped_at from stg_shipments separately. The QUALIFY dedup is no longer needed.

Also bumps the existing reconciliation test from severity='warn' to error, per CONTRIBUTING.md: "severity: warn is for noisy upstream conditions, not for 'we know this is broken.'"

Reconciliation

	Before	After
`order_fact` non-test revenue	$10,522,258.04	$12,989,886.01
Stripe captures (target)	$12,989,886.01	$12,989,886.01
Gap	$2,467,627.97	$0.00

Test plan

make full — clean rebuild succeeds (17/17)
make test — all 12 tests pass, including the now-error reconciliation test
sqlfluff lint models/orders/dw/order_fact.sql — clean
Manual: confirmed sum(revenue) where !is_test in order_fact matches sum(quantity * unit_price) from stg_line_items for non-test orders

Follow-ups (not in this PR)

daily_revenue.sql (and any other consumers of order_fact) auto-corrects since it selects from order_fact.
Incremental watermark still uses updated_at_dwh = current_timestamp, which means the watermark advances by wall-clock rather than by data time. Worth a separate ticket — out of scope here.
Pre-existing sqlfluff issues in tests/order_fact_revenue_reconciliation.sql (lowercase keywords) left alone to keep PR scope tight.

🤖 Generated with Claude Code

Summary by CodeRabbit

Bug Fixes
- Updated order metrics calculation methodology for revenue, quantity, and line count reporting.
Tests
- Revenue reconciliation test now enforces stricter validation with error-level reporting.

`order_fact.revenue` was summing `quantity_shipped * unit_price` over the first shipment per order, which under-reported Q1 revenue by ~$2.47M vs Stripe captures. Two interacting bugs: - INNER JOIN to `stg_line_items` inside the shipments CTE meant orders with no shipments yet contributed NULL revenue (~$1.55M lost). - `QUALIFY row_number() ... = 1` kept only the first shipment per order, silently dropping revenue from later shipments (~$0.92M lost). Per `orders_dw.md` (`order_fact_revenue` doc), revenue is the sum of `quantity * unit_price` across an order's line items, gross of refunds. Restructured the model to compute revenue/line_count/total_quantity from `stg_line_items` at the order grain, and shipment_count/shipped_at from `stg_shipments` separately. The dedup `QUALIFY` is no longer needed. Also bumps `tests/order_fact_revenue_reconciliation.sql` from `warn` to `error` (per CONTRIBUTING.md: `severity: warn` is for noisy upstream conditions, not for "we know this is broken"). After fix: `order_fact` non-test revenue = \$12,989,886.01, matches the Stripe captures total exactly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-19T21:17:02Z

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4c1c034d-d27c-4184-8cbe-6224ea5f1b8f

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

Walkthrough

This PR refactors the order_fact model to compute order-level metrics directly from staging tables at order grain, replacing a multi-CTE shipment-centric pipeline. Two new aggregations—order_revenue from line items and order_shipments from shipments—feed into the enriched CTE, eliminating per-shipment deduplication logic. The test configuration for revenue reconciliation is simultaneously escalated from warning to error severity to enforce data integrity after the restructuring.

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)

DATA-123: Request failed with status code 401

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

models/orders/dw/order_fact.sql (1)
71-72: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Incremental filtering still misses late changes on existing orders.

Line 72 only reloads rows whose ordered_at is newer than the last model run. If a shipment is added later, or line items are corrected for an older order, that order will never be recomputed, so shipment_count, shipped_at, and revenue can drift again after the initial load.

This should key incremental processing off changed order_ids from the contributing sources instead of ordered_at. At minimum, this rollout needs a full refresh or targeted backfill so historical rows pick up the new logic.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/orders/dw/order_fact.sql` around lines 71 - 72, The incremental WHERE
currently uses ordered_at (get_incremental_value('updated_at_dwh')) which misses
later changes to existing orders; change the incremental logic in the
is_incremental() branch so rows are selected by changed order_id from upstream
sources (e.g., compare orders.updated_at, shipments.updated_at,
line_items.updated_at) instead of ordered_at — for example, replace the single
ordered_at filter with a predicate that reloads order_id values whose latest
source updated_at >= get_incremental_value('updated_at_dwh') (or perform a
full_refresh/targeted backfill if you cannot derive changed order_ids), so
computed fields like shipment_count, shipped_at, and revenue are recomputed when
any contributing source changes.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@models/orders/dw/order_fact.sql`:
- Around line 71-72: The incremental WHERE currently uses ordered_at
(get_incremental_value('updated_at_dwh')) which misses later changes to existing
orders; change the incremental logic in the is_incremental() branch so rows are
selected by changed order_id from upstream sources (e.g., compare
orders.updated_at, shipments.updated_at, line_items.updated_at) instead of
ordered_at — for example, replace the single ordered_at filter with a predicate
that reloads order_id values whose latest source updated_at >=
get_incremental_value('updated_at_dwh') (or perform a full_refresh/targeted
backfill if you cannot derive changed order_ids), so computed fields like
shipment_count, shipped_at, and revenue are recomputed when any contributing
source changes.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 97bdd27c-eeb8-4369-bf7b-e6bb3dadc206

📥 Commits

Reviewing files that changed from the base of the PR and between 1b51f4e and da4ee7c.

📒 Files selected for processing (2)

models/orders/dw/order_fact.sql
tests/order_fact_revenue_reconciliation.sql

Paste-ready Google Slides outline for the upcoming refunds-into-warehouse work. Covers the DATA-123 fix recap, the 4 grain traps spotted in raw refund data, the proposed `refund_fact` / `refund_line_fact` structure plus denormalized columns on `order_fact` / `order_line_fact`, and the status of the 5 design decisions (3 locked: store-credit-as-deferred- liability, pro-rata per-line allocation, order-date attribution; 2 open pending Jamie: cross-source dedup precedence and Stripe settlement scope). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…der facts Brings the three raw refund sources (Shopify, Stripe, internal POS) into the warehouse as two new facts plus denormalized columns on order_fact / order_line_fact, per the finance readout deck. New models: - base_refunds_{shopify,stripe,internal_pos}.sql - stg_refunds.sql (unioned canonical schema; cents -> dollars) - refund_fact.sql (one row per CANONICAL refund event, identified by (order_id, refunded_at); de-dupes cross-source duplicates and Stripe tender-type splits in a single pass) - refund_line_fact.sql (one row per (event, line); direct line_item_id from Shopify when present, pro-rata by line_revenue otherwise) Updated order_fact / order_line_fact: - order_fact gains refund_amount, cash_refund_amount, store_credit_issued, net_revenue. net_revenue = revenue - cash_refund_amount (store credit is a deferred liability per Decision #3, not a revenue reversal). - order_line_fact gains cash_refund_amount and net_line_revenue. Design decisions baked into the model (per the deck, finance-approved): - #1 Cross-source dedup precedence: shopify > stripe > internal_pos (Shopify wins because it carries line attribution). - #2 Stripe settlement scope: stripe_settled_amount = sum of Stripe rows with tender_type='card' for the event (matches Stripe settlement reports). - #3 Store credit treatment: deferred liability. Excluded from cash_refund_amount and net_revenue; tracked separately as store_credit_issued. - #4 Per-line allocation when source lacks line_item_id: pro-rata by line revenue. - #5 Date attribution for net revenue by date: bucket by ordered_at (the refund attributes back to its order's date for a stable accrual view). Tests added: - unique + not_null on refund_event_id (refund_fact PK). - not_null on refund_event_id, order_id, line_item_id (refund_line_fact). - refund_line_to_order_reconciliation (severity=error): per-order sum(order_line_fact.cash_refund_amount) must equal order_fact.cash_refund_amount. Same invariant DATA-123 broke for revenue, now guarded for cash refunds. Verification (non-test orders): - Revenue: $12,989,886.01 (DATA-123 still ties to Stripe captures) - Refund total: $7,196.93 - Cash refunds: $5,350.97 - Store credit issued: $1,845.96 - Net revenue: $12,984,535.04 All 23 models build clean, all 19 tests pass. Known follow-ups (out of scope, called out in deck): - Store-credit redemption modeling once we get a redemption feed (today store_credit_issued just accumulates as a liability column). - Refund-aware incremental strategy for order_fact / order_line_fact (the refund_* columns on already-loaded rows go stale between full refreshes; refund_fact remains canonical). - Stripe settlement-report ingestion to automate the recon. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aliciatoshima requested a review from phil510 as a code owner May 19, 2026 21:16

coderabbitai Bot reviewed May 19, 2026

View reviewed changes

coderabbitai Bot previously approved these changes May 19, 2026

View reviewed changes

aliciatoshima dismissed coderabbitai[bot]’s stale review via efe6689 May 19, 2026 21:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(order_fact): compute revenue from line items (DATA-123)#5

fix(order_fact): compute revenue from line items (DATA-123)#5
aliciatoshima wants to merge 3 commits into
masterfrom
fix/DATA-123-order-fact-revenue-from-line-items

aliciatoshima commented May 19, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 19, 2026 •

edited

Loading

Review skipped

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aliciatoshima commented May 19, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reconciliation

Test plan

Follow-ups (not in this PR)

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Review ran into problems

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aliciatoshima commented May 19, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 19, 2026 •

edited

Loading