Skip to content

perf(nft_transfers): convert cross-chain model to view, optimize chain-level merges#9536

Open
jeff-dude wants to merge 5 commits intomainfrom
cursor/nft-transfers-optimization-ea05
Open

perf(nft_transfers): convert cross-chain model to view, optimize chain-level merges#9536
jeff-dude wants to merge 5 commits intomainfrom
cursor/nft-transfers-optimization-ea05

Conversation

@jeff-dude
Copy link
Copy Markdown
Member

Thank you for contributing to Spellbook 🪄

Description:

Runtime optimization for nft.transfers to address repeated worker node crashes during hourly runs (see: Slack #team-curated-internal).

Root cause: The cross-chain nft_transfers model was materialized as incremental with merge strategy, unioning 21 chains into a single delta table. This is a massive merge operation that overwhelms worker node memory.

Changes:

  1. Convert nft_transfers cross-chain union from incremental to view

    • Per sector spell architecture, cross-chain unions should be views — heavy lifting is done at chain level
    • Eliminates the giant 21-chain merge into one delta table entirely
    • Removes the unique_key, partition_by, incremental_strategy, incremental_predicates, and file_format config (not needed for views)
  2. Add block_month to all 21 chain-level unique_key configs

    • Tables are partitioned by block_month but it was missing from unique_key
    • Without it, the Trino merge must scan all partitions during incremental runs
    • With it, merge only touches partitions that contain new/changed data — major I/O reduction
  3. Fix nft_arbitrum_transfers hardcoded incremental predicate

    • Was using 'DBT_INTERNAL_DEST.block_time >= date_trunc(\'day\', now() - interval \'7\' day)' instead of the incremental_predicate() macro
    • Now uses the standard macro for consistent behavior across environments
  4. Clean up nft_transfers() macro

    • Remove unnecessary SELECT * FROM(...) wrapper — saves an extra projection node in the query plan
    • Remove unnecessary DISTINCT from batch unnest subquery — CROSS JOIN unnest(zip(...)) already produces unique rows per source event
    • Fix Jinja whitespace control for clean compiled SQL output

Compiled & validated — all models compile successfully (dbt compile).


quick links for more information:

Slack Thread

Open in Web Open in Cursor 

…n-level merges

Major changes to address worker node crashes during nft.transfers hourly runs:

1. Convert nft_transfers cross-chain union from incremental merge to view
   - Follows sector spell architecture: cross-chain unions should be views
   - Eliminates the massive 21-chain merge into a single delta table
   - Heavy lifting already done at chain level; view just unions them

2. Add block_month to chain-level unique_key for merge partition pruning
   - Tables are partitioned by block_month but it was missing from unique_key
   - Without it, Trino must scan all partitions during merge
   - With it, merge only touches partitions containing new data

3. Fix arbitrum hardcoded 7-day incremental predicate
   - Was using raw SQL predicate instead of incremental_predicate() macro
   - Now uses the standard macro for consistent behavior across environments

4. Clean up nft_transfers() macro
   - Remove unnecessary SELECT * FROM(...) wrapper (extra projection node)
   - Remove unnecessary DISTINCT from batch unnest subquery
   - Fix Jinja whitespace control for clean compiled SQL

Co-authored-by: jeff-dude <jeff-dude@users.noreply.github.com>
@github-actions github-actions bot added WIP work in progress dbt: hourly covers the hourly dbt subproject labels Apr 6, 2026
…TINCT

Addresses review feedback on the nft.transfers optimization PR:

1. Re-add cross-chain unique_combination_of_columns test including blockchain
   - Guards against duplicate emission across the 21-chain view
   - Uses 7-day window to keep the test cheap on a view
   - combination: (blockchain, block_month, tx_hash, evt_index, token_id, amount)

2. Restore DISTINCT on the ERC1155 batch unnest subquery
   - Protects --full-refresh path from pathological TransferBatch events
     that duplicate (id, value) positions within a single event
   - Incremental merge unique_key already dedupes, so steady-state unchanged

Made-with: Cursor
Temporary CI-only change: replace `is_incremental()` with
`is_incremental() or true` in the nft_transfers() macro so that
the incremental_predicate time filter is applied on every build,
including the initial CI full-refresh.

This keeps the CI build window short (~3 days per the macro default)
across all 21 chain-level transfers tables, instead of scanning full
history on first build.

REVERT BEFORE MERGE.

Bump ci-stamp on cross-chain nft_transfers to 2.

Made-with: Cursor
@jeff-dude jeff-dude marked this pull request as ready for review April 19, 2026 14:09
@github-actions github-actions bot added ready-for-review this PR development is complete, please review and removed WIP work in progress labels Apr 19, 2026
Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Bugbot Autofix prepared a fix for the issue found in the latest run.

  • ✅ Fixed: Debugging or true forces incremental predicate on full refreshes
    • Removed the or true debugging artifact from all six {% if is_incremental() or true %} guards so the incremental predicate is again skipped during full refreshes and initial builds.

Create PR

Or push these changes by commenting:

@cursor push 37df1bdf24
Preview (37df1bdf24)
diff --git a/dbt_subprojects/hourly_spellbook/macros/_sector/nft/nft_transfers.sql b/dbt_subprojects/hourly_spellbook/macros/_sector/nft/nft_transfers.sql
--- a/dbt_subprojects/hourly_spellbook/macros/_sector/nft/nft_transfers.sql
+++ b/dbt_subprojects/hourly_spellbook/macros/_sector/nft/nft_transfers.sql
@@ -27,11 +27,11 @@
 {% if denormalized == False -%}
 INNER JOIN {{ base_transactions }} et ON et.block_number = t.evt_block_number
     AND et.hash = t.evt_tx_hash
-    {% if is_incremental() or true -%}
+    {% if is_incremental() -%}
     AND {{incremental_predicate('et.block_time')}}
     {% endif -%}
 {%- endif -%}
-{% if is_incremental() or true -%}
+{% if is_incremental() -%}
 WHERE {{incremental_predicate('t.evt_block_time')}}
 {% endif -%}
 
@@ -61,11 +61,11 @@
 {% if denormalized == False -%}
 INNER JOIN {{ base_transactions }} et ON et.block_number = t.evt_block_number
     AND et.hash = t.evt_tx_hash
-    {% if is_incremental() or true -%}
+    {% if is_incremental() -%}
     AND {{incremental_predicate('et.block_time')}}
     {% endif -%}
 {%- endif -%}
-{% if is_incremental() or true -%}
+{% if is_incremental() -%}
 WHERE {{incremental_predicate('t.evt_block_time')}}
 {% endif -%}
 
@@ -96,14 +96,14 @@
     , value, id
     FROM {{ erc1155_batch }} t
     CROSS JOIN unnest(zip(t."values", t.ids)) AS foo(value, id)
-    {% if is_incremental() or true -%}
+    {% if is_incremental() -%}
     WHERE {{incremental_predicate('t.evt_block_time')}}
     {% endif -%}
     ) t
 {%- if denormalized == False %}
 INNER JOIN {{ base_transactions }} et ON et.block_number = t.evt_block_number
     AND et.hash = t.evt_tx_hash
-    {% if is_incremental() or true -%}
+    {% if is_incremental() -%}
     AND {{incremental_predicate('et.block_time')}}
     {% endif -%}
 {%- endif %}

You can send follow-ups to the cloud agent here.

Comment @cursor review or bugbot run to trigger another review on this PR

Reviewed by Cursor Bugbot for commit 1b3ec08. Configure here.

Reverts the `or true` clauses added in 1b3ec08 now that CI has
validated the lineage. Macro returns to standard
`{% if is_incremental() -%}` so full-refresh runs scan full history
as intended and incremental runs apply the predicate.

Bump ci-stamp to 3 to ensure CI rebuilds against the reverted body.

Made-with: Cursor
@jeff-dude jeff-dude requested a review from a team April 19, 2026 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dbt: hourly covers the hourly dbt subproject ready-for-review this PR development is complete, please review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants