[HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation#585

Draft
jimdowling wants to merge 3 commits into logicalclocks:main from jimdowling:HWORKS-2802

@jimdowling jimdowling commented May 21, 2026

Summary

User-guide section documenting the new partitioned_by parameter on feature group creation. Lives under the existing partitioning area in docs/user_guides/fs/feature_group/create.md.

Covers:

  • Usage example with create_feature_group / get_or_create_feature_group.
  • The storage-engine-derived contract: the user's dataframe never carries the grain columns; Delta GENERATED ALWAYS AS handles it server-side.
  • Validation rules (mutual exclusion with partition_key, requires event_time, enum membership).
  • Partition-pruning table — Delta auto-derives partition predicates from the GENERATED expressions for hierarchical specs. fg.read(start_time, end_time) and fg.filter(fg.event_time >= ...) prune at the partition level for hierarchical partitioned_by. Non-hierarchical specs (["month"], ["year","week"]) are valid but skip auto-derivation.
  • Online feature store behavior: derived columns live offline-only by default; online_partition_columns=true opts into online materialization.
  • Hudi: rejected before HWORKS-2807; after HWORKS-2807 the same parameter works on Hudi via the server-side PartitionedByTransformer + CustomKeyGenerator.
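
The three validation rules listed above can be sketched in plain Python as follows. This is only an illustration, not the SDK or backend code: the function name `validate_partitioned_by` is hypothetical, and the set of allowed grains is an assumption built from the grain names that appear in this PR.

```python
from typing import Optional, Sequence

# Assumed enum members: only the grain names mentioned in this PR's text.
ALLOWED_GRAINS = ("year", "month", "week", "day", "hour")


def validate_partitioned_by(
    partitioned_by: Optional[Sequence[str]],
    partition_key: Optional[Sequence[str]] = None,
    event_time: Optional[str] = None,
) -> None:
    """Raise ValueError if the arguments break one of the three listed rules."""
    if not partitioned_by:
        return  # parameter not used, nothing to validate
    # Rule 1: mutual exclusion with partition_key.
    if partition_key:
        raise ValueError("partitioned_by and partition_key are mutually exclusive")
    # Rule 2: an event_time column is required.
    if not event_time:
        raise ValueError("partitioned_by requires event_time to be set")
    # Rule 3: every entry must be a known grain.
    unknown = [g for g in partitioned_by if g not in ALLOWED_GRAINS]
    if unknown:
        raise ValueError(f"unknown partition grain(s): {unknown}")
```

Calling `validate_partitioned_by(["year", "month"], event_time="ts")` returns without error, while passing `partition_key` alongside `partitioned_by` raises.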

Pairs with:

JIRA: HWORKS-2802. Engineering walkthrough: Confluence page.

Test plan

  • npx markdownlint-cli2 docs/user_guides/fs/feature_group/create.md clean.
  • uv run mkdocs build -s clean (run after the SDK PR lands, since the API reference plugin pulls from hopsworks-api main).
  • Visual check of the rendered section via mkdocs serve.

🤖 Generated with Claude Code

…tion

https://hopsworks.atlassian.net/browse/HWORKS-2802

Add a section to docs/user_guides/fs/feature_group/create.md
describing the storage-engine-native partitioned_by parameter for
Delta feature groups. Covers:

- Usage example with create_feature_group / get_or_create_feature_group.
- The CREATE TABLE … USING DELTA … GENERATED ALWAYS AS … contract:
  the storage layer derives the partition columns; the user's
  dataframe never carries them.
- Validation rules: mutual exclusion with partition_key, requires
  event_time.
- Partition pruning table — Delta auto-derives partition predicates
  from the GENERATED expressions for hierarchical specs (year /
  year+month / year+month+day / year+month+day+hour), so
  `fg.read(start_time=..., end_time=...)` and
  `fg.filter(fg.event_time >= ...)` prune at the partition level.
  Non-hierarchical specs (e.g. ["month"], ["year","week"]) are valid
  but skip the auto-derivation — only direct predicates on the
  grain columns prune. Recommend hierarchical specs.
- Online feature store behavior: derived columns live offline-only
  by default; online_partition_columns=true opts into online
  materialization. Until the onlinefs consumer filter ships, the
  backend rejects partitioned_by + online_enabled=true with the
  default online_partition_columns=false. Document both
  workarounds.
- Hudi: partitioned_by + HUDI is rejected at creation; Hudi support
  is tracked under a separate follow-up ticket.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
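
For reviewers: the hierarchical specs named in the commit message above (year, year+month, year+month+day, year+month+day+hour) are exactly the non-empty prefixes of one ordered list. A minimal sketch of that check, with a hypothetical helper name that is not taken from the SDK:

```python
from typing import Sequence

# Order taken from the hierarchical specs listed in the commit message.
HIERARCHY = ("year", "month", "day", "hour")


def is_hierarchical(partitioned_by: Sequence[str]) -> bool:
    """True when the spec is a non-empty prefix of year, month, day, hour."""
    spec = tuple(partitioned_by)
    return 0 < len(spec) <= len(HIERARCHY) and spec == HIERARCHY[: len(spec)]
```

Under this definition `["year", "month"]` is hierarchical, while `["month"]` and `["year", "week"]` are valid specs that fall outside the hierarchy.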
@jimdowling jimdowling changed the title from "[HWORKS-2802] Document partitioned_by parameter on feature group creation" to "[HWORKS-2802 / -2807] Document partitioned_by parameter on feature group creation" May 21, 2026
jimdowling and others added 2 commits May 30, 2026 11:43
https://hopsworks.atlassian.net/browse/HWORKS-2802

The partitioned_by section described Delta GENERATED ALWAYS AS columns and
storage-engine-side derivation, which is no longer how it works. Document
the real design: the client derives the grain columns from event_time and
writes them as real partition columns. Pruning works natively on grain
filters and via predicate translation on event_time ranges. Correct the
online-store note: online-enabled partitioned_by feature groups are
rejected entirely until HWORKS-2808, not only with the default
online_partition_columns.

Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
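
To make the revised design concrete, here is a rough sketch of what client-side derivation and event_time range translation could look like. Both function names are hypothetical, the ISO week convention is an assumption, and the translation is shown only for a year+month spec; the actual client code may differ.

```python
from datetime import datetime
from typing import Dict, List, Sequence, Tuple


def derive_grain_columns(event_time: datetime, partitioned_by: Sequence[str]) -> Dict[str, int]:
    """Compute the partition (grain) values for one row from its event_time."""
    extractors = {
        "year": lambda t: t.year,
        "month": lambda t: t.month,
        "week": lambda t: t.isocalendar()[1],  # ISO week number (assumed convention)
        "day": lambda t: t.day,
        "hour": lambda t: t.hour,
    }
    return {grain: extractors[grain](event_time) for grain in partitioned_by}


def month_partitions(start: datetime, end: datetime) -> List[Tuple[int, int]]:
    """List the (year, month) partitions an event_time range [start, end] touches."""
    parts = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        parts.append((year, month))
        # Step to the next calendar month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return parts
```

A reader restricted to the partitions returned by `month_partitions` skips every other directory, which is the effect the predicate translation on event_time ranges is meant to achieve.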