
docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.2.4)#172

Merged
dimitri-yatsenko merged 5 commits into main from docs/dj-stores-env-vars-2-3
Jun 5, 2026

Conversation

@dimitri-yatsenko
Member

@dimitri-yatsenko dimitri-yatsenko commented May 20, 2026

Summary

Documents the env-var configuration of object stores + the public Storage Adapter API contract, coordinated with the datajoint-python 2.2.4 release (just shipped to PyPI).

The env-var work shipped code-side in 2.2.3, but 2.2.4 is what users will install to get the full documented surface — including the dj.StorageAdapter / dj.get_storage_adapter() exports added in datajoint-python#1463. Rebranding the docs wholesale to 2.2.4 keeps the documented and installable surfaces aligned.

Companion code: all on master via 2.2.4 —

New pages

  • src/about/whats-new-2-2-4.md: 2.2.4 release notes covering DJ_STORES, DJ_IGNORE_CONFIG_FILE, arbitrary-attr secrets, and the Storage Adapter API.
  • src/reference/specs/storage-adapter-api.md: Plugin contract for third-party storage protocols. Documents the datajoint.storage entry-point group, the StorageAdapter base class, packaging conventions, and discovery. (Built-in file/s3/gcs/azure protocols continue to be served by the existing internal dispatch in StorageBackend; migrating them onto this contract is tracked separately, per Milagros's item #2.)

Updated pages

  • src/reference/configuration.md: DJ_STORES and DJ_IGNORE_CONFIG_FILE rows in the Top-Level Settings table; env-var examples; arbitrary-attr .secrets/ example.
  • src/how-to/manage-secrets.md: Real DJ_STORES JSON pattern replacing the never-implemented DJ_STORES_<NAME>_<ATTR> pattern; Env-var-only deployments subsection; arbitrary-attr secrets note; DJ_TLS=true → DJ_USE_TLS=true (both spots fixed).
  • src/how-to/configure-storage.md: New Configuring stores via environment variables section.
  • src/reference/specs/object-store-configuration.md: Configuration sources & precedence table covering DJ_STORES and DJ_IGNORE_CONFIG_FILE.
  • mkdocs.yaml: Registered whats-new-2-2-4.md and storage-adapter-api.md in the nav.

Review history

Milagros raised seven items in her first-round review — all are addressed:

| # | Resolution |
|---|---|
| 1 | dj.StorageAdapter / dj.get_storage_adapter() exports → shipped in 2.2.4 via datajoint-python#1463 |
| 2 | "Built-in adapters" section + the false subclass claim removed from spec and what's-new page |
| 3 | settings.py "New in 2.3" markers → fixed by datajoint-python#1461 (merged) |
| 4 | DJ_TLS=true → DJ_USE_TLS=true at manage-secrets.md:88 fixed |
| 5/6 | Real git rebase origin/main: branch base moved from 6f867bf → current main; no inadvertent reverts of #173/#178/#179 |
| 7 | Thinh's review was auto-dismissed; he's back as a reviewer |

Test plan

  • mkdocs build --strict succeeds
  • Internal links resolve (storage-adapter-api.md, whats-new-2-2-4.md, #env-var-only-deployments)
  • Sample env-var commands match datajoint==2.2.4 behavior — verified live: DJ_IGNORE_CONFIG_FILE=true DJ_STORES='{...}' python -c "import datajoint as dj; print(dj.StorageAdapter, dj.config['stores'])" returns the class + the env-var-supplied stores
  • Visual review of whats-new-2-2-4.md rendering
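
The live check above depends on DJ_STORES holding a single JSON string. A minimal sketch of producing that value from a Python dict follows. The store name `main` and the `protocol` / `location` attributes and their values are placeholders assumed for illustration; they are not taken from the spec.

```python
import json
import shlex

# Hypothetical store definition: the store name and attribute values are
# placeholders for illustration, not values from the DataJoint docs.
stores = {"main": {"protocol": "file", "location": "/data/dj-store"}}

# Compact JSON keeps the environment variable on one line.
payload = json.dumps(stores, separators=(",", ":"))

# shlex.quote makes the value safe to paste into a POSIX shell.
export_line = f"export DJ_STORES={shlex.quote(payload)}"
print(export_line)
```

Generating the value with json.dumps rather than typing it by hand avoids the quoting mistakes that would otherwise surface as a JSON parse error at config time.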

Related

dimitri-yatsenko added a commit to datajoint/datajoint-python that referenced this pull request May 20, 2026
… 2.3

Adds env-var configuration for object stores so the DataJoint platform — and
any env-var-only deployment — can configure plugin-registered storage adapters
(Databricks Unity Catalog Volumes, custom HTTP stores, lab archive systems)
without files on disk.

- DJ_STORES (JSON-encoded) carries the entire `stores` dict in the same shape
  used in `datajoint.json`. Replaces the file's `stores` block when set.
- DJ_IGNORE_CONFIG_FILE (default false) skips `datajoint.json`, the project
  `.secrets/`, and `/run/secrets/datajoint/` entirely. Hard guarantee that no
  file on disk leaks into config.

Implementation notes:

- New `Config.ignore_config_file` field (validation_alias DJ_IGNORE_CONFIG_FILE)
  auto-bound by pydantic-settings.
- The `stores` field receives a `validation_alias` placeholder so
  pydantic-settings does NOT auto-bind DJ_STORES at Config() construction.
  Otherwise its built-in JSON parser intercepts before precedence logic runs
  and reports SettingsError instead of a clean ValueError.
- New `Config._apply_stores_env()` parses DJ_STORES JSON, replaces self.stores
  wholesale, raises ValueError on bad JSON or non-object payloads.
- `_create_config()` restructured to: skip file + secrets when
  ignore_config_file is set; apply DJ_STORES between file load and secrets
  fill so env wins over file and secrets only fill gaps.

Functional precedence (high to low): programmatic > DJ_STORES > config file >
`.secrets/stores.<name>.<attr>` (fills missing attrs only).
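
The precedence described above can be sketched as a small standalone function. This is not the code in settings.py: `resolve_stores` and its arguments are hypothetical names, and the rule that secrets only apply to stores that already exist is an assumption made for this sketch.

```python
import json


def resolve_stores(file_stores, env_value=None, secrets=None):
    """Illustrative merge: DJ_STORES replaces the file block; secrets fill gaps."""
    # Start from a copy of the config-file stores.
    stores = {name: dict(attrs) for name, attrs in (file_stores or {}).items()}

    if env_value is not None:
        try:
            parsed = json.loads(env_value)
        except json.JSONDecodeError as exc:
            raise ValueError(f"DJ_STORES is not valid JSON: {exc}") from exc
        if not isinstance(parsed, dict):
            raise ValueError("DJ_STORES must be a JSON object")
        stores = parsed  # wholesale replacement, not a deep merge

    # Secrets are keyed by (store name, attribute) and never overwrite.
    for (name, attr), value in (secrets or {}).items():
        if isinstance(stores.get(name), dict):  # assumption: existing stores only
            stores[name].setdefault(attr, value)
    return stores


file_stores = {"main": {"protocol": "file", "location": "/from/file"}}
env = '{"main": {"protocol": "s3", "bucket": "b"}}'
secrets = {("main", "secret_key"): "s3cr3t", ("main", "bucket"): "ignored"}
result = resolve_stores(file_stores, env, secrets)
print(result)
```

In this example the file's `location` disappears because the env var replaces the whole block, and the `bucket` secret is ignored because the env var already set that attribute.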

Tests: new TestStoreEnv class with 8 tests; existing TestStoreSecrets and
test_storage_adapter tests still pass.

Companion docs: datajoint/datajoint-docs#172
@dimitri-yatsenko dimitri-yatsenko marked this pull request as ready for review May 21, 2026 15:45
@dimitri-yatsenko dimitri-yatsenko force-pushed the docs/dj-stores-env-vars-2-3 branch from 64b3c89 to 6508d8b Compare May 21, 2026 16:56
ttngu207 previously approved these changes May 21, 2026
@dimitri-yatsenko dimitri-yatsenko changed the title docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.3) docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.2.3) Jun 5, 2026
@dimitri-yatsenko dimitri-yatsenko requested review from MilagrosMarin and removed request for kushalbakshi June 5, 2026 15:12
Collaborator

@MilagrosMarin MilagrosMarin left a comment


Carefully reviewed against datajoint-python@v2.2.3 source. Mechanics check out, structure is excellent, but several spec claims don't match the as-shipped code — flagging in priority order so plugin authors who follow the spec literally don't hit AttributeError.

Verified accurate ✅

  • DJ_STORES impl (settings.py:732-752) — JSON parse, wholesale-replaces self.stores, raises ValueError with the JSON error
  • DJ_IGNORE_CONFIG_FILE impl (settings.py:356-362, 1027)
  • Precedence: source confirms DJ_STORES wholesale-replaces at line 751, .secrets/ fills missing only at 727-728 — matches the precedence table exactly
  • .secrets/stores.<name>.<attr> arbitrary-attr support (settings.py:709-729)
  • StorageAdapter base class exists (storage_adapter.py:23); _COMMON_STORE_KEYS matches the spec's common-keys list exactly
  • Error message strings match (<protocol> store is missing: <fields>, Invalid key(s) for <protocol>: <fields>)
  • /run/secrets/datajoint/ is real (settings.py:57, 130)
  • datajoint-python#1452 is MERGED into v2.2.3 ✓

Substantive issues

1. dj.StorageAdapter and dj.get_storage_adapter() are NOT exported in v2.2.3

Spec uses them throughout:

  • Line 154: class DatabricksVolumesAdapter(dj.StorageAdapter):
  • Line 246: adapter = dj.get_storage_adapter("s3")

grep "StorageAdapter\|get_storage_adapter" src/datajoint/__init__.py returns zero matches. Plugin authors following the spec literally will hit AttributeError: module 'datajoint' has no attribute 'StorageAdapter'. Either:

  • Source needs from .storage_adapter import StorageAdapter, get_storage_adapter added to __init__.py (paired dj-python PR), or
  • Spec needs to use from datajoint.storage_adapter import StorageAdapter
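
Until one of those two fixes lands, a plugin could tolerate both spellings with a fallback import. The helper below is a hedged sketch: `import_first` and `CANDIDATES` are hypothetical names, and only the two import paths quoted in this review are assumed to exist.

```python
import importlib


def import_first(candidates):
    """Return the first attribute that resolves from 'module:attr' candidates."""
    errors = []
    for target in candidates:
        module_name, _, attr = target.partition(":")
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{target}: {exc}")
    raise ImportError("none of the candidates resolved: " + "; ".join(errors))


# The two spellings discussed in this review, top-level export first.
CANDIDATES = [
    "datajoint:StorageAdapter",
    "datajoint.storage_adapter:StorageAdapter",
]
```

A plugin module would then call `StorageAdapter = import_first(CANDIDATES)` once at import time and subclass the result.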

2. Built-in adapter classes don't exist

The "Built-in adapters" table (line 205-210) lists FileAdapter, S3Adapter, GCSAdapter, AzureAdapter. grep -rn "class FileAdapter\|class S3Adapter\|class GCSAdapter\|class AzureAdapter" src/datajoint/ returns zero matches.

Built-in protocols are hardcoded in StorageBackend._create_filesystem() (storage.py:342+) with if self.protocol == "file": ... elif "s3": ... elif "gcs": .... Also no [project.entry-points."datajoint.storage"] declaration in pyproject.toml for built-ins.

Spec line 117 also claims:

DataJoint's built-in file, s3, gcs, and azure protocols are themselves StorageAdapter subclasses.

That's false in v2.2.3. Plugin authors who try to read built-in adapter source as a reference will find nothing. Either the source needs to add the four built-in adapters and register them (paired PR), or the spec needs an "Implementation status" admonition like #177 added — signalling which pieces are forward-looking vs as-shipped.

3. Source docstrings still say "New in 2.3", docs say "New in 2.2.3"

The PR explicitly reframes from 2.3 → 2.2.3, but v2.2.3 source still says "New in 2.3" in:

  • settings.py:341 (stores field)
  • settings.py:348-349 (DJ_STORES env-var support)
  • settings.py:361 (ignore_config_file)
  • settings.py:740 (_apply_stores_env docstring)
  • Commit 0b1aca6f title: feat(config): DJ_STORES env var + DJ_IGNORE_CONFIG_FILE flag — new in 2.3
  • PR #1452 title says "2.3"

When a user reads help(dj.config) they'll see "New in 2.3" but the release-notes page says "New in 2.2.3". Needs a paired source-side cleanup PR.
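
A mismatch like this can be caught mechanically. The sketch below scans text for "New in X" markers that differ from the expected version; `stale_markers` is a hypothetical helper and the sample strings are made up for illustration.

```python
import re

# Matches "New in 2.3", "new in 2.2.3", and similar dotted versions.
MARKER = re.compile(r"New in (\d+(?:\.\d+)+)", re.IGNORECASE)


def stale_markers(text, expected):
    """Return (line_number, version) for each marker that is not `expected`."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in MARKER.finditer(line):
            if match.group(1) != expected:
                hits.append((lineno, match.group(1)))
    return hits


sample = "stores field. New in 2.3\nignore_config_file (new in 2.2.3)\n"
print(stale_markers(sample, expected="2.2.3"))  # [(1, '2.3')]
```

Running the same check over both the source docstrings and the docs pages would flag whichever side drifts.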

4. DJ_TLS=true typo not fully corrected

PR summary says "corrected DJ_TLS → DJ_USE_TLS". Source uses DJ_USE_TLS (settings.py:206). But manage-secrets.md:88 still has:

export DJ_TLS=true

Line 88 was missed — only the row in the bottom env-var table (line 203) got fixed.
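
A missed occurrence like this is easy to detect by checking every `DJ_*=` assignment in the docs against the names the source accepts. The sketch below hard-codes only the three names confirmed in this review; `KNOWN` is therefore deliberately incomplete, and a real check would need the full list from settings.py.

```python
import re

# Names confirmed in this thread; an assumption-laden, partial allowlist.
KNOWN = {"DJ_STORES", "DJ_IGNORE_CONFIG_FILE", "DJ_USE_TLS"}
ASSIGNMENT = re.compile(r"\b(DJ_[A-Z0-9_]+)=")


def unknown_env_vars(text):
    """Return (line_number, name) for each DJ_* assignment not in KNOWN."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in ASSIGNMENT.findall(line):
            if name not in KNOWN:
                hits.append((lineno, name))
    return hits


doc = "export DJ_TLS=true\nexport DJ_USE_TLS=true\n"
print(unknown_env_vars(doc))  # [(1, 'DJ_TLS')]
```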

Process / scope issues

5. Conflicts with PR #173 (PostgreSQL coverage on installation page)

This PR's installation.md diff is identical to #173's, PLUS adds the inlined PostgreSQL GRANT example (fixing my Issue #2 from #173), and configure-database.md:132 changes "alternative" → "peer database backend" (fixing my Issue #1 from #173).

Two paths forward:

  • Close #173 in favor of #172 (this one supersedes), or
  • Strip the installation / configure-database changes out of #172 so the env-var work is reviewable in isolation, and let #173 land first

6. Scope creep

Beyond the stated scope, this PR also touches:

  • installation.md: PostgreSQL coverage (duplicates #173)
  • configure-database.md: "alternative" → "peer" framing
  • manage-pipeline-project.md: /sign-up → /contact URL changes (×2)
  • data-pipelines.md: architecture-diagram reframe (open-source core vs platform), PNG → SVG swap
  • New images/dj-platform.svg (binary)

None mentioned in the PR's "Updated pages" list. Several feel like separate concerns that warrant separate PRs for clean reviewability.

7. Thinh's review is DISMISSED

Almost certainly auto-dismissed by the rebase/force-push (827dd8d docs: rebase env-var PR onto current main). Worth re-requesting his review since he's already in the loop on this area.

Positive observations

  • whats-new-2-2-3.md gives users the migration story they need; admonition style + precedence table read well
  • mkdocs.yaml nav placement is sensible (What's New group + Reference → Specifications)
  • Storage Adapter spec structure (overview → when-to → base class → packaging → discovery → built-ins → credentials → errors → API ref) is exactly the right shape — the gaps are in factual accuracy, not in design
  • Precedence model in source matches the docs precisely

Issues #1 and #2 are the most concrete: they're the ones a third-party plugin author trying to write a dj-databricks-storage package today would hit immediately. #3 and #4 are minor textual fixes. #5–#7 are merge-coordination questions. Happy to approve once #1, #2, #3, #4 are sorted (or once the spec is reframed with an "Implementation status" banner the way #177 was).

…LE) + Storage Adapter API spec — new in 2.3

Companion to datajoint/datajoint-python#1452.

- src/about/whats-new-23.md (new) — 2.3 release notes
- src/reference/specs/storage-adapter-api.md (new) — public spec for the
  datajoint.storage entry-point group used by built-in adapters and plugins
- src/reference/configuration.md — DJ_STORES, DJ_IGNORE_CONFIG_FILE in env-var
  tables and Top-Level Settings; arbitrary-attr secrets example
- src/how-to/manage-secrets.md — replace the never-implemented
  DJ_STORES_<NAME>_<ATTR> pattern with the real DJ_STORES JSON env var; add
  Env-var-only deployments section; fix DJ_TLS → DJ_USE_TLS
- src/how-to/configure-storage.md — Configuring stores via environment variables
- src/reference/specs/object-store-configuration.md — precedence table and
  DJ_STORES override semantics
- mkdocs.yaml — register whats-new-23.md and storage-adapter-api.md

The DJ_STORES + DJ_IGNORE_CONFIG_FILE feature shipped in datajoint-python
2.2.3 (released 2026-06-05), not 2.3. Updates:

- Rename src/about/whats-new-23.md → src/about/whats-new-2-2-3.md.
- Refresh "What's New" page title, intro prose, version-added admonitions,
  and the GitHub release-notes link to point at the v2.2.3 tag.
- "Upgrading from 2.2?" callout broadened to "Upgrading from 2.2.0–2.2.2?".
- Version-added admonitions and "(new in 2.3)" inline tags across
  configuration.md, manage-secrets.md, configure-storage.md,
  object-store-configuration.md, storage-adapter-api.md → "2.2.3".
- storage-adapter-api.md example: datajoint>=2.3 → datajoint>=2.2.3.
- mkdocs.yaml nav: "What's New in 2.3" → "What's New in 2.2.3" with the
  renamed file path.

Fixes for items #2 and #4 from her review:

- storage-adapter-api.md: removed the "Built-in adapters" section and the
  claim that file/s3/gcs/azure protocols are themselves StorageAdapter
  subclasses. In v2.2.3 those protocols are still served by the existing
  StorageBackend._create_filesystem() dispatch; migrating them onto the
  public adapter contract is tracked separately.
- storage-adapter-api.md: added an "Available in datajoint ≥ 2.2.4" note
  near the StorageAdapter base-class introduction. dj.StorageAdapter /
  dj.get_storage_adapter() are not exported in 2.2.3; a paired
  datajoint-python PR adds the exports for 2.2.4.
- whats-new-2-2-3.md: reframed the "Storage-adapter plugin contract"
  section to match — the contract is for third-party adapters; built-ins
  continue via internal dispatch.
- manage-secrets.md: corrected the lone remaining DJ_TLS=true → DJ_USE_TLS=true
  in the env-var example (line 88).

Items #3 (settings.py "New in 2.3" markers) and #1 (StorageAdapter export)
are addressed by separate datajoint-python PRs.
@dimitri-yatsenko
Member Author

Thanks for the thorough review, @MilagrosMarin — every item landed. Here's the response by number:

#1: dj.StorageAdapter / dj.get_storage_adapter() not exported. Opened paired PR datajoint-python#1463 to add both to __init__.py and __all__. Smoke-tested locally; will ship in 2.2.4. Added a small admonition to storage-adapter-api.md so users on 2.2.3 know to import via from datajoint.storage_adapter import StorageAdapter in the interim.

#2: Built-in adapter classes don't exist. Removed the "Built-in adapters" table from storage-adapter-api.md entirely. Reframed the whats-new-2-2-3.md "Storage-adapter plugin contract" section: contract is for third-party adapters; built-ins continue via the existing StorageBackend._create_filesystem() dispatch, with migration tracked separately. Adjusted the top-of-spec admonition to match.

#3: settings.py still says "New in 2.3" in 4 spots. Already fixed by datajoint-python#1461 (opened before your review, you're on it as reviewer).

#4: DJ_TLS=true typo at manage-secrets.md:88. Fixed in the latest commit.

#5 / #6: Scope creep on installation.md / configure-database.md / manage-pipeline-project.md / data-pipelines.md + the dj-platform PNG/SVG swap. You were right: my earlier "cleanup" commit was net-zero but didn't actually move the branch's base. Force-pushed a proper git rebase origin/main (base moved from 6f867bf → 82cbf9f). The PR diff is now exactly 7 files: mkdocs.yaml + the six DJ_STORES content pages. None of #173/#178/#179 are touched.

#7: Thinh's review was dismissed by the force-push. Re-requested via gh pr edit --add-reviewer ttngu207.

Latest commit: cbf1c33 docs: address Milagros's review on PR #172. Ready for another pass when you have a minute.

datajoint-python 2.2.4 shipped to PyPI on 2026-06-05, bundling the
DJ_STORES / DJ_IGNORE_CONFIG_FILE / arbitrary-attr-secrets work
from 2.2.3 with the missing public surface: dj.StorageAdapter and
dj.get_storage_adapter() exports (#1463), the packaging dep fix (#1462),
and the settings.py docstring refresh (#1461).

These docs are coordinated with 2.2.4 — that's where the documented
Storage Adapter API surface (`dj.StorageAdapter`) actually works as the
spec describes. Rebranding the whole page to 2.2.4 keeps the docs in
sync with the version users will install.

Changes:
- Rename src/about/whats-new-2-2-3.md → whats-new-2-2-4.md and bump all
  version mentions, the page title, the "Upgrading from" range, and the
  GitHub release-notes link to v2.2.4.
- All `*New in 2.2.3*` / `(new in 2.2.3)` markers across configuration.md,
  manage-secrets.md, configure-storage.md, object-store-configuration.md,
  and storage-adapter-api.md → 2.2.4.
- Drop the "Available in datajoint ≥ 2.2.4" interim admonition from the
  Storage Adapter API spec — no longer needed now that 2.2.4 is the
  documented floor.
- mkdocs.yaml nav: "What's New in 2.2.3" → "What's New in 2.2.4" with the
  renamed file path.
@dimitri-yatsenko dimitri-yatsenko changed the title docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.2.3) docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.2.4) Jun 5, 2026
@dimitri-yatsenko
Member Author

Update: datajoint-python 2.2.4 just shipped to PyPI (https://github.com/datajoint/datajoint-python/releases/tag/v2.2.4) with all of #1461 / #1462 / #1463 included — which means dj.StorageAdapter / dj.get_storage_adapter() are now actually exported at top level and the spec's examples work as-shown.

Latest commit d185cba rebrands this PR wholesale from 2.2.3 → 2.2.4:

  • whats-new-2-2-3.md → whats-new-2-2-4.md; page title, intro prose, "Upgrading from 2.2.0–2.2.3?" callout, and release-notes link all bumped
  • Every *New in 2.2.3* / (new in 2.2.3) marker across the env-var docs and the spec → 2.2.4
  • Dropped the "Available in datajoint ≥ 2.2.4" interim admonition (no longer needed — 2.2.4 is the documented floor)
  • mkdocs.yaml nav entry: "What's New in 2.2.4: about/whats-new-2-2-4.md"

@MilagrosMarin / @ttngu207 / @mweitzel — ready for a final pass when one of you has a minute. Plan is to merge as soon as the rebrand looks good so the docs go live aligned with the 2.2.4 PyPI release.

Collaborator

@MilagrosMarin MilagrosMarin left a comment


Thanks @dimitri-yatsenko — verified the latest state against datajoint-python@v2.2.4 source. All seven items from my previous review are addressed:

#1 dj.StorageAdapter / dj.get_storage_adapter() now exported in v2.2.4 (__init__.py:55-56, 88) via paired #1463.
#2 Built-in adapters table removed from the spec; top admonition correctly notes built-ins continue via StorageBackend._create_filesystem().
#3 v2.2.4 settings.py says "New in 2.2.3" (not "New in 2.3") at lines 341, 348-349, 361, 740 — fixed via paired #1461.
#4 manage-secrets.md:88 now DJ_USE_TLS=true.
#5/#6 Proper rebase done — PR diff is now exactly 7 files (mkdocs.yaml + 6 content pages), zero scope creep.
#7 Thinh re-added as reviewer.

Also confirmed v2.2.4 shipped to PyPI on 2026-06-05 with all three paired PRs (#1461, #1462, #1463) included, and the spec's dj.StorageAdapter import examples now work as documented.

One small nuance, optional polish — not a blocker:

The d185cba rebrand pivots everything 2.2.3 → 2.2.4. The rationale (documented surface only fully works in 2.2.4 once dj.StorageAdapter is exported) is sound. But whats-new-2-2-4.md line 3:

DataJoint 2.2.4 introduces env-var-only configuration of storage...

is strictly off by one micro-version — those features shipped in 2.2.3 (settings.py confirms "New in 2.2.3"). A user running help(dj.config) on v2.2.4 will see "New in 2.2.3" while the release notes page says "New in 2.2.4". The "Upgrading from 2.2.0–2.2.3?" callout softens it, but if you want zero ambiguity:

  • Soften "introduces" → "finalizes the public surface for" env-var configuration, or
  • Headline 2.2.3 with a small "2.2.4 adds: dj.StorageAdapter exports" subsection

Either way is fine. Approving — substantive concerns all resolved, docs are ready to ship with v2.2.4 on PyPI.

@dimitri-yatsenko dimitri-yatsenko merged commit eb4e251 into main Jun 5, 2026
3 checks passed

3 participants