docs: env-var configuration of stores (DJ_STORES, DJ_IGNORE_CONFIG_FILE) + Storage Adapter API spec (2.2.4) #172
… 2.3

Adds env-var configuration for object stores so the DataJoint platform, and any env-var-only deployment, can configure plugin-registered storage adapters (Databricks Unity Catalog Volumes, custom HTTP stores, lab archive systems) without files on disk.

- `DJ_STORES` (JSON-encoded) carries the entire `stores` dict in the same shape used in `datajoint.json`. When set, it replaces the file's `stores` block.
- `DJ_IGNORE_CONFIG_FILE` (default false) skips `datajoint.json`, the project `.secrets/`, and `/run/secrets/datajoint/` entirely. This is a hard guarantee that no file on disk leaks into config.

Implementation notes:
- New `Config.ignore_config_file` field (validation_alias `DJ_IGNORE_CONFIG_FILE`), auto-bound by pydantic-settings.
- The `stores` field receives a `validation_alias` placeholder so pydantic-settings does NOT auto-bind `DJ_STORES` at `Config()` construction. Otherwise its built-in JSON parser intercepts the value before the precedence logic runs and reports a `SettingsError` instead of a clean `ValueError`.
- New `Config._apply_stores_env()` parses the `DJ_STORES` JSON, replaces `self.stores` wholesale, and raises `ValueError` on bad JSON or non-object payloads.
- `_create_config()` is restructured to skip the file and secrets when `ignore_config_file` is set, and to apply `DJ_STORES` between file load and secrets fill, so env wins over file and secrets only fill gaps.

Functional precedence (high to low): programmatic > `DJ_STORES` > config file > `.secrets/stores.<name>.<attr>` (fills missing attrs only).

Tests: new `TestStoreEnv` class with 8 tests; existing `TestStoreSecrets` and `test_storage_adapter` tests still pass.

Companion docs: datajoint/datajoint-docs#172
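The precedence rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not DataJoint's `_create_config()`: `resolve_stores` and `parse_dj_stores` are hypothetical names, secrets are passed in as a dict keyed like the `.secrets/` file names (`stores.<name>.<attr>`), and the secrets fill is assumed to touch only stores that already exist. The programmatic tier is omitted because it is assumed to be an assignment made after this resolution.

```python
import copy
import json
from typing import Any, Dict, Mapping


def parse_dj_stores(raw: str) -> Dict[str, Any]:
    """Parse a DJ_STORES value, rejecting bad JSON and non-object payloads."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"DJ_STORES is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("DJ_STORES must be a JSON object keyed by store name")
    return payload


def resolve_stores(
    file_stores: Mapping[str, Any],
    env: Mapping[str, str],
    secrets: Mapping[str, str],
    ignore_config_file: bool = False,
) -> Dict[str, Any]:
    """Hypothetical resolver: DJ_STORES > config file > secrets (fill only)."""
    # Config-file layer; skipped entirely when files are ignored.
    stores = {} if ignore_config_file else copy.deepcopy(dict(file_stores))
    # DJ_STORES replaces the file's `stores` block wholesale; nothing is merged.
    if "DJ_STORES" in env:
        stores = parse_dj_stores(env["DJ_STORES"])
    # The secrets layer is also skipped when files are ignored.
    if ignore_config_file:
        return stores
    for key, value in secrets.items():
        prefix, _, rest = key.partition(".")
        name, _, attr = rest.partition(".")
        if prefix != "stores" or not name or not attr:
            continue  # not a store secret, e.g. database.password
        store = stores.get(name)
        if isinstance(store, dict):
            store.setdefault(attr, value)  # fill missing attrs, never override
    return stores
```

Because the replacement is wholesale, a store defined only in `datajoint.json` is absent from the result whenever `DJ_STORES` is set.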
Force-pushed from 64b3c89 to 6508d8b.
MilagrosMarin left a comment
Carefully reviewed against datajoint-python@v2.2.3 source. The mechanics check out and the structure is excellent, but several spec claims don't match the as-shipped code. I'm flagging them in priority order so plugin authors who follow the spec literally don't hit `AttributeError`.
Verified accurate ✅
- `DJ_STORES` impl (settings.py:732-752): JSON parse, wholesale-replaces `self.stores`, raises `ValueError` with the JSON error
- `DJ_IGNORE_CONFIG_FILE` impl (settings.py:356-362, 1027)
- Precedence: source confirms `DJ_STORES` wholesale-replaces at line 751, and `.secrets/` fills missing only at 727-728; matches the precedence table exactly
- `.secrets/stores.<name>.<attr>` arbitrary-attr support (settings.py:709-729)
- `StorageAdapter` base class exists (storage_adapter.py:23); `_COMMON_STORE_KEYS` matches the spec's common-keys list exactly
- Error message strings match (`<protocol> store is missing: <fields>`, `Invalid key(s) for <protocol>: <fields>`)
- `/run/secrets/datajoint/` is real (settings.py:57, 130)
- `datajoint-python#1452` is MERGED into v2.2.3 ✓
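The two error-message shapes confirmed above can be illustrated with a small validator. This is a sketch, not DataJoint's code: `validate_store` is a hypothetical helper, `COMMON_KEYS` is a placeholder standing in for `_COMMON_STORE_KEYS` (whose real contents are not reproduced here), and reporting missing keys before invalid ones, sorted and comma-joined, is an assumption about how `<fields>` is rendered.

```python
from typing import Iterable, Mapping

# Hypothetical placeholder; the real list is datajoint's _COMMON_STORE_KEYS.
COMMON_KEYS = frozenset({"protocol", "location"})


def validate_store(
    spec: Mapping[str, object],
    required: Iterable[str],
    optional: Iterable[str] = (),
) -> None:
    """Raise ValueError using the two message shapes quoted in the review."""
    protocol = spec.get("protocol", "<unknown>")
    required_keys = set(required)
    # Assumed ordering: missing protocol-specific keys are reported first.
    missing = sorted(required_keys - set(spec))
    if missing:
        raise ValueError(f"{protocol} store is missing: {', '.join(missing)}")
    # Anything outside the common + protocol-specific keys is rejected.
    allowed = COMMON_KEYS | required_keys | set(optional)
    invalid = sorted(set(spec) - allowed)
    if invalid:
        raise ValueError(f"Invalid key(s) for {protocol}: {', '.join(invalid)}")
```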
Substantive issues
1. dj.StorageAdapter and dj.get_storage_adapter() are NOT exported in v2.2.3
Spec uses them throughout:
- Line 154: `class DatabricksVolumesAdapter(dj.StorageAdapter):`
- Line 246: `adapter = dj.get_storage_adapter("s3")`
grep "StorageAdapter\|get_storage_adapter" src/datajoint/__init__.py returns zero matches. Plugin authors following the spec literally will hit AttributeError: module 'datajoint' has no attribute 'StorageAdapter'. Either:
- Source needs `from .storage_adapter import StorageAdapter, get_storage_adapter` added to `__init__.py` (paired dj-python PR), or
- Spec needs to use `from datajoint.storage_adapter import StorageAdapter`
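A plugin that must import cleanly both on 2.2.3 (where, per the review, `StorageAdapter` is reachable only through `datajoint.storage_adapter`) and on releases that export it at the top level could resolve the class with a fallback import. A minimal sketch; `import_first` is a hypothetical helper, not part of DataJoint.

```python
import importlib
from typing import Any, List, Sequence, Tuple


def import_first(candidates: Sequence[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from (module, name) pairs, in order."""
    tried: List[str] = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError):
            tried.append(f"{module_name}.{attr}")
    raise ImportError("none of these could be imported: " + ", ".join(tried))


# Intended use in a plugin (needs datajoint installed, so not executed here):
# StorageAdapter = import_first([
#     ("datajoint", "StorageAdapter"),                  # top-level export
#     ("datajoint.storage_adapter", "StorageAdapter"),  # module path
# ])
```

Trying the top-level name first means the plugin picks up the public export as soon as it exists and only falls back to the module path on older versions.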
2. Built-in adapter classes don't exist
The "Built-in adapters" table (line 205-210) lists FileAdapter, S3Adapter, GCSAdapter, AzureAdapter. grep -rn "class FileAdapter\|class S3Adapter\|class GCSAdapter\|class AzureAdapter" src/datajoint/ returns zero matches.
Built-in protocols are hardcoded in `StorageBackend._create_filesystem()` (storage.py:342+) as an `if self.protocol == "file": ... elif "s3": ... elif "gcs": ...` chain. There is also no `[project.entry-points."datajoint.storage"]` declaration in `pyproject.toml` for built-ins.
Spec line 117 also claims:
> DataJoint's built-in `file`, `s3`, `gcs`, and `azure` protocols are themselves `StorageAdapter` subclasses.
That's false in v2.2.3. Plugin authors who try to read built-in adapter source as a reference will find nothing. Either the source needs to add the four built-in adapters and register them (paired PR), or the spec needs an "Implementation status" admonition like the one #177 added, signalling which pieces are forward-looking vs as-shipped.
3. Source docstrings still say "New in 2.3", docs say "New in 2.2.3"
The PR explicitly reframes from 2.3 → 2.2.3, but v2.2.3 source still says "New in 2.3" in:
- `settings.py:341`: `stores` field
- `settings.py:348-349`: `DJ_STORES` env-var support
- `settings.py:361`: `ignore_config_file`
- `settings.py:740`: `_apply_stores_env` docstring
- Commit `0b1aca6f` title: `feat(config): DJ_STORES env var + DJ_IGNORE_CONFIG_FILE flag — new in 2.3`
- PR #1452 title says "2.3"
When a user reads `help(dj.config)`, they'll see "New in 2.3", but the release-notes page says "New in 2.2.3". This needs a paired source-side cleanup PR.
4. DJ_TLS=true typo not fully corrected
PR summary says "corrected `DJ_TLS` → `DJ_USE_TLS`". Source uses `DJ_USE_TLS` (settings.py:206). But `manage-secrets.md:88` still has `export DJ_TLS=true`. Line 88 was missed; only the row in the bottom env-var table (line 203) got fixed.
Process / scope issues
5. Conflicts with PR #173 (PostgreSQL coverage on installation page)
This PR's installation.md diff is identical to #173's, PLUS adds the inlined PostgreSQL GRANT example (fixing my Issue #2 from #173), and configure-database.md:132 changes "alternative" → "peer database backend" (fixing my Issue #1 from #173).
Two paths forward:
- Close #173 in favor of #172 (this one supersedes), or
- Strip the installation / configure-database changes out of #172 so the env-var work is reviewable in isolation, and let #173 land first
6. Scope creep
Beyond the stated scope, this PR also touches:
- `installation.md`: PostgreSQL coverage (duplicates #173)
- `configure-database.md`: "alternative" → "peer" framing
- `manage-pipeline-project.md`: `/sign-up` → `/contact` URL changes (×2)
- `data-pipelines.md`: architecture-diagram reframe (open-source core vs platform), PNG → SVG swap
- New `images/dj-platform.svg` (binary)
None mentioned in the PR's "Updated pages" list. Several feel like separate concerns that warrant separate PRs for clean reviewability.
7. Thinh's review is DISMISSED
Almost certainly auto-dismissed by the rebase/force-push (827dd8d docs: rebase env-var PR onto current main). Worth re-requesting his review since he's already in the loop on this area.
Positive observations
- `whats-new-2-2-3.md` gives users the migration story they need; admonition style + precedence table read well
- `mkdocs.yaml` nav placement is sensible (`What's New` group + `Reference → Specifications`)
- Storage Adapter spec structure (overview → when-to → base class → packaging → discovery → built-ins → credentials → errors → API ref) is exactly the right shape; the gaps are in factual accuracy, not in design
- Precedence model in source matches the docs precisely
Issues #1 and #2 are the most concrete: they're the ones a third-party plugin author trying to write a dj-databricks-storage package today would hit immediately. #3 and #4 are minor textual fixes. #5–#7 are merge-coordination questions. Happy to approve once #1, #2, #3, #4 are sorted (or once the spec is reframed with an "Implementation status" banner the way #177 was).
…LE) + Storage Adapter API spec — new in 2.3

Companion to datajoint/datajoint-python#1452.
- `src/about/whats-new-23.md` (new): 2.3 release notes
- `src/reference/specs/storage-adapter-api.md` (new): public spec for the `datajoint.storage` entry-point group used by built-in adapters and plugins
- `src/reference/configuration.md`: `DJ_STORES`, `DJ_IGNORE_CONFIG_FILE` in env-var tables and Top-Level Settings; arbitrary-attr secrets example
- `src/how-to/manage-secrets.md`: replace the never-implemented `DJ_STORES_<NAME>_<ATTR>` pattern with the real `DJ_STORES` JSON env var; add Env-var-only deployments section; fix `DJ_TLS` → `DJ_USE_TLS`
- `src/how-to/configure-storage.md`: Configuring stores via environment variables
- `src/reference/specs/object-store-configuration.md`: precedence table and `DJ_STORES` override semantics
- `mkdocs.yaml`: register `whats-new-23.md` and `storage-adapter-api.md`
The `DJ_STORES` + `DJ_IGNORE_CONFIG_FILE` feature shipped in datajoint-python 2.2.3 (released 2026-06-05), not 2.3. Updates:
- Rename `src/about/whats-new-23.md` → `src/about/whats-new-2-2-3.md`.
- Refresh the "What's New" page title, intro prose, version-added admonitions, and the GitHub release-notes link to point at the v2.2.3 tag.
- "Upgrading from 2.2?" callout broadened to "Upgrading from 2.2.0–2.2.2?".
- Version-added admonitions and "(new in 2.3)" inline tags across `configuration.md`, `manage-secrets.md`, `configure-storage.md`, `object-store-configuration.md`, `storage-adapter-api.md` → "2.2.3".
- `storage-adapter-api.md` example: `datajoint>=2.3` → `datajoint>=2.2.3`.
- `mkdocs.yaml` nav: "What's New in 2.3" → "What's New in 2.2.3" with the renamed file path.
Fixes for items #2 and #4 from her review:
- `storage-adapter-api.md`: removed the "Built-in adapters" section and the claim that `file`/`s3`/`gcs`/`azure` protocols are themselves `StorageAdapter` subclasses. In v2.2.3 those protocols are still served by the existing `StorageBackend._create_filesystem()` dispatch; migrating them onto the public adapter contract is tracked separately.
- `storage-adapter-api.md`: added an "Available in datajoint ≥ 2.2.4" note near the `StorageAdapter` base-class introduction. `dj.StorageAdapter` / `dj.get_storage_adapter()` are not exported in 2.2.3; a paired datajoint-python PR adds the exports for 2.2.4.
- `whats-new-2-2-3.md`: reframed the "Storage-adapter plugin contract" section to match. The contract is for third-party adapters; built-ins continue via internal dispatch.
- `manage-secrets.md`: corrected the lone remaining `DJ_TLS=true` → `DJ_USE_TLS=true` in the env-var example (line 88).

Items #3 (`settings.py` "New in 2.3" markers) and #1 (`StorageAdapter` export) are addressed by separate datajoint-python PRs.
Force-pushed from 827dd8d to cbf1c33.
Thanks for the thorough review, @MilagrosMarin — every item landed. Here's the response by number: #1 — #2 — Built-in adapter classes don't exist. Removed the "Built-in adapters" table from #3 — #4 — #5 / #6 — Scope creep on #7 — Thinh's review was dismissed by the force-push. Re-requested via Latest commit:
datajoint-python 2.2.4 shipped to PyPI on 2026-06-05, bundling the `DJ_STORES` / `DJ_IGNORE_CONFIG_FILE` / arbitrary-attr-secrets work from 2.2.3 with the missing public surface: `dj.StorageAdapter` and `dj.get_storage_adapter()` exports (#1463), the `packaging` dep fix (#1462), and the `settings.py` docstring refresh (#1461).

These docs are coordinated with 2.2.4, which is where the documented Storage Adapter API surface (`dj.StorageAdapter`) actually works as the spec describes. Rebranding the whole page to 2.2.4 keeps the docs in sync with the version users will install.

Changes:
- Rename `src/about/whats-new-2-2-3.md` → `whats-new-2-2-4.md` and bump all version mentions, the page title, the "Upgrading from" range, and the GitHub release-notes link to v2.2.4.
- All `*New in 2.2.3*` / `(new in 2.2.3)` markers across `configuration.md`, `manage-secrets.md`, `configure-storage.md`, `object-store-configuration.md`, and `storage-adapter-api.md` → 2.2.4.
- Drop the "Available in datajoint ≥ 2.2.4" interim admonition from the Storage Adapter API spec; it is no longer needed now that 2.2.4 is the documented floor.
- `mkdocs.yaml` nav: "What's New in 2.2.3" → "What's New in 2.2.4" with the renamed file path.
Update: datajoint-python 2.2.4 just shipped to PyPI (https://github.com/datajoint/datajoint-python/releases/tag/v2.2.4) with all of #1461 / #1462 / #1463 included — which means Latest commit
@MilagrosMarin / @ttngu207 / @mweitzel: ready for a final pass when one of you has a minute. The plan is to merge as soon as the rebrand looks good, so the docs go live aligned with the 2.2.4 PyPI release.
MilagrosMarin left a comment
Thanks @dimitri-yatsenko — verified the latest state against datajoint-python@v2.2.4 source. All seven items from my previous review are addressed:
✅ #1 dj.StorageAdapter / dj.get_storage_adapter() now exported in v2.2.4 (__init__.py:55-56, 88) via paired #1463.
✅ #2 Built-in adapters table removed from the spec; top admonition correctly notes built-ins continue via StorageBackend._create_filesystem().
✅ #3 v2.2.4 settings.py says "New in 2.2.3" (not "New in 2.3") at lines 341, 348-349, 361, 740 — fixed via paired #1461.
✅ #4 manage-secrets.md:88 now DJ_USE_TLS=true.
✅ #5/#6 Proper rebase done — PR diff is now exactly 7 files (mkdocs.yaml + 6 content pages), zero scope creep.
✅ #7 Thinh re-added as reviewer.
Also confirmed v2.2.4 shipped to PyPI on 2026-06-05 with all three paired PRs (#1461, #1462, #1463) included, and the spec's dj.StorageAdapter import examples now work as documented.
One small nuance, optional polish — not a blocker:
The d185cba rebrand pivots everything 2.2.3 → 2.2.4. The rationale (documented surface only fully works in 2.2.4 once dj.StorageAdapter is exported) is sound. But whats-new-2-2-4.md line 3:
> DataJoint 2.2.4 introduces env-var-only configuration of storage...
is strictly off by one micro-version — those features shipped in 2.2.3 (settings.py confirms "New in 2.2.3"). A user running help(dj.config) on v2.2.4 will see "New in 2.2.3" while the release notes page says "New in 2.2.4". The "Upgrading from 2.2.0–2.2.3?" callout softens it, but if you want zero ambiguity:
- Soften "introduces" → "finalizes the public surface for" env-var configuration, or
- Headline 2.2.3 with a small "2.2.4 adds: `dj.StorageAdapter` exports" subsection
Either way is fine. Approving — substantive concerns all resolved, docs are ready to ship with v2.2.4 on PyPI.
Summary
Documents the env-var configuration of object stores + the public Storage Adapter API contract, coordinated with the datajoint-python 2.2.4 release (just shipped to PyPI).
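For env-var-only deployments, building the `DJ_STORES` value with `json.dumps` avoids hand-quoting JSON in a shell. A minimal sketch, assuming the child process reads its configuration from `os.environ`; `env_only_config` is a hypothetical helper, and any store fields passed to it are placeholders.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def env_only_config(
    stores: Mapping[str, Any],
    base: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Copy `base` (default: os.environ) and add the two store-related env vars."""
    env = dict(os.environ if base is None else base)
    # Same shape as the `stores` block in datajoint.json, JSON-encoded.
    env["DJ_STORES"] = json.dumps(dict(stores))
    # Skip datajoint.json, .secrets/ and /run/secrets/datajoint/ entirely.
    env["DJ_IGNORE_CONFIG_FILE"] = "true"
    return env
```

The result can be handed to a child process, e.g. `subprocess.run([sys.executable, "-c", "import datajoint as dj; print(dj.config['stores'])"], env=env_only_config(stores), check=True)`, which mirrors the live check in the test plan (requires `datajoint` installed).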
The env-var work shipped code-side in 2.2.3, but 2.2.4 is what users will install to get the full documented surface, including the `dj.StorageAdapter` / `dj.get_storage_adapter()` exports added in datajoint-python#1463. Rebranding the docs wholesale to 2.2.4 keeps the documented and installable surfaces aligned.

Companion code: all on master via 2.2.4:
- `.secrets/` attrs (env-var feature core)
- `settings.py` version markers
- `packaging` as a dep (fixes clean-Py3.12 import failure)
- `dj.StorageAdapter` + `dj.get_storage_adapter()` at the top level (closes Milagros's review item #1)

New pages
- `src/about/whats-new-2-2-4.md`: 2.2.4 release notes (`DJ_STORES`, `DJ_IGNORE_CONFIG_FILE`, arbitrary-attr secrets, Storage Adapter API).
- `src/reference/specs/storage-adapter-api.md`: Plugin contract for third-party storage protocols. Documents the `datajoint.storage` entry-point group, the `StorageAdapter` base class, packaging conventions, and discovery. (Built-in `file`/`s3`/`gcs`/`azure` protocols continue to be served by the existing internal dispatch in `StorageBackend`; migrating them onto this contract is tracked separately, per Milagros's item #2.)

Updated pages
- `src/reference/configuration.md`: `DJ_STORES` and `DJ_IGNORE_CONFIG_FILE` rows in the Top-Level Settings table; env-var examples; arbitrary-attr `.secrets/` example.
- `src/how-to/manage-secrets.md`: Real `DJ_STORES` JSON pattern replacing the never-implemented `DJ_STORES_<NAME>_<ATTR>` pattern; Env-var-only deployments subsection; arbitrary-attr secrets note; `DJ_TLS=true` → `DJ_USE_TLS=true` (both spots fixed).
- `src/how-to/configure-storage.md`: New Configuring stores via environment variables section.
- `src/reference/specs/object-store-configuration.md`: Configuration sources & precedence table covering `DJ_STORES` and `DJ_IGNORE_CONFIG_FILE`.
- `mkdocs.yaml`: Registered `whats-new-2-2-4.md` and `storage-adapter-api.md` in the nav.

Review history
Milagros raised seven items in her first-round review — all are addressed:
- `dj.StorageAdapter` / `dj.get_storage_adapter()` exports → shipped in 2.2.4 via datajoint-python#1463
- `settings.py` "New in 2.3" markers → fixed by datajoint-python#1461 (merged)
- `DJ_TLS=true` → `DJ_USE_TLS=true` at `manage-secrets.md:88` fixed
- `git rebase origin/main`: branch base moved from `6f867bf` → current main; no inadvertent reverts of #173/#178/#179

Test plan
- `mkdocs build --strict` succeeds
- `storage-adapter-api.md`, `whats-new-2-2-4.md`, `#env-var-only-deployments`)
- `datajoint==2.2.4` behavior, verified live: `DJ_IGNORE_CONFIG_FILE=true DJ_STORES='{...}' python -c "import datajoint as dj; print(dj.StorageAdapter, dj.config['stores'])"` returns the class + the env-var-supplied stores
- `whats-new-2-2-4.md` rendering

Related
datajoint==2.2.4