- Add Configuration Immutability section warning about changing settings
- Clarify database_name is for multi-database DBMS platforms
- Implement =OBJ[.ext]= display format in preview.py for query results
- Add objects property to Heading class
- Add ObjectRef.to_dict() method for raw metadata access
- Fix conflicting text about staged insert hashing
- Document explicit hash kwarg with design principles
- Rename file_storage to object_storage utility
- Document grace period for orphan cleanup race condition
docs/src/design/tables/object-type-spec.md
Lines changed: 63 additions & 17 deletions
@@ -134,6 +134,20 @@ For local filesystem storage:
 | `object_storage.access_key` | string | For cloud | Access key (can use secrets file) |
 | `object_storage.secret_key` | string | For cloud | Secret key (can use secrets file) |
 
+### Configuration Immutability
+
+**CRITICAL**: Once a project has been instantiated (i.e., `datajoint_store.json` has been created and the first object stored), the following settings MUST NOT be changed:
+
+- `object_storage.project_name`
+- `object_storage.protocol`
+- `object_storage.bucket`
+- `object_storage.location`
+- `object_storage.partition_pattern`
+
+Changing these settings after objects have been stored will result in **broken references**—existing paths stored in the database will no longer resolve to valid storage locations.
+
+DataJoint validates `project_name` against `datajoint_store.json` on connect, but administrators must ensure other settings remain consistent across all clients for the lifetime of the project.
+
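The immutability rule above can be illustrated with a small consistency check. This is only a sketch: the spec says DataJoint validates `project_name` on connect, while the helper `changed_immutable_settings` and the idea of keeping a reference snapshot of the other settings are hypothetical and not part of the DataJoint API.

```python
# Hypothetical helper, not part of DataJoint: flags immutable settings that
# differ from a reference snapshot taken when the project was instantiated.
IMMUTABLE_KEYS = (
    "project_name",
    "protocol",
    "bucket",
    "location",
    "partition_pattern",
)


def changed_immutable_settings(reference: dict, current: dict) -> list[str]:
    """Return the immutable object_storage keys whose values differ."""
    return [
        key for key in IMMUTABLE_KEYS
        if reference.get(key) != current.get(key)
    ]


# All values below are made up for illustration.
reference = {
    "project_name": "my_project",
    "protocol": "s3",
    "bucket": "lab-data",
    "location": "pipeline",
    "partition_pattern": None,
}
# A client whose bucket was edited; rotating a credential is not flagged.
current = dict(reference, bucket="lab-data-v2", access_key="rotated")

changed = changed_immutable_settings(reference, current)
if changed:
    print("Refusing to connect; immutable settings changed:", changed)
```

Credentials such as `access_key` are deliberately left out of the checked keys, since the list in the spec covers only the settings that determine where objects live.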
 ### Environment Variables
 
 Settings can be overridden via environment variables:
 | `format_version` | string | Yes | Store format version for compatibility |
 | `datajoint_version` | string | Yes | DataJoint version that created the store |
 | `database_host` | string | No | Database server hostname (for bidirectional mapping) |
-| `database_name` | string | No | Database name (for bidirectional mapping) |
+| `database_name` | string | No | Database name on the server (for bidirectional mapping) |
 
-The optional `database_host` and `database_name` fields enable bidirectional mapping between object stores and databases. This is informational only - not enforced at runtime. Administrators can alternatively ensure unique `project_name` values across their namespace, and managed platforms may handle this mapping externally.
+The `database_name` field exists for DBMS platforms that support multiple databases on a single server (e.g., PostgreSQL, MySQL). The object storage configuration is **shared across all schemas comprising the pipeline**—it's a pipeline-level setting, not a per-schema setting.
+
+The optional `database_host` and `database_name` fields enable bidirectional mapping between object stores and databases:
+
+- **Forward**: Client settings → object store location
+- **Reverse**: Object store metadata → originating database
+
+This is informational only—not enforced at runtime. Administrators can alternatively ensure unique `project_name` values across their namespace, and managed platforms may handle this mapping externally.
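The reverse direction of the mapping can be sketched as a lookup on the store metadata. The field names come from the table above; the function `originating_database`, the `host/database` output format, and every value in the sample JSON are assumptions made for illustration.

```python
import json
from typing import Optional

# Example content of datajoint_store.json; all values are made up.
store_metadata = json.loads("""
{
  "project_name": "my_project",
  "format_version": "1.0",
  "datajoint_version": "0.0.0",
  "database_host": "db.example.org",
  "database_name": "neuro"
}
""")


def originating_database(meta: dict) -> Optional[str]:
    """Reverse mapping: store metadata -> 'host/database', if recorded."""
    host = meta.get("database_host")
    if host is None:
        # Both fields are optional; the mapping may be managed externally.
        return None
    name = meta.get("database_name")
    return f"{host}/{name}" if name else host


print(originating_database(store_metadata))
```

Because both fields are optional and not enforced at runtime, the sketch returns `None` rather than raising when they are absent.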
 
 ### Store Initialization
 
@@ -362,19 +383,28 @@ For large hierarchical data like Zarr stores, computing certain metadata can be
 
 By default, **no content hash is computed** to avoid performance overhead for large objects. Storage backend integrity is trusted.
│ └─ On failure: orphaned file remains (acceptable) │
@@ -871,19 +901,35 @@ Orphaned files (files in storage without corresponding database records) may acc
 
 ### Orphan Cleanup Procedure
 
-Orphan cleanup is a **separate maintenance operation** that must be performed during maintenance windows to avoid race conditions with concurrent inserts.
+Orphan cleanup is a **separate maintenance operation** provided via the `schema.object_storage` utility object.
 
 ```python
-# Maintenance utility methods
-schema.file_storage.find_orphaned()  # List files not referenced in DB
+**Note**: `schema.object_storage` is a utility object, not a hidden table. Unlike `attach@store` which uses `~external_*` tables, the `object` type stores all metadata inline in JSON columns and has no hidden tables.
+
+**Grace period for in-flight inserts:**
+
+While random tokens prevent filename collisions, there's a race condition with in-flight inserts:
+
+1. Insert starts: file copied to storage with token `Ax7bQ2kM`
+2. Orphan cleanup runs: lists storage, queries DB for references
+3. File `Ax7bQ2kM` not yet in DB (INSERT not committed)
+4. Cleanup identifies it as orphan and deletes it
+5. Insert commits: DB now references deleted file!
+
+**Solution**: The `grace_period_minutes` parameter (default: 30) excludes files created within that window, assuming they are in-flight inserts.
+
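The grace-period filter described above can be sketched in a few lines. This is not the DataJoint implementation: `find_orphans`, its arguments, and the file names are hypothetical, and only the `grace_period_minutes` name and its default of 30 come from the spec.

```python
from datetime import datetime, timedelta, timezone


def find_orphans(stored, referenced, now, grace_period_minutes=30):
    """Return unreferenced paths that are older than the grace period.

    stored: dict mapping path -> creation time (timezone-aware datetime)
    referenced: set of paths found in the tables' JSON metadata
    """
    cutoff = now - timedelta(minutes=grace_period_minutes)
    return sorted(
        path for path, created in stored.items()
        if path not in referenced and created <= cutoff
    )


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
stored = {
    "obj_Ax7bQ2kM.dat": now - timedelta(minutes=5),  # in-flight insert
    "obj_Zk93LmPq.dat": now - timedelta(hours=3),    # old, unreferenced
    "obj_Qw8eRt1Y.dat": now - timedelta(hours=3),    # old, referenced
}
referenced = {"obj_Qw8eRt1Y.dat"}

print(find_orphans(stored, referenced, now))  # ['obj_Zk93LmPq.dat']
```

The file with token `Ax7bQ2kM` is unreferenced but only five minutes old, so it is skipped. With `grace_period_minutes=0` it would be reported as an orphan, which is exactly the race in steps 1 to 5.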
 **Important considerations:**
-- Should be run during low-activity periods
-- Uses transactions or locking to avoid race conditions with concurrent inserts
-- Files recently uploaded (within a grace period) are excluded to handle in-flight inserts
-- Provides dry-run mode to preview deletions before execution
+- Grace period handles race conditions—cleanup is safe to run anytime
+- Running during low-activity periods reduces in-flight operations to reason about
+- `dry_run=True` previews deletions before execution
+- Compares storage contents against JSON metadata in table columns