Commit e1b3be1

Add staged insert documentation to implementation plan
- Document staged_insert.py for direct object storage writes
- Add flow comparison: normal insert vs staged insert
- Include staged_insert.py in critical files summary

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent dd8c623 commit e1b3be1

1 file changed: +33 -0 lines changed

docs/src/design/tables/storage-types-implementation-plan.md

Lines changed: 33 additions & 0 deletions
@@ -193,6 +193,38 @@ class ObjectType(AttributeType):
- `ref.download(dest)` - Download to local path
- `ref.listdir()` / `ref.walk()` - For directories

### Staged Insert for Object Types

For large objects like Zarr arrays, `staged_insert.py` provides direct writes to storage:

```python
with table.staged_insert1 as staged:
    # 1. Set primary key first (required for path construction)
    staged.rec['subject_id'] = 123
    staged.rec['session_id'] = 45

    # 2. Get storage handle and write directly
    #    (zarr.open needs a shape to create an array rather than a group)
    z = zarr.open(staged.store('raw_data', '.zarr'), mode='w',
                  shape=large_array.shape, dtype=large_array.dtype)
    z[:] = large_array

    # 3. On exit: metadata computed, record inserted
```

**Flow comparison:**

| Normal Insert | Staged Insert |
|--------------|---------------|
| `ObjectType.encode()` uploads content | Direct writes via `staged.store()` |
| Single operation | Two-phase: write then finalize |
| Good for files/folders | Ideal for Zarr, HDF5, streaming |

Both produce the same JSON metadata format compatible with `ObjectRef.from_json()`.
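
The two-phase flow in the right-hand column can be illustrated with a minimal, stdlib-only sketch. This is not the code in `staged_insert.py`: the function `staged_insert_sketch`, the `open_field` helper, the list standing in for a table, and the `path`/`size` metadata fields are all assumptions made for illustration. It only shows the pattern of writing first, then finalizing on a clean exit and cleaning up on failure.

```python
import json
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def staged_insert_sketch(table_rows, root):
    """Hypothetical two-phase insert: write files first, then finalize the record."""
    rec = {}            # record being assembled by the caller
    staged_paths = {}   # field name -> path written during phase 1

    def open_field(field, ext):
        # Phase 1: hand the caller a writable file under the staging root.
        path = Path(root) / f"{field}{ext}"
        staged_paths[field] = path
        return open(path, "wb")

    try:
        yield rec, open_field
    except BaseException:
        # Failure: discard anything written so no orphaned objects remain.
        for path in staged_paths.values():
            path.unlink(missing_ok=True)
        raise
    else:
        # Phase 2: compute metadata from what was written, then insert the row.
        for field, path in staged_paths.items():
            rec[field] = json.dumps({"path": path.name, "size": path.stat().st_size})
        table_rows.append(rec)


# Usage: a plain list stands in for the table (an assumption of this sketch).
rows = []
with tempfile.TemporaryDirectory() as root:
    with staged_insert_sketch(rows, root) as (rec, open_field):
        rec["subject_id"] = 123
        with open_field("raw_data", ".bin") as f:
            f.write(b"\x00" * 1024)
    stored = json.loads(rows[0]["raw_data"])
```

Deferring the metadata computation to the exit of the `with` block is what lets the caller stream data of unknown final size; the record only becomes visible once the write has completed.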

**Key methods:**
- `staged.store(field, ext)` - Returns `FSMap` for Zarr/xarray
- `staged.open(field, ext)` - Returns file handle for binary writes
- `staged.fs` - Raw fsspec filesystem access

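The requirement to set the primary key before calling `staged.store()` or `staged.open()` follows from the storage path being derived from the key. The sketch below shows one way such a path could be built. The function name `sketch_object_path`, the `schema/table/key=value/...` layout, and the example schema and table names are hypothetical; the actual layout is defined by `build_object_path` in `storage.py` and may differ.

```python
from pathlib import PurePosixPath


def sketch_object_path(schema, table, primary_key, field, ext=""):
    """Hypothetical layout: schema/table/<name=value>/.../field+ext."""
    if not primary_key:
        raise ValueError("primary key must be set before a storage path can be built")
    # Sort by attribute name so the same key always maps to the same path.
    key_parts = [f"{name}={primary_key[name]}" for name in sorted(primary_key)]
    return str(PurePosixPath(schema, table, *key_parts, f"{field}{ext}"))


# "lab" and "Recording" are made-up names used only for this example.
path = sketch_object_path(
    "lab", "Recording", {"subject_id": 123, "session_id": 45}, "raw_data", ".zarr"
)
```

Because the path depends only on the key, field name, and extension, a writer can start streaming to storage before the rest of the record exists.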
---

## Phase 3: User-Defined AttributeTypes
@@ -365,6 +397,7 @@ def garbage_collect(schemas: list, store_name: str, dry_run=True) -> dict:
| `src/datajoint/content_registry.py` || Content storage functions (put, get, delete) |
| `src/datajoint/objectref.py` || ObjectRef handle for lazy access |
| `src/datajoint/storage.py` || StorageBackend, build_object_path |
| `src/datajoint/staged_insert.py` || Staged insert for direct object storage writes |
| `src/datajoint/table.py` || Type chain encoding on insert |
| `src/datajoint/fetch.py` || Type chain decoding on fetch |
| `src/datajoint/blob.py` || Removed bypass_serialization |
