
Commit 6fcc4d3

Add parameterized AttributeTypes and content vs object comparison
- content type is single-blob only (no folders)
- Parameterized syntax: `<type@param>` passes param to dtype
- Add content vs object comparison table
- Clarify when to use each type

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent 7ae8f15 commit 6fcc4d3


1 file changed (+50, -7 lines)


docs/src/design/tables/storage-types-spec.md

Lines changed: 50 additions & 7 deletions
````diff
@@ -34,10 +34,12 @@ class Analysis(dj.Computed):
 
 **New core type.** Content-addressed storage with deduplication:
 
-- Path derived from content hash: `_content/{hash[:2]}/{hash[2:4]}/{hash}/`
+- **Single blob only**: stores a single file or serialized object (not folders)
+- Path derived from content hash: `_content/{hash[:2]}/{hash[2:4]}/{hash}`
 - Many-to-one: multiple rows can reference same content
 - Reference counted for garbage collection
 - Deduplication: identical content stored once
+- For folders/complex objects, use `object` type instead
 
 ```
 store_root/
````
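The path rule in this hunk can be illustrated with a small sketch. It assumes the hash is SHA-256 over the raw bytes (matching the spec's `CHAR(64)` SHA256 hex hash) and uses a hypothetical in-memory `ContentStore`, with a dict standing in for the object store; `content_path` and `ContentStore` are illustrative names, not DataJoint's API.

```python
import hashlib


def content_path(data: bytes) -> str:
    """Build a storage key from the SHA-256 hex digest of the blob.

    Follows the layout `_content/{hash[:2]}/{hash[2:4]}/{hash}`; the two
    short prefixes spread blobs across subdirectories.
    """
    h = hashlib.sha256(data).hexdigest()  # 64 hex characters, fits CHAR(64)
    return f"_content/{h[:2]}/{h[2:4]}/{h}"


class ContentStore:
    """Hypothetical in-memory stand-in for a content-addressed store."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # storage key -> blob bytes

    def put(self, data: bytes) -> str:
        key = content_path(data)
        # Identical bytes always map to the same key, so they are stored once.
        self.blobs.setdefault(key, data)
        return key


store = ContentStore()
a = store.put(b"spike train")
b = store.put(b"spike train")  # a second row referencing the same content
print(a == b, len(store.blobs))  # True 1
```

Because the key is a pure function of the bytes, deduplication and the many-to-one relationship fall out of the addressing scheme itself rather than needing a separate lookup.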
````diff
@@ -92,6 +94,31 @@ The `content` type stores a `char(64)` hash in the database:
 features CHAR(64) NOT NULL -- SHA256 hex hash
 ```
 
+## Parameterized AttributeTypes
+
+AttributeTypes can be parameterized with `<type@param>` syntax. The parameter is passed
+through to the underlying dtype:
+
+```python
+class AttributeType:
+    type_name: str # Name used in <brackets>
+    dtype: str # Base underlying type
+
+    # When user writes <type_name@param>, resolved dtype becomes:
+    # f"{dtype}@{param}" if param specified, else dtype
+```
+
+**Resolution examples:**
+```
+<xblob> → dtype = "content" → default store
+<xblob@cold> → dtype = "content@cold" → cold store
+<djblob> → dtype = "longblob" → database
+<djblob@x> → ERROR: longblob doesn't support parameters
+```
+
+This means `<xblob>` and `<xblob@store>` share the same AttributeType class - the
+parameter flows through to the core type, which validates whether it supports `@store`.
+
 ## AttributeTypes (Built on Core Types)
 
 ### `<djblob>` - Internal Serialized Blob
````
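The resolution rule added in this hunk can be sketched as a single function. This is an illustration under stated assumptions: `TYPE_REGISTRY`, `PARAMETERIZABLE`, and `resolve_dtype` are hypothetical names, the registry holds only the two types from the resolution examples, and the set of core types accepting `@store` (`content`, `object`) is inferred from the spec's mentions of `content@cold` and `object@store`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeType:
    type_name: str  # name used in <brackets>
    dtype: str      # base underlying type


# Hypothetical registry containing only the types from the examples.
TYPE_REGISTRY = {
    "xblob": AttributeType("xblob", "content"),
    "djblob": AttributeType("djblob", "longblob"),
}

# Assumption: core types that accept an @store parameter.
PARAMETERIZABLE = {"content", "object"}


def resolve_dtype(declaration: str) -> str:
    """Resolve '<type_name>' or '<type_name@param>' to the underlying dtype."""
    if not (declaration.startswith("<") and declaration.endswith(">")):
        raise ValueError(f"not an AttributeType declaration: {declaration!r}")
    name, sep, param = declaration[1:-1].partition("@")
    if name not in TYPE_REGISTRY:
        raise ValueError(f"unknown AttributeType: {name}")
    attr_type = TYPE_REGISTRY[name]
    if not sep:
        return attr_type.dtype  # no parameter: use the base dtype as-is
    if not param:
        raise ValueError("empty parameter after '@'")
    # The parameter flows through; the core type decides whether it is allowed.
    if attr_type.dtype not in PARAMETERIZABLE:
        raise ValueError(f"{attr_type.dtype} doesn't support parameters")
    return f"{attr_type.dtype}@{param}"


for decl in ["<xblob>", "<xblob@cold>", "<djblob>", "<djblob@x>"]:
    try:
        print(decl, "->", resolve_dtype(decl))
    except ValueError as err:
        print(decl, "-> ERROR:", err)
```

Keeping validation in the core type, rather than in each AttributeType, is what lets `<xblob>` and `<xblob@cold>` share one class.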
````diff
@@ -272,17 +299,33 @@ def garbage_collect(schema):
 (ContentRegistry() & {'content_hash': content_hash}).delete()
 ```
 
+## Content vs Object: When to Use Each
+
+| Feature | `content` | `object` |
+|---------|-----------|----------|
+| Addressing | Content hash (SHA256) | Path (from primary key) |
+| Deduplication | Yes | No |
+| Structure | Single blob only | Files, folders, Zarr, HDF5 |
+| Access | Transparent (returns bytes) | Lazy (returns ObjectRef) |
+| GC | Reference counted | Deleted with row |
+| Use case | Serialized data, file attachments | Large/complex objects, streaming |
+
+**Rule of thumb:**
+- Need deduplication or storing serialized Python objects? → `content` via `<xblob>`
+- Need folders, Zarr, HDF5, or streaming access? → `object`
+
 ## Key Design Decisions
 
 1. **Layered architecture**: Core types (`content`, `object`) separate from AttributeTypes
-2. **Content type**: New core type for content-addressed, deduplicated storage
-3. **Naming convention**:
+2. **Content type**: Single-blob, content-addressed, deduplicated storage
+3. **Parameterized types**: `<type@param>` passes parameter to underlying dtype
+4. **Naming convention**:
    - `<djblob>` = internal serialized (database)
    - `<xblob>` = external serialized (content-addressed)
-   - `<attach>` = internal file
-   - `<xattach>` = external file
-4. **Transparent access**: AttributeTypes return Python objects or file paths, not references
-5. **Lazy access for objects**: Only `object`/`object@store` returns ObjectRef
+   - `<attach>` = internal file (single file)
+   - `<xattach>` = external file (single file)
+5. **Transparent access**: AttributeTypes return Python objects or file paths, not references
+6. **Lazy access for objects**: Only `object`/`object@store` returns ObjectRef
 
 ## Migration from Legacy Types
 
````
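The "GC: Reference counted" row of the comparison table can be illustrated with plain dictionaries. This is a simplified sketch, not the spec's `garbage_collect(schema)`: rows are dicts holding a content hash, `registry` stands in for the `ContentRegistry` table, and the hash values are short placeholder strings rather than real 64-character SHA-256 digests.

```python
from collections import Counter


def garbage_collect(rows: list[dict], registry: dict[str, bytes]) -> list[str]:
    """Delete registry entries that no row references; return the deleted hashes."""
    # Count how many rows point at each content hash (many-to-one).
    refcounts = Counter(row["content_hash"] for row in rows)
    orphans = [h for h in registry if refcounts[h] == 0]
    for h in orphans:
        del registry[h]
    return orphans


# Placeholder hashes; real entries would be 64-character hex digests.
registry = {"aaa": b"shared", "bbb": b"unique"}
rows = [
    {"id": 1, "content_hash": "aaa"},
    {"id": 2, "content_hash": "aaa"},  # same content, stored once
    {"id": 3, "content_hash": "bbb"},
]

rows = [r for r in rows if r["id"] != 1]  # delete one referencing row
print(garbage_collect(rows, registry))    # [] because row 2 still references "aaa"
rows = [r for r in rows if r["id"] != 3]
print(garbage_collect(rows, registry))    # ['bbb']
print(sorted(registry))                   # ['aaa']
```

Deleting a row never removes shared content directly; only a collection pass that finds a zero count does. By contrast, an `object` value's path is derived from the primary key, so per the table it is deleted with its row and needs no count.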