This document defines a layered storage architecture:

1. **MySQL types**: `longblob`, `varchar`, `int`, etc.
2. **Core DataJoint types**: `object`, `content`, `filepath` (and their `@store` variants)
3. **AttributeTypes**: `<djblob>`, `<xblob>`, `<attach>`, etc. (built on top of core types)

### Three OAS Storage Regions

| Region | Path Pattern | Addressing | Use Case |
|--------|--------------|------------|----------|
| Object | `{schema}/{table}/{pk}/` | Primary key | Large objects, Zarr, HDF5 |
| Content | `_content/{hash}` | Content hash | Deduplicated blobs/files |
| Filepath | `_files/{user-path}` | User-defined | User-organized files |

## Core Types

### `object` / `object@store` - Path-Addressed Storage
@@ -44,11 +52,14 @@ class Analysis(dj.Computed):
```
store_root/
├── {schema}/{table}/{pk}/      # object storage (path-addressed by PK)
│   └── {attribute}/
│
├── _content/                   # content storage (content-addressed)
│   └── {hash[:2]}/{hash[2:4]}/{hash}
│
└── _files/                     # filepath storage (user-addressed)
    └── {user-defined-path}
```
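The three regions differ only in how the relative path under `store_root` is derived. A minimal sketch of those derivations following the layout above; the function names are hypothetical, and `pk_path` stands in for a primary key already rendered as a path segment, because the exact `{pk}` encoding is not specified here.

```python
import hashlib
from pathlib import PurePosixPath


def object_path(schema: str, table: str, pk_path: str, attribute: str) -> str:
    """Object region: {schema}/{table}/{pk}/{attribute}/ (path-addressed by PK).

    pk_path is a hypothetical, already-rendered primary-key segment.
    """
    return f"{schema}/{table}/{pk_path}/{attribute}/"


def content_path(data: bytes) -> str:
    """Content region: _content/{hash[:2]}/{hash[2:4]}/{hash} using the SHA256 hex digest."""
    h = hashlib.sha256(data).hexdigest()
    return f"_content/{h[:2]}/{h[2:4]}/{h}"


def filepath_path(user_path: str) -> str:
    """Filepath region: _files/{user-path}; the user supplies a relative path."""
    p = PurePosixPath(user_path)
    # Refuse paths that would escape the _files/ region
    if p.is_absolute() or ".." in p.parts:
        raise ValueError(f"user path must be relative and stay inside _files/: {user_path}")
    return f"_files/{p}"
```

Only the object path depends on the row's identity, only the content path depends on the bytes, and only the filepath path is chosen by the user, which is the distinction the region table draws.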
#### Content Type Behavior
@@ -95,6 +106,92 @@ The `content` type stores a `char(64)` hash in the database:
features CHAR(64) NOT NULL  -- SHA256 hex hash
```
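Because the storage path is derived from the SHA256 digest kept in the `CHAR(64)` column, writing the same bytes twice lands on the same object, which is where deduplication comes from. A minimal sketch, assuming a plain local directory stands in for the store; `put_content` and `get_content` are hypothetical helpers, not DataJoint API.

```python
import hashlib
from pathlib import Path


def put_content(store_root: Path, data: bytes) -> str:
    """Write data once under _content/{h[:2]}/{h[2:4]}/{h}; return the 64-char hex hash.

    Identical bytes map to the same path, so a repeated insert writes nothing new.
    """
    h = hashlib.sha256(data).hexdigest()  # value stored in the CHAR(64) column
    target = store_root / "_content" / h[:2] / h[2:4] / h
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        # Write under a temporary name, then rename into place
        tmp = target.with_name(h + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(target)
    return h


def get_content(store_root: Path, h: str) -> bytes:
    """Read a blob back by its hash and check that the bytes still match it."""
    data = (store_root / "_content" / h[:2] / h[2:4] / h).read_bytes()
    if hashlib.sha256(data).hexdigest() != h:
        raise IOError(f"checksum mismatch for {h}")
    return data
```

Renaming a fully written temporary file means a reader on a local POSIX filesystem never sees a partial blob; a remote object store would need its own mechanism, which this sketch does not attempt.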
### `filepath` / `filepath@store` - User-Addressed Storage

**Upgraded from legacy `filepath@store`.** User-defined path organization with ObjectRef access:

- **User controls paths**: the relative path is specified by the user (not derived from the PK or hash)
- Stored in `_files/{user-path}` within the store
- Returns an `ObjectRef` for lazy access (no automatic copying)
- Stores a checksum in the database for verification
- Supports files and folders (like `object`)

```python
import datajoint as dj

schema = dj.Schema('my_pipeline')  # hypothetical schema name


# Assumes a store named 'raw' is configured in dj.config['stores']
@schema
class RawData(dj.Manual):
    definition = """
    session_id : int
    ---
    recording : filepath@raw    # user specifies path
    """


# Insert: the user provides a path relative to _files/ in the store;
# the file must already exist there
RawData.insert1({
    'session_id': 1,
    'recording': 'experiment_001/session_001/data.nwb',
})

# Fetch: returns an ObjectRef (lazy, no copy)
row = (RawData & 'session_id=1').fetch1()
ref = row['recording']       # ObjectRef
ref.download('/local/path')  # explicit download
ref.open()                   # fsspec streaming access
```
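The example relies on the returned reference doing no I/O until `open()` or `download()` is called. To illustrate that contract, here is a minimal local-filesystem stand-in; `LocalObjectRef` and its signatures are hypothetical (this is not DataJoint's `ObjectRef`), and it uses the built-in `open()` where the real type is described as using fsspec.

```python
from __future__ import annotations

import shutil
from pathlib import Path


class LocalObjectRef:
    """Hypothetical stand-in for ObjectRef over a local store directory.

    Construction touches no data; bytes are read only by open() or download().
    Handles single files only (folder download is omitted for brevity).
    """

    def __init__(self, store_root: str | Path, path: str, checksum: str | None = None):
        self.store_root = Path(store_root)
        self.path = path            # e.g. "_files/experiment_001/session_001/data.nwb"
        self.checksum = checksum    # recorded at insert time; kept for verification

    @property
    def full_path(self) -> Path:
        return self.store_root / self.path

    def open(self, mode: str = "rb"):
        """Streaming access: returns a file object; no local copy is made."""
        return open(self.full_path, mode)

    def download(self, destination: str | Path) -> Path:
        """Explicit copy into a local directory; returns the local file path."""
        dest_dir = Path(destination)
        dest_dir.mkdir(parents=True, exist_ok=True)
        target = dest_dir / self.full_path.name
        shutil.copyfile(self.full_path, target)
        return target
```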
#### Filepath Type Behavior

```python
from __future__ import annotations  # keeps the ObjectRef return annotation unevaluated

from pathlib import PurePosixPath


# Core type behavior; ObjectRef and the store backend interface
# (exists/checksum/size) are defined elsewhere in this design
class FilepathType:
    """Core user-addressed storage type."""

    def store(self, user_path: str, store_backend) -> dict:
        """
        Register a filepath and return its metadata.
        The file must already exist at _files/{user_path} in the store.
        """
        # Reject absolute paths and '..' segments so the path stays inside _files/
        p = PurePosixPath(user_path)
        if p.is_absolute() or '..' in p.parts:
            raise ValueError(f"filepath must be relative to _files/: {user_path}")

        full_path = f"_files/{user_path}"
        if not store_backend.exists(full_path):
            raise FileNotFoundError(f"File not found: {full_path}")

        # Compute checksum and size for later verification
        checksum = store_backend.checksum(full_path)
        size = store_backend.size(full_path)

        return {
            'path': user_path,
            'checksum': checksum,
            'size': size,
        }

    def retrieve(self, metadata: dict, store_backend) -> ObjectRef:
        """Return an ObjectRef for lazy access (nothing is copied)."""
        return ObjectRef(
            path=f"_files/{metadata['path']}",
            store=store_backend,
            checksum=metadata.get('checksum'),  # for verification
        )
```
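`FilepathType.store()` only assumes that the backend can answer `exists`, `checksum`, and `size` for a path. A minimal local-directory backend satisfying that interface might look like the following; the class name and the folder-checksum scheme (hashing each file's relative name, size, and contents in sorted order) are assumptions, since this section does not say how folders are checksummed.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


class LocalStoreBackend:
    """Hypothetical store backend rooted at a local directory."""

    def __init__(self, root: str | Path):
        self.root = Path(root)

    def exists(self, path: str) -> bool:
        return (self.root / path).exists()

    def _files(self, path: str):
        """Yield (relative name, Path) for one file, or for every file under a folder."""
        p = self.root / path
        if p.is_file():
            yield p.name, p
        else:
            for f in sorted(p.rglob("*")):
                if f.is_file():
                    yield f.relative_to(p).as_posix(), f

    def checksum(self, path: str) -> str:
        """SHA256 hex digest of a file; for a folder, names and sizes are hashed too."""
        is_folder = (self.root / path).is_dir()
        h = hashlib.sha256()
        for name, f in self._files(path):
            if is_folder:
                # Name and size delimit each file so the hashed byte stream is unambiguous
                h.update(f"{name}\0{f.stat().st_size}\0".encode())
            with open(f, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    h.update(chunk)
        return h.hexdigest()

    def size(self, path: str) -> int:
        """Size in bytes; for a folder, the sum over all files it contains."""
        return sum(f.stat().st_size for _, f in self._files(path))
```

For a single file the checksum is simply the SHA256 of its contents, so it can be compared directly against a digest computed elsewhere.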
#### Database Column

The `filepath` type stores its metadata as JSON:

```sql
-- filepath column
recording JSON NOT NULL
-- Contains: {"path": "...", "checksum": "...", "size": ...}
```
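Because the checksum travels with the row, a modified or missing file can be detected before it is used. A minimal sketch of building the JSON value for the `recording` column and later checking the file against it; it assumes the checksum is a SHA256 hex digest of a single file (the algorithm is not pinned down here), and the helper names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Stream the file through SHA256 in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def filepath_column_value(store_root: Path, user_path: str) -> str:
    """Build the JSON stored in the filepath column: path, checksum, size."""
    full = store_root / "_files" / user_path
    meta = {"path": user_path, "checksum": file_sha256(full), "size": full.stat().st_size}
    return json.dumps(meta)


def is_unchanged(store_root: Path, column_value: str) -> bool:
    """Compare the JSON read back from the database with the file currently in the store."""
    meta = json.loads(column_value)
    full = store_root / "_files" / meta["path"]
    return (
        full.is_file()
        and full.stat().st_size == meta["size"]  # cheap check before hashing
        and file_sha256(full) == meta["checksum"]
    )
```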
#### Key Differences from Legacy `filepath@store`

| Feature | Legacy | New |
|---------|--------|-----|
| Access | Copy to local stage | ObjectRef (lazy) |
| Copying | Automatic | Explicit via `ref.download()` |
| Streaming | No | Yes via `ref.open()` |
| Folders | No | Yes |
| Interface | Returns local path | Returns ObjectRef |

## Parameterized AttributeTypes

AttributeTypes can be parameterized with `<type@param>` syntax. The parameter is passed
@@ -235,31 +332,32 @@ class Attachments(dj.Manual):
## Type Layering Summary

```
┌───────────────────────────────────────────────────────────────────────────┐
│                              AttributeTypes                               │
│    <djblob>    <xblob>    <attach>    <xattach>    <custom>               │
├───────────────────────────────────────────────────────────────────────────┤
│                           Core DataJoint Types                            │
│    longblob    content           object          filepath                 │
│                content@store     object@store    filepath@store           │
├───────────────────────────────────────────────────────────────────────────┤
│                                MySQL Types                                │
│    LONGBLOB    CHAR(64)          JSON            JSON    VARCHAR etc.     │
└───────────────────────────────────────────────────────────────────────────┘
```

## Storage Comparison

| Type | Core Type | Storage Location | Dedup | Returns |
|------|-----------|------------------|-------|---------|
| `<djblob>` | `longblob` | Database | No | Python object |
| `<xblob>` | `content` | `_content/{hash}` | Yes | Python object |
| `<xblob@store>` | `content@store` | `_content/{hash}` | Yes | Python object |
| `<attach>` | `longblob` | Database | No | Local file path |
| `<xattach>` | `content` | `_content/{hash}` | Yes | Local file path |
| `<xattach@store>` | `content@store` | `_content/{hash}` | Yes | Local file path |
| `object` | — | `{schema}/{table}/{pk}/` | No | ObjectRef |
| `object@store` | — | `{schema}/{table}/{pk}/` | No | ObjectRef |
| `filepath@store` | — | `_files/{user-path}` | No | ObjectRef |
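
The diagram and table amount to two lookups: an AttributeType resolves to a core type, and a core type resolves to a MySQL column type, with any `@store` parameter carried down unchanged. A minimal sketch of that resolution using only the mappings listed above; the dictionaries and `resolve` are illustrative and are not DataJoint's actual type registry.

```python
from __future__ import annotations

# AttributeType -> core type (rows of the Storage Comparison table)
ATTRIBUTE_TO_CORE = {
    "djblob": "longblob",
    "xblob": "content",
    "attach": "longblob",
    "xattach": "content",
}

# Core type -> MySQL column type (bottom two layers of the diagram)
CORE_TO_MYSQL = {
    "longblob": "LONGBLOB",
    "content": "CHAR(64)",
    "object": "JSON",
    "filepath": "JSON",
}


def split_param(spec: str) -> tuple[str, str | None]:
    """Split 'name@param' into ('name', 'param'); without '@' the param is None."""
    name, sep, param = spec.partition("@")
    return name, (param if sep else None)


def resolve(declared: str) -> tuple[str, str]:
    """Return (core type, MySQL type), passing any @param down to the core type.

    '<xblob@raw>' -> ('content@raw', 'CHAR(64)'); 'filepath@raw' -> ('filepath@raw', 'JSON')
    """
    if declared.startswith("<") and declared.endswith(">"):
        name, param = split_param(declared[1:-1])
        if name not in ATTRIBUTE_TO_CORE:
            raise ValueError(f"unregistered AttributeType: <{name}>")
        core = ATTRIBUTE_TO_CORE[name]
    else:
        core, param = split_param(declared)
    if core not in CORE_TO_MYSQL:
        raise ValueError(f"unknown core type: {core}")
    return (f"{core}@{param}" if param else core), CORE_TO_MYSQL[core]
```

Keeping `@param` attached to the core type, instead of interpreting it in the resolver, mirrors the rule that the parameter is passed to the underlying dtype: only the core type needs to know what a store name means.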
## Reference Counting for Content Type
@@ -306,33 +404,37 @@ def garbage_collect(project):
            (ContentRegistry() & {'content_hash': content_hash}).delete()
```
## Core Type Comparison

| Feature | `object` | `content` | `filepath` |
|---------|----------|-----------|------------|
| Addressing | Primary key | Content hash | User-defined path |
| Path control | DataJoint | DataJoint | User |
| Deduplication | No | Yes | No |
| Structure | Files, folders, Zarr | Single blob only | Files, folders |
| Access | ObjectRef (lazy) | Transparent (bytes) | ObjectRef (lazy) |
| GC | Deleted with row | Reference counted | Deleted with row |
| Checksum | Optional | Implicit (the hash itself) | Stored in DB |

**When to use each:**
- **`object`**: Large/complex objects where DataJoint controls organization (Zarr, HDF5)
- **`content`**: Deduplicated serialized data or file attachments via `<xblob>`, `<xattach>`
- **`filepath`**: User-managed file organization, external data sources
325425
326- 1 . ** Layered architecture** : Core types (` content ` , ` object ` ) separate from AttributeTypes
327- 2 . ** Content type** : Single-blob, content-addressed, deduplicated storage
328- 3 . ** Parameterized types** : ` <type@param> ` passes parameter to underlying dtype
329- 4 . ** Naming convention** :
426+ 1 . ** Layered architecture** : Core types (` object ` , ` content ` , ` filepath ` ) separate from AttributeTypes
427+ 2 . ** Three OAS regions** : object (PK-addressed), content (hash-addressed), filepath (user-addressed)
428+ 3 . ** Content type** : Single-blob, content-addressed, deduplicated storage
429+ 4 . ** Filepath upgrade** : Returns ObjectRef (lazy) instead of copying files
430+ 5 . ** Parameterized types** : ` <type@param> ` passes parameter to underlying dtype
431+ 6 . ** Naming convention** :
330432 - ` <djblob> ` = internal serialized (database)
331433 - ` <xblob> ` = external serialized (content-addressed)
332434 - ` <attach> ` = internal file (single file)
333435 - ` <xattach> ` = external file (single file)
334- 5 . ** Transparent access** : AttributeTypes return Python objects or file paths, not references
335- 6 . ** Lazy access for objects ** : Only ` object ` / ` object@store ` returns ObjectRef
436+ 7 . ** Transparent access** : AttributeTypes return Python objects or file paths
437+ 8 . ** Lazy access** : ` object ` , ` object@store ` , and ` filepath@store ` return ObjectRef
336438
337439## Migration from Legacy Types
338440
@@ -342,7 +444,7 @@ def garbage_collect(project):
| `blob@store` | `<xblob@store>` |
| `attach` | `<attach>` |
| `attach@store` | `<xattach@store>` |
| `filepath@store` (copy-based) | `filepath@store` (ObjectRef-based, upgraded) |
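
The migration rows follow a simple rewrite rule in which the store name is carried over unchanged. A minimal sketch for translating a legacy declaration, covering only the rows listed here (`blob@store`, `attach`, `attach@store`, `filepath@store`); the function name is hypothetical, and any other type is passed through unchanged rather than guessed.

```python
def migrate_type(legacy: str) -> str:
    """Map a legacy attribute type to its replacement, keeping the store name."""
    name, sep, store = legacy.partition("@")
    if name == "attach" and not sep:
        return "<attach>"
    if sep and name == "blob":
        return f"<xblob@{store}>"
    if sep and name == "attach":
        return f"<xattach@{store}>"
    if sep and name == "filepath":
        # Same declaration; fetch now returns an ObjectRef instead of a staged local copy
        return f"filepath@{store}"
    return legacy  # not covered by these rows: left unchanged
```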
### Migration from Legacy `~external_*` Stores
@@ -404,5 +506,5 @@ def migrate_external_store(schema, store_name):
1. Should `content` without `@store` use a default store, or require an explicit store?
2. Should we support `<xblob>` without `@store` syntax (implying the default store)?
3. Should `filepath` without `@store` be supported (using the default store)?
4. How long should the backward compatibility layer support the legacy `~external_*` format?