Commit 2dde3d9

Add Codec base class with __init_subclass__ auto-registration
Design improvements for Python 3.10+:
- Codecs auto-register when subclassed via __init_subclass__
- No decorator needed - just inherit from dj.Codec and set name
- Use register=False for abstract base classes
- Removed @dj.codec decorator from all examples

New API:

    class GraphCodec(dj.Codec):
        name = "graph"
        def encode(...): ...
        def decode(...): ...

Abstract bases:

    class ExternalOnlyCodec(dj.Codec, register=False):
        ...

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent 5cb5ae4 commit 2dde3d9

1 file changed: +115 -4 lines changed

docs/src/design/tables/storage-types-spec.md

Lines changed: 115 additions & 4 deletions
@@ -209,6 +209,119 @@ The `@` character in codec syntax indicates **external storage** (object store):
 
 Some codecs support both modes (`<blob>`, `<attach>`), others are external-only (`<object@>`, `<hash@>`, `<filepath@>`).
 
+### Codec Base Class
+
+Codecs auto-register when subclassed using Python's `__init_subclass__` mechanism.
+No decorator is needed.
+
+```python
+from abc import ABC, abstractmethod
+from typing import Any
+
+# Global codec registry
+_codec_registry: dict[str, "Codec"] = {}
+
+
+class Codec(ABC):
+    """
+    Base class for codec types. Subclasses auto-register by name.
+
+    Requires Python 3.10+.
+    """
+    name: str | None = None  # Must be set by concrete subclasses
+
+    def __init_subclass__(cls, *, register: bool = True, **kwargs):
+        """Auto-register concrete codecs when subclassed."""
+        super().__init_subclass__(**kwargs)
+
+        if not register:
+            return  # Skip registration for abstract bases
+
+        if cls.name is None:
+            return  # Skip registration if no name (abstract)
+
+        if cls.name in _codec_registry:
+            existing = _codec_registry[cls.name]
+            if type(existing) is not cls:
+                raise DataJointError(
+                    f"Codec <{cls.name}> already registered by {type(existing).__name__}"
+                )
+            return  # Same class, idempotent
+
+        _codec_registry[cls.name] = cls()
+
+    def get_dtype(self, is_external: bool) -> str:
+        """
+        Return the storage dtype for this codec.
+
+        Args:
+            is_external: True if @ modifier present (external storage)
+
+        Returns:
+            A core type (e.g., "bytes", "json") or another codec (e.g., "<hash>")
+        """
+        raise NotImplementedError
+
+    @abstractmethod
+    def encode(self, value: Any, *, key: dict | None = None, store_name: str | None = None) -> Any:
+        """Encode Python value for storage."""
+        ...
+
+    @abstractmethod
+    def decode(self, stored: Any, *, key: dict | None = None) -> Any:
+        """Decode stored value back to Python."""
+        ...
+
+    def validate(self, value: Any) -> None:
+        """Optional validation before encoding. Override to add constraints."""
+        pass
+
+
+def list_codecs() -> list[str]:
+    """Return list of registered codec names."""
+    return sorted(_codec_registry.keys())
+
+
+def get_codec(name: str) -> Codec:
+    """Get codec by name. Raises DataJointError if not found."""
+    if name not in _codec_registry:
+        raise DataJointError(f"Unknown codec: <{name}>")
+    return _codec_registry[name]
+```
+
+**Usage - no decorator needed:**
+
+```python
+class GraphCodec(dj.Codec):
+    """Auto-registered as <graph>."""
+    name = "graph"
+
+    def get_dtype(self, is_external: bool) -> str:
+        return "<blob>"
+
+    def encode(self, graph, *, key=None, store_name=None):
+        return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}
+
+    def decode(self, stored, *, key=None):
+        import networkx as nx
+        G = nx.Graph()
+        G.add_nodes_from(stored['nodes'])
+        G.add_edges_from(stored['edges'])
+        return G
+```
+
313+
**Skip registration for abstract bases:**
314+
315+
```python
316+
class ExternalOnlyCodec(dj.Codec, register=False):
317+
"""Abstract base for external-only codecs. Not registered."""
318+
319+
def get_dtype(self, is_external: bool) -> str:
320+
if not is_external:
321+
raise DataJointError(f"<{self.name}> requires @ (external only)")
322+
return "json"
323+
```
324+
212325
### Codec Resolution and Chaining
213326

214327
Codecs resolve to core types through chaining. The `get_dtype(is_external)` method
@@ -471,7 +584,6 @@ blob format. Compatible with MATLAB.
 - **`<blob@store>`**: Stored in specific named store
 
 ```python
-@dj.codec
 class BlobCodec(dj.Codec):
     """Serialized Python objects. Supports internal and external."""
     name = "blob"
@@ -511,7 +623,6 @@ Stores files with filename preserved. On fetch, extracts to configured download
 - **`<attach@store>`**: Stored in specific named store
 
 ```python
-@dj.codec
 class AttachCodec(dj.Codec):
     """File attachment with filename. Supports internal and external."""
     name = "attach"
@@ -548,7 +659,6 @@ class Attachments(dj.Manual):
 Users can define custom codecs for domain-specific data:
 
 ```python
-@dj.codec
 class GraphCodec(dj.Codec):
     """Store NetworkX graphs. Internal only (no external support)."""
     name = "graph"
@@ -562,6 +672,7 @@ class GraphCodec(dj.Codec):
         return {'nodes': list(graph.nodes()), 'edges': list(graph.edges())}
 
     def decode(self, stored, *, key=None):
+        import networkx as nx
         G = nx.Graph()
         G.add_nodes_from(stored['nodes'])
         G.add_edges_from(stored['edges'])
@@ -571,7 +682,6 @@ class GraphCodec(dj.Codec):
 Custom codecs can support both modes by returning different dtypes:
 
 ```python
-@dj.codec
 class ImageCodec(dj.Codec):
     """Store images. Supports both internal and external."""
     name = "image"
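
Review note on the hunk above: the body of `ImageCodec` is cut off by the diff context, so the sketch below only illustrates what "returning different dtypes" could look like together with chained resolution. Everything here is an assumption made for illustration: the classes `ImageLike` and `HashLike`, the helper `resolve_dtype`, the choice of `"bytes"` for internal and `"<hash>"` for external storage, and passing `is_external` unchanged down the chain. None of it is taken from the DataJoint implementation.

```python
class HashLike:
    """Hypothetical external-only codec that resolves to a core type."""
    name = "hash"

    def get_dtype(self, is_external: bool) -> str:
        if not is_external:
            raise ValueError("<hash> requires @ (external only)")
        return "json"


class ImageLike:
    """Hypothetical dual-mode codec: one dtype per storage mode."""
    name = "image"

    def get_dtype(self, is_external: bool) -> str:
        # Internal: store bytes in the table. External: delegate to <hash>.
        return "<hash>" if is_external else "bytes"


def resolve_dtype(dtype, is_external, registry, max_depth=10):
    """Follow <codec> references until a core type is reached.

    Returns the core type and the list of codec names visited.
    """
    chain = []
    while dtype.startswith("<") and dtype.endswith(">"):
        if len(chain) >= max_depth:
            raise RuntimeError(f"Codec chain too deep: {chain}")
        name = dtype[1:-1]
        chain.append(name)
        dtype = registry[name].get_dtype(is_external)
    return dtype, chain


registry = {"image": ImageLike(), "hash": HashLike()}
internal = resolve_dtype("<image>", False, registry)
external = resolve_dtype("<image>", True, registry)
```

Under these assumptions the internal form resolves in one step and the external form takes two, which is the kind of chaining the "Codec Resolution and Chaining" section refers to.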
@@ -679,6 +789,7 @@ def garbage_collect(store_name):
 15. **Lazy access**: `<object@>` and `<filepath@store>` return ObjectRef
 16. **MD5 for content hashing**: See [Hash Algorithm Choice](#hash-algorithm-choice) below
 17. **No separate registry**: Hash metadata stored in JSON columns, not a separate table
+18. **Auto-registration via `__init_subclass__`**: Codecs register automatically when subclassed—no decorator needed. Use `register=False` for abstract bases. Requires Python 3.10+.
 
 ### Hash Algorithm Choice
 