Commit 5d2c9f6

Add text to core types and document type modifier policy
Core types:
- Add `text` as a core type for unlimited-length text (TEXT in both MySQL and PostgreSQL)

Type modifiers policy:
- Document that SQL modifiers (NOT NULL, DEFAULT, PRIMARY KEY, UNIQUE, COMMENT) are not allowed - DataJoint has its own syntax
- Document that AUTO_INCREMENT is discouraged but allowed with native types
- UNSIGNED is allowed as part of type semantics

Co-authored-by: dimitri-yatsenko <dimitri@datajoint.com>
1 parent 7e32089 commit 5d2c9f6

3 files changed: +35 -8 lines changed

docs/src/design/tables/attributes.md

Lines changed: 6 additions & 6 deletions
@@ -34,6 +34,7 @@ Use these portable, scientist-friendly types for cross-database compatibility.

 - `char(n)`: fixed-length string of exactly *n* characters.
 - `varchar(n)`: variable-length string up to *n* characters.
+- `text`: unlimited-length text for long-form content (notes, descriptions, abstracts).
 - `enum(...)`: one of several enumerated values, e.g., `enum("low", "medium", "high")`.
 Do not use enums in primary keys due to difficulty changing definitions.

@@ -65,9 +66,9 @@ for portable pipelines. Using native types will generate a warning.
 - `tinyint`, `smallint`, `int`, `bigint` (with optional `unsigned`)
 - `float`, `double`, `real`
 - `tinyblob`, `blob`, `mediumblob`, `longblob`
-- `text`, `mediumtext`, `longtext`
+- `tinytext`, `mediumtext`, `longtext` (size variants)
 - `time`, `timestamp`, `year`
-- `mediumint`, `serial`
+- `mediumint`, `serial`, `int auto_increment`

 See the [storage types spec](storage-types-spec.md) for complete mappings.

@@ -133,10 +134,9 @@ class Measurement(dj.Manual):

 ## Datatypes not (yet) supported

-- `binary`
-- `text`
-- `longtext`
-- `bit`
+- `binary(n)` / `varbinary(n)` - use `bytes` instead
+- `bit(n)` - use `int` types with bitwise operations
+- `set(...)` - use `json` for multiple selections

 For additional information about these datatypes, see
 http://dev.mysql.com/doc/refman/5.6/en/data-types.html
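
The replacements suggested in the last hunk (`int` with bitwise operations in place of `bit(n)`, `json` in place of `set(...)`) can be illustrated with a minimal Python sketch. The flag names and helper functions below are hypothetical and are not part of DataJoint; they only show what the stored values might look like.

```python
import json

# Hypothetical flag positions; the names are illustrative only.
FLAG_ANESTHETIZED = 1 << 0
FLAG_HEAD_FIXED = 1 << 1
FLAG_OPTO = 1 << 2


def pack_flags(*flags: int) -> int:
    """Combine individual bit flags into one integer (stands in for bit(n))."""
    value = 0
    for flag in flags:
        value |= flag
    return value


def has_flag(value: int, flag: int) -> bool:
    """Test whether a given bit is set in the packed integer."""
    return (value & flag) != 0


def encode_selection(choices) -> str:
    """Serialize a multiple selection as a JSON array (stands in for set(...))."""
    return json.dumps(sorted(set(choices)))


def decode_selection(text: str) -> set:
    """Recover the selection from its JSON form."""
    return set(json.loads(text))
```

The packed integer would go into an integer attribute and the JSON string into a `json` attribute; how the values are inserted is outside the scope of this sketch.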

docs/src/design/tables/storage-types-spec.md

Lines changed: 27 additions & 2 deletions
@@ -18,7 +18,7 @@ This document defines a three-layer type architecture:
 │ Core DataJoint Types (Layer 2) │
 │ │
 │ float32 float64 int64 uint64 int32 uint32 int16 uint16 │
-│ int8 uint8 bool uuid json bytes date datetime │
+│ int8 uint8 bool uuid json bytes date datetime text │
 │ char(n) varchar(n) enum(...) decimal(n,f) │
 ├───────────────────────────────────────────────────────────────────┤
 │ Native Database Types (Layer 1) │
@@ -74,6 +74,7 @@ MySQL and PostgreSQL backends. Users should prefer these over native database ty
 |-----------|-------------|-------|------------|
 | `char(n)` | Fixed-length | `CHAR(n)` | `CHAR(n)` |
 | `varchar(n)` | Variable-length | `VARCHAR(n)` | `VARCHAR(n)` |
+| `text` | Unlimited text | `TEXT` | `TEXT` |

 ### Boolean

@@ -118,10 +119,34 @@ for serialized Python objects.

 ### Native Passthrough Types

-Users may use native database types directly (e.g., `text`, `mediumint auto_increment`),
+Users may use native database types directly (e.g., `mediumint`, `tinyblob`),
 but these will generate a warning about non-standard usage. Native types are not recorded
 in field comments and may have portability issues across database backends.

+### Type Modifiers Policy
+
+DataJoint table definitions have their own syntax for constraints and metadata. SQL type
+modifiers are **not allowed** in type specifications because they conflict with DataJoint's
+declarative syntax:
+
+| Modifier | Status | DataJoint Alternative |
+|----------|--------|----------------------|
+| `NOT NULL` / `NULL` | ❌ Not allowed | Position above/below `---` determines nullability |
+| `DEFAULT value` | ❌ Not allowed | Use `= value` syntax after type |
+| `PRIMARY KEY` | ❌ Not allowed | Position above `---` line |
+| `UNIQUE` | ❌ Not allowed | Use DataJoint index syntax |
+| `COMMENT 'text'` | ❌ Not allowed | Use `# comment` syntax |
+| `AUTO_INCREMENT` | ⚠️ Discouraged | Allowed with native types only, generates warning |
+| `UNSIGNED` | ✅ Allowed | Part of type semantics (use `uint*` core types) |
+
+**Auto-increment policy:** DataJoint discourages `AUTO_INCREMENT` / `SERIAL` because:
+- Breaks reproducibility (IDs depend on insertion order)
+- Makes pipelines non-deterministic
+- Complicates data migration and replication
+- Primary keys should be meaningful, not arbitrary
+
+If required, use native types: `int auto_increment` or `serial` (with warning).
+
 ## AttributeTypes (Layer 3)

 AttributeTypes provide `encode()`/`decode()` semantics on top of core types. They are
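
The policy table added in this hunk can be read as a small validation rule. The following is a rough sketch of such a check, not DataJoint's implementation: `check_type_spec`, its messages, and the choice to raise `ValueError` are all assumptions made for illustration. Quoted literals are stripped first so that values such as `enum("default")` are not mistaken for modifiers.

```python
import re

# (pattern, modifier name, DataJoint alternative), following the policy table.
_FORBIDDEN = [
    (r"\b(?:not\s+)?null\b", "NOT NULL / NULL", "position relative to `---`"),
    (r"\bdefault\b", "DEFAULT", "`= value` syntax"),
    (r"\bprimary\s+key\b", "PRIMARY KEY", "position above `---`"),
    (r"\bunique\b", "UNIQUE", "DataJoint index syntax"),
    (r"\bcomment\b", "COMMENT", "`# comment` syntax"),
]
_DISCOURAGED = re.compile(r"\bauto_increment\b|^serial$", re.IGNORECASE)
_QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")


def check_type_spec(type_spec: str) -> list[str]:
    """Hypothetical check: raise on forbidden modifiers, return warnings otherwise."""
    # Drop quoted literals so enum values are not scanned for keywords.
    spec = _QUOTED.sub("", type_spec).strip()
    for pattern, name, alternative in _FORBIDDEN:
        if re.search(pattern, spec, re.IGNORECASE):
            raise ValueError(f"{name} is not allowed in a type; use {alternative}")
    warnings = []
    if _DISCOURAGED.search(spec):
        warnings.append("auto-increment is discouraged")
    return warnings
```

Under these assumptions, `int unsigned` passes silently, `int auto_increment` and `serial` pass with a warning, and `int NOT NULL` is rejected.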

src/datajoint/declare.py

Lines changed: 2 additions & 0 deletions
@@ -42,6 +42,8 @@
 # String types (with parameters)
 "char": (r"char\s*\(\d+\)$", None),
 "varchar": (r"varchar\s*\(\d+\)$", None),
+# Unlimited text
+"text": (r"text$", None),
 # Enumeration
 "enum": (r"enum\s*\(.+\)$", None),
 # Fixed-point decimal
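
To see how the new `text` pattern behaves next to its neighbours, here is a small standalone sketch. Only the four patterns are taken from the hunk; the dictionary name `CORE_TYPE_PATTERNS`, the helper `match_core_type`, and the use of `re.match` with `re.IGNORECASE` are assumptions about how such patterns might be applied, not a copy of `declare.py`.

```python
import re

# Patterns copied from the hunk; the second tuple element is kept as None.
CORE_TYPE_PATTERNS = {
    "char": (r"char\s*\(\d+\)$", None),
    "varchar": (r"varchar\s*\(\d+\)$", None),
    "text": (r"text$", None),
    "enum": (r"enum\s*\(.+\)$", None),
}


def match_core_type(type_spec: str):
    """Return the name of the first core type whose pattern matches, else None."""
    spec = type_spec.strip()
    for name, (pattern, _) in CORE_TYPE_PATTERNS.items():
        # Assumption: match from the start of the string, ignoring case.
        if re.match(pattern, spec, re.IGNORECASE):
            return name
    return None
```

Anchoring at the start matters here: with `re.search` the pattern `text$` would also accept `tinytext` or `mediumtext`, which this commit lists as native size variants rather than core types.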
