22
33## Overview
44
5- This document defines a layered storage architecture:
5+ This document defines a two-layer storage architecture:
66
7- 1 . ** Database types** : ` longblob ` , ` varchar ` , ` int ` , ` json ` , etc.
8- 2 . ** Core DataJoint types** : ` object ` , ` content ` , ` filepath ` , ` json ` (and ` @store ` variants where applicable)
9- 3 . ** AttributeTypes** : ` <djblob> ` , ` <xblob> ` , ` <attach> ` , etc. (built on top of core types)
7+ 1. **Database types**: `longblob`, `varchar`, `int`, `json`, etc. (MySQL/PostgreSQL native)
8+ 2. **AttributeTypes**: Custom types with `encode()`/`decode()` semantics
9+
10+ All DataJoint storage types (`object`, `content`, `filepath@store`, `<djblob>`, etc.) are
11+ implemented as **AttributeTypes**. Some are built-in (auto-registered, use `dj.config` for stores)
12+ while others are user-defined.
1013
1114### OAS Storage Regions
1215
@@ -20,17 +23,21 @@ This document defines a layered storage architecture:
2023` filepath@store ` provides portable relative paths within configured stores with lazy ObjectRef access.
2124For arbitrary URLs that don't need ObjectRef semantics, use ` varchar ` instead.
2225
23- ## Core Types
26+ ## Built-in AttributeTypes
27+
28+ Built-in types are auto-registered and use ` dj.config['stores'] ` for store configuration.
29+ They use ` json ` as their database dtype to store metadata.
2430
2531### ` object ` / ` object@store ` - Path-Addressed Storage
2632
27- ** Already implemented .** OAS (Object-Augmented Schema) storage:
33+ ** Built-in AttributeType .** OAS (Object-Augmented Schema) storage:
2834
2935- Path derived from primary key: ` {schema}/{table}/{pk}/{attribute}/ `
3036- One-to-one relationship with table row
3137- Deleted when row is deleted
3238- Returns ` ObjectRef ` for lazy access
3339- Supports direct writes (Zarr, HDF5) via fsspec
40+ - ** dtype** : ` json ` (stores path, store name, metadata)
3441
3542``` python
3643class Analysis (dj .Computed ):
@@ -42,9 +49,34 @@ class Analysis(dj.Computed):
4249 """
4350```
4451
52+ #### Implementation
53+
54+ ``` python
55+ class ObjectType(AttributeType):
56+     """Built-in AttributeType for path-addressed OAS storage."""
57+     type_name = "object"
58+     dtype = "json"
59+
60+     def encode(self, value, *, key=None, store_name=None) -> dict:
61+         store = get_store(store_name or dj.config['stores']['default'])
62+         path = self._compute_path(key)  # {schema}/{table}/{pk}/{attr}/
63+         store.put(path, value)
64+         return {
65+             "path": path,
66+             "store": store.name,  # resolved name, so decode() never sees None
67+             # Additional metadata (size, timestamps, etc.)
68+         }
69+
70+     def decode(self, stored: dict, *, key=None) -> ObjectRef:
71+         return ObjectRef(
72+             store=get_store(stored["store"]),
73+             path=stored["path"]
74+         )
75+ ```
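The `{schema}/{table}/{pk}/{attribute}/` layout used by `_compute_path` could be derived along the following lines. This is a minimal sketch, not the actual implementation: the helper name `compute_object_path` and the rendering of the primary key as sorted, percent-encoded `name=value` pairs are assumptions made for illustration.

```python
from urllib.parse import quote


def compute_object_path(schema: str, table: str, key: dict, attribute: str) -> str:
    """Build a {schema}/{table}/{pk}/{attribute}/ path (hypothetical helper)."""
    # Assumption: primary-key fields are rendered as name=value pairs in sorted
    # order, so the same row always maps to the same path.
    pk = ",".join(
        f"{name}={quote(str(value), safe='')}" for name, value in sorted(key.items())
    )
    return f"{schema}/{table}/{pk}/{attribute}/"
```

Percent-encoding the values keeps separators such as `/` or `,` inside a key value from changing the directory structure.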
76+
4577### ` content ` / ` content@store ` - Content-Addressed Storage
4678
47- ** New core type .** Content-addressed storage with deduplication:
79+ ** Built-in AttributeType .** Content-addressed storage with deduplication:
4880
4981- ** Single blob only** : stores a single file or serialized object (not folders)
5082- ** Per-project scope** : content is shared across all schemas in a project (not per-schema)
@@ -53,6 +85,7 @@ class Analysis(dj.Computed):
5385- Reference counted for garbage collection
5486- Deduplication: identical content stored once across the entire project
5587- For folders/complex objects, use ` object ` type instead
88+ - ** dtype** : ` json ` (stores hash, store name, size, metadata)
5689
5790```
5891store_root/
@@ -63,58 +96,63 @@ store_root/
6396 └── {hash[:2]}/{hash[2:4]}/{hash}
6497```
6598
66- #### Content Type Behavior
67-
68- The ` content ` core type:
69- - Accepts ` bytes ` on insert
70- - Computes SHA256 hash of the content
71- - Stores in ` _content/{hash}/ ` if not already present (deduplication)
72- - Returns ` bytes ` on fetch (transparent retrieval)
73- - Registers in ` ContentRegistry ` for GC tracking
99+ #### Implementation
74100
75101``` python
76- # Core type behavior (built-in, not an AttributeType)
77- class ContentType:
78-     """Core content-addressed storage type."""
102+ class ContentType(AttributeType):
103+     """Built-in AttributeType for content-addressed storage."""
104+     type_name = "content"
105+     dtype = "json"
79106
80-     def store(self, data: bytes, store_backend) -> str:
81-         """Store content, return hash."""
107+     def encode(self, data: bytes, *, key=None, store_name=None) -> dict:
108+         """Store content, return metadata as JSON."""
82109         content_hash = hashlib.sha256(data).hexdigest()
110+         store = get_store(store_name or dj.config['stores']['default'])
83111         path = f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"
84112
85-         if not store_backend.exists(path):
86-             store_backend.put(path, data)
113+         if not store.exists(path):
114+             store.put(path, data)
87115             ContentRegistry().insert1({
88116                 'content_hash': content_hash,
89-                 'store': store_backend.name,
117+                 'store': store.name,  # resolved name, never None
90118                 'size': len(data)
91-             })
119+             }, skip_duplicates=True)
92120
93-         return content_hash
121+         return {
122+             "hash": content_hash,
123+             "store": store.name,
124+             "size": len(data)
125+         }
94126
95-     def retrieve(self, content_hash: str, store_backend) -> bytes:
127+     def decode(self, stored: dict, *, key=None) -> bytes:
96128         """Retrieve content by hash."""
97-         path = f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"
98-         return store_backend.get(path)
129+         store = get_store(stored["store"])
130+         path = f"_content/{stored['hash'][:2]}/{stored['hash'][2:4]}/{stored['hash']}"
131+         return store.get(path)
99132```
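The deduplication behavior can be tried without a real backend. The sketch below is illustrative only: `MemoryStore`, `content_path`, and `put_content` are hypothetical stand-ins, and the registry insert is left out so the example stays self-contained.

```python
import hashlib


class MemoryStore:
    """In-memory stand-in for a store backend (hypothetical, for illustration)."""

    def __init__(self):
        self.blobs = {}
        self.writes = 0

    def exists(self, path: str) -> bool:
        return path in self.blobs

    def put(self, path: str, data: bytes) -> None:
        self.blobs[path] = data
        self.writes += 1

    def get(self, path: str) -> bytes:
        return self.blobs[path]


def content_path(content_hash: str) -> str:
    """Fan out by the first two pairs of hex digits, as in the layout above."""
    return f"_content/{content_hash[:2]}/{content_hash[2:4]}/{content_hash}"


def put_content(store: MemoryStore, data: bytes) -> dict:
    """Write data once per distinct SHA256 hash and return its metadata."""
    content_hash = hashlib.sha256(data).hexdigest()
    path = content_path(content_hash)
    if not store.exists(path):  # identical content is stored only once
        store.put(path, data)
    return {"hash": content_hash, "size": len(data)}
```

Calling `put_content` twice with the same bytes returns the same metadata and performs a single write, which is the property the `content` type relies on for project-wide deduplication.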
100133
101134#### Database Column
102135
103- The ` content ` type stores a ` char(64) ` hash in the database :
136+ The ` content ` type stores JSON metadata :
104137
105138``` sql
106- -- content column
107- features CHAR (64 ) NOT NULL -- SHA256 hex hash
139+ -- content column (MySQL)
140+ features JSON NOT NULL
141+ -- Contains: {"hash": "abc123...", "store": "main", "size": 12345}
142+
143+ -- content column (PostgreSQL)
144+ features JSONB NOT NULL
108145```
109146
110147### ` filepath@store ` - Portable External Reference
111148
112- ** Upgraded from legacy .** Relative path references within configured stores:
149+ ** Built-in AttributeType .** Relative path references within configured stores:
113150
114151- ** Relative paths** : paths within a configured store (portable across environments)
115152- ** Store-aware** : resolves paths against configured store backend
116153- Returns ` ObjectRef ` for lazy access via fsspec
117154- Stores optional checksum for verification
155+ - ** dtype** : ` json ` (stores path, store name, checksum, metadata)
118156
119157** Key benefit** : Portability. The path is relative to the store, so pipelines can be moved
120158between environments (dev → prod, cloud → local) by changing store configuration without
@@ -154,42 +192,43 @@ ref.open() # fsspec streaming access
154192For arbitrary URLs (S3, HTTP, etc.) where you don't need ObjectRef semantics,
155193just use ` varchar ` . A string is simpler and more transparent.
156194
157- #### Filepath Type Behavior
195+ #### Implementation
158196
159197``` python
160- # Core type behavior
161- class FilepathType:
162-     """Core external reference type with store-relative paths."""
198+ class FilepathType(AttributeType):
199+     """Built-in AttributeType for store-relative file references."""
200+     type_name = "filepath"
201+     dtype = "json"
163202
164-     def store(self, relative_path: str, store_backend, compute_checksum: bool = False) -> dict:
203+     def encode(self, relative_path: str, *, key=None, store_name=None,
204+                compute_checksum: bool = False) -> dict:
165205         """Register reference to file in store."""
166-         metadata = {'path': relative_path}
206+         store = get_store(store_name)  # store_name required for filepath
207+         metadata = {'path': relative_path, 'store': store_name}
167208
168209         if compute_checksum:
169-             full_path = store_backend.resolve(relative_path)
170-             if store_backend.exists(full_path):
171-                 metadata['checksum'] = compute_file_checksum(store_backend, full_path)
172-                 metadata['size'] = store_backend.size(full_path)
210+             full_path = store.resolve(relative_path)
211+             if store.exists(full_path):
212+                 metadata['checksum'] = compute_file_checksum(store, full_path)
213+                 metadata['size'] = store.size(full_path)
173214
174215         return metadata
175216
176-     def retrieve(self, metadata: dict, store_backend) -> ObjectRef:
217+     def decode(self, stored: dict, *, key=None) -> ObjectRef:
177218         """Return ObjectRef for lazy access."""
178219         return ObjectRef(
179-             store=store_backend,
180-             path=metadata['path'],
181-             checksum=metadata.get('checksum')  # optional verification
220+             store=get_store(stored['store']),
221+             path=stored['path'],
222+             checksum=stored.get('checksum')  # optional verification
182223         )
183224```
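The portability claim rests on the stored path being relative to the store root. A minimal sketch of that idea follows. The `resolve` helper, the dictionaries mapping store names to roots, and the root locations themselves (including the bucket name) are placeholders, not part of the DataJoint API.

```python
from pathlib import PurePosixPath

# Assumption: each environment maps store names to a root location.
DEV_STORES = {"main": "/data/dev"}
PROD_STORES = {"main": "s3://lab-bucket/prod"}  # placeholder bucket name


def resolve(stored: dict, stores: dict) -> str:
    """Join a stored relative path onto the root of its store."""
    relative = PurePosixPath(stored["path"])
    # Reject paths that could escape the store root.
    if relative.is_absolute() or ".." in relative.parts:
        raise ValueError(f"path must stay inside the store: {stored['path']}")
    return f"{stores[stored['store']].rstrip('/')}/{relative}"
```

The same stored JSON, `{"path": "experiment_001/data.nwb", "store": "main"}`, resolves to a different absolute location in each environment, so moving a pipeline only requires changing the store configuration.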
184225
185226#### Database Column
186227
187- The ` filepath ` type uses the ` json ` core type:
188-
189228``` sql
190229-- filepath column (MySQL)
191230recording JSON NOT NULL
192- -- Contains: {"path": "experiment_001/data.nwb", "checksum": "...", "size": ...}
231+ -- Contains: {"path": "experiment_001/data.nwb", "store": "main", " checksum": "...", "size": ...}
193232
194233-- filepath column (PostgreSQL)
195234recording JSONB NOT NULL
@@ -205,49 +244,52 @@ recording JSONB NOT NULL
205244| Paths | Relative | Relative (unchanged) |
206245| Store param | Required (` @store ` ) | Required (` @store ` ) |
207246
247+ ## Database Types
248+
208249### ` json ` - Cross-Database JSON Type
209250
210- ** New core type. ** JSON storage compatible across MySQL and PostgreSQL:
251+ JSON storage compatible across MySQL and PostgreSQL:
211252
212253``` sql
213254-- MySQL
214255column_name JSON NOT NULL
215256
216- -- PostgreSQL
257+ -- PostgreSQL (uses JSONB for better indexing)
217258column_name JSONB NOT NULL
218259```
219260
220- The ` json ` core type:
261+ The ` json ` database type:
262+ - Used as dtype by built-in AttributeTypes (` object ` , ` content ` , ` filepath@store ` )
221263- Stores arbitrary JSON-serializable data
222264- Automatically uses appropriate type for database backend
223265- Supports JSON path queries where available
224266
225267## Parameterized AttributeTypes
226268
227- AttributeTypes can be parameterized with ` <type@param> ` syntax. The parameter is passed
228- through to the underlying dtype :
269+ AttributeTypes can be parameterized with ` <type@param> ` syntax. The parameter specifies
270+ which store to use :
229271
230272``` python
231273class AttributeType :
232- type_name: str # Name used in <brackets>
233- dtype: str # Base underlying type
274+ type_name: str # Name used in <brackets> or as bare type
275+ dtype: str # Database type or built-in AttributeType
234276
235- # When user writes <type_name@param>, resolved dtype becomes:
236- # f"{dtype}@{param}" if param specified, else dtype
277+ # When user writes type_name@param, resolved store becomes param
237278```
238279
239280** Resolution examples:**
240281```
241- <xblob> → dtype = " content" → default store
242- <xblob@cold> → dtype = " content@cold" → cold store
243- <djblob> → dtype = "longblob" → database
244- <djblob@x> → ERROR: longblob doesn't support parameters
282+ <xblob> → uses content type → default store
283+ <xblob@cold> → uses content type → cold store
284+ <djblob> → dtype = "longblob" → database (no store)
285+ object@cold → uses object type → cold store
245286```
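The resolution examples can be read as a small parsing step that splits a type specification into a type name and an optional store. The function below is a hedged sketch of that step. `parse_type_spec` is a hypothetical name, and the real parser may accept or reject more forms than this one does.

```python
from typing import Optional, Tuple


def parse_type_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split 'name', 'name@store', '<name>' or '<name@store>' into (name, store)."""
    spec = spec.strip()
    # Angle brackets must be either both present or both absent.
    if spec.startswith("<") != spec.endswith(">"):
        raise ValueError(f"unbalanced angle brackets in {spec!r}")
    body = spec[1:-1] if spec.startswith("<") else spec
    name, sep, store = body.partition("@")
    if not name or (sep and not store):
        raise ValueError(f"malformed type spec {spec!r}")
    # A missing store means "use the default store" (or none, for database types).
    return name, (store or None)
```

For example, `parse_type_spec("<xblob@cold>")` yields `("xblob", "cold")`, while `parse_type_spec("<djblob>")` yields `("djblob", None)`.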
246287
247- This means ` <xblob> ` and ` <xblob@store> ` share the same AttributeType class - the
248- parameter flows through to the core type, which validates whether it supports ` @store ` .
288+ AttributeTypes can use other AttributeTypes as their dtype (composition):
289+ - ` <xblob> ` uses ` content ` - adds djblob serialization on top of content-addressed storage
290+ - ` <xattach> ` uses ` content ` - adds filename preservation on top of content-addressed storage
249291
250- ## AttributeTypes (Built on Core Types)
292+ ## User-Defined AttributeTypes
251293
252294### ` <djblob> ` - Internal Serialized Blob
253295
@@ -364,31 +406,35 @@ class Attachments(dj.Manual):
364406```
365407┌───────────────────────────────────────────────────────────────────┐
366408│ AttributeTypes │
367- │ <djblob> <xblob> <attach> <xattach> <custom> │
409+ │ │
410+ │ Built-in: object content filepath@s │
411+ │ User: <djblob> <xblob> <attach> <xattach> <custom> │
368412├───────────────────────────────────────────────────────────────────┤
369- │ Core DataJoint Types │
370- │ longblob content object filepath@s json │
371- │ content@s object@s │
372- ├───────────────────────────────────────────────────────────────────┤
373- │ Database Types │
374- │ LONGBLOB CHAR(64) JSON JSON/JSONB VARCHAR etc. │
375- │ (MySQL) (PostgreSQL) │
413+ │ Database Types (dtype) │
414+ │ │
415+ │ LONGBLOB JSON/JSONB VARCHAR INT etc. │
376416└───────────────────────────────────────────────────────────────────┘
377417```
378418
419+ All storage types are AttributeTypes:
420+ - ** Built-in** : ` object ` , ` content ` , ` filepath@store ` - auto-registered, use ` dj.config `
421+ - ** User-defined** : ` <djblob> ` , ` <xblob> ` , ` <attach> ` , ` <xattach> ` , ` <custom> ` - registered via ` @dj.register_type `
422+
379423## Storage Comparison
380424
381- | Type | Core Type | Storage Location | Dedup | Returns |
382- | ------| -----------| ------------------| -------| ---------|
425+ | Type | dtype | Storage Location | Dedup | Returns |
426+ | ------| -------| ------------------| -------| ---------|
427+ | ` object ` | ` json ` | ` {schema}/{table}/{pk}/ ` | No | ObjectRef |
428+ | ` object@s ` | ` json ` | ` {schema}/{table}/{pk}/ ` | No | ObjectRef |
429+ | ` content ` | ` json ` | ` _content/{hash} ` | Yes | bytes |
430+ | ` content@s ` | ` json ` | ` _content/{hash} ` | Yes | bytes |
431+ | ` filepath@s ` | ` json ` | Configured store (relative path) | No | ObjectRef |
383432| ` <djblob> ` | ` longblob ` | Database | No | Python object |
384433| ` <xblob> ` | ` content ` | ` _content/{hash} ` | Yes | Python object |
385434| ` <xblob@s> ` | ` content@s ` | ` _content/{hash} ` | Yes | Python object |
386435| ` <attach> ` | ` longblob ` | Database | No | Local file path |
387436| ` <xattach> ` | ` content ` | ` _content/{hash} ` | Yes | Local file path |
388437| ` <xattach@s> ` | ` content@s ` | ` _content/{hash} ` | Yes | Local file path |
389- | ` object ` | — | ` {schema}/{table}/{pk}/ ` | No | ObjectRef |
390- | ` object@s ` | — | ` {schema}/{table}/{pk}/ ` | No | ObjectRef |
391- | ` filepath@s ` | ` json ` | Configured store (relative path) | No | ObjectRef |
392438
393439## Reference Counting for Content Type
394440
@@ -435,10 +481,11 @@ def garbage_collect(project):
435481 (ContentRegistry() & {' content_hash' : content_hash}).delete()
436482```
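The core of the collection step is a set difference between registered hashes and hashes still referenced by any table. The sketch below isolates that logic. `find_orphans` is a hypothetical helper, and it assumes the referenced hashes have already been gathered per table, which the excerpt above does not show.

```python
def find_orphans(registered: set, referenced_by_table: dict) -> set:
    """Return hashes in the registry that no table column references."""
    referenced = set()
    for hashes in referenced_by_table.values():
        referenced.update(hashes)
    # Anything registered but no longer referenced is safe to delete.
    return registered - referenced
```

Because content is shared across all schemas in a project, the referenced set has to cover every table in the project before anything is deleted.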
437483
438- ## Core Type Comparison
484+ ## Built-in AttributeType Comparison
439485
440486| Feature | ` object ` | ` content ` | ` filepath@store ` |
441487| ---------| ----------| -----------| ------------------|
488+ | dtype | ` json ` | ` json ` | ` json ` |
442489| Location | OAS store | OAS store | Configured store |
443490| Addressing | Primary key | Content hash | Relative path |
444491| Path control | DataJoint | DataJoint | User |
@@ -456,20 +503,21 @@ def garbage_collect(project):
456503
457504## Key Design Decisions
458505
459- 1 . ** Layered architecture** : Core types (` object ` , ` content ` , ` filepath@store ` , ` json ` ) separate from AttributeTypes
460- 2 . ** Two OAS regions** : object (PK-addressed) and content (hash-addressed) within managed stores
461- 3 . ** Filepath for portability** : ` filepath@store ` uses relative paths within stores for environment portability
462- 4 . ** No ` uri ` type** : For arbitrary URLs, use ` varchar ` —simpler and more transparent
463- 5 . ** Content type** : Single-blob, content-addressed, deduplicated storage
464- 6 . ** JSON core type** : Cross-database compatible (MySQL JSON, PostgreSQL JSONB)
465- 7 . ** Parameterized types** : ` <type@param> ` passes parameter to underlying dtype
466- 8 . ** Naming convention** :
506+ 1 . ** Two-layer architecture** : Database types (` json ` , ` longblob ` , etc.) and AttributeTypes
507+ 2 . ** All storage types are AttributeTypes** : Built-in (` object ` , ` content ` , ` filepath@store ` ) and user-defined (` <djblob> ` , etc.)
508+ 3 . ** Built-in types use JSON dtype** : Stores metadata (path, hash, store name, etc.) in JSON columns
509+ 4 . ** Two OAS regions** : object (PK-addressed) and content (hash-addressed) within managed stores
510+ 5 . ** Filepath for portability** : ` filepath@store ` uses relative paths within stores for environment portability
511+ 6 . ** No ` uri ` type** : For arbitrary URLs, use ` varchar ` —simpler and more transparent
512+ 7 . ** Content type** : Single-blob, content-addressed, deduplicated storage
513+ 8. **Parameterized types**: `<type@param>` selects the store that the type uses
514+ 9 . ** Naming convention** :
467515 - ` <djblob> ` = internal serialized (database)
468516 - ` <xblob> ` = external serialized (content-addressed)
469517 - ` <attach> ` = internal file (single file)
470518 - ` <xattach> ` = external file (single file)
471- 9 . ** Transparent access** : AttributeTypes return Python objects or file paths
472- 10 . ** Lazy access** : ` object ` , ` object@store ` , and ` filepath@store ` return ObjectRef
519+ 10 . ** Transparent access** : AttributeTypes return Python objects or file paths
520+ 11 . ** Lazy access** : ` object ` , ` object@store ` , and ` filepath@store ` return ObjectRef
473521
474522## Migration from Legacy Types
475523