Commit b2dba2c

Add a "compact" JSON variant
1 parent 27f9892 commit b2dba2c

1 file changed

+67
-31
lines changed

docs/design/datacontracts/data_descriptor.md

Lines changed: 67 additions & 31 deletions
@@ -115,19 +115,23 @@ multiple times, later definitions take precedence.
115115

116116
### Version
117117

118-
This is version 0 of the physical descriptor
118+
This is version 0 of the physical descriptor.
119119

120120
### Summary
121121

122-
A data descriptor may be stored in the "JSON with comments" format.
122+
A data descriptor may be stored in the "JSON with comments" format. There are two formats: a
123+
"regular" format and a "compact" format. The baseline data descriptor may be either regular or
124+
compact. The in-memory descriptor will typically be compact.
123125

124126
The toplevel dictionary will contain:
125127

126128
* `"version": 0`
127-
* `"types": TYPE_ARRAY` see below
128-
* `"globals": VALUE_ARRAY` see below
129+
* `"types": TYPES_DESCRIPTOR` see below
130+
* `"globals": GLOBALS_DESCRIPTOR` see below
129131

130-
### Types
132+
### Types descriptor
133+
134+
**Regular format**:
131135

132136
The types will be in an array, with each type described by a dictionary containing keys:
133137

@@ -141,6 +145,29 @@ Each `FIELD_ARRAY` is an array of dictionaries each containing keys:
141145
* `"type": "type name"` the name of a primitive type or another type defined in the logical descriptor
142146
* optional `"offset": int | "unknown"` the offset of the field or "unknown". If omitted, same as "unknown".
143147

148+
**Compact format**:
149+
150+
The types will be in a dictionary, with each type name being the key and a `FIELD_DICT` dictionary as a value.
151+
152+
The `FIELD_DICT` will have a field name as a key, or the special name `"!"` as a key.
153+
154+
If a key is `!` the value is an `int` giving the total size of the struct. The key must be omitted
155+
if the size is indeterminate.
156+
157+
If the key is any other string, the value may be one of:
158+
159+
* `[int, "type name"]` giving the offset and type of the field
160+
* `int` giving just the offset of the field with the type left unspecified
161+
162+
Unknown offsets are not supported in the compact format.
163+
164+
Rationale: the compact format is expected to be used for the in-memory data descriptor. In the
165+
common case the field type is known from the baseline descriptor. As a result, a field descriptor
166+
like `"field_name": 36` is the minimum necessary information to be conveyed. If the field is not
167+
present in the baseline, then `"field_name": [12, "uint16"]` may be used.
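
As an illustration of the rules above, the following is a minimal Python sketch of how a reader might normalize a compact types descriptor. The function name `parse_compact_types` and the shape of the returned dictionary are assumptions made for this example; they are not part of the data descriptor spec.

```python
def parse_compact_types(types):
    """Normalize a compact types descriptor (hypothetical helper).

    Returns {type_name: {"size": int or None, "fields": {name: {...}}}}.
    """
    result = {}
    for type_name, field_dict in types.items():
        entry = {"size": None, "fields": {}}
        for key, value in field_dict.items():
            if key == "!":
                # The special key "!" gives the total size of the struct.
                entry["size"] = value
            elif isinstance(value, list):
                # [int, "type name"]: offset first, then the type name.
                offset, field_type = value
                entry["fields"][key] = {"offset": offset, "type": field_type}
            else:
                # A bare int is just the offset; the type is left unspecified
                # and is expected to come from the baseline descriptor.
                entry["fields"][key] = {"offset": value, "type": None}
        result[type_name] = entry
    return result
```

A type whose field dictionary has no `"!"` key ends up with `"size": None` in this sketch, which corresponds to an indeterminate size.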
168+
169+
**Both formats**:
170+
144171
Note that the logical descriptor does not contain "unknown" offsets: it is expected that the
145172
in-memory data descriptor will augment the baseline with a known offset for all fields in the
146173
baseline.
@@ -150,23 +177,39 @@ in-memory descriptor is expected to provide the offset of the field.
150177

151178
### Global values
152179

180+
**Regular format**:
181+
153182
The global values will be in an array, with each value described by a dictionary containing keys:
154183

155184
* `"name": "global value name"` the name of the global value
156185
* `"type": "type name"` the type of the global value
157186
* optional `"value": VALUE | [ int ] | "unknown"` the value of the global value, or an offset in an auxiliary array containing the value or "unknown".
158187

159-
Note that the logical descriptor does not contain "unknown" values: it is expected that the
160-
in-memory data descriptor will augment the baseline with a known offset for all fields in the
161-
baseline.
162-
163188
The `VALUE` may be a JSON numeric constant integer or a string containing a signed or unsigned
164189
decimal or hex (with prefix `0x` or `0X`) integer constant. The constant must be within the range
165190
of the type of the global value.
166191

192+
**Compact format**:
193+
194+
The global values will be in a dictionary, with each key being the name of a global and the values being one of:
195+
196+
* `[VALUE | [int], "type name"]` the value and type of a global
197+
* `VALUE | [int]` just the value of a global
198+
199+
As in the regular format, `VALUE` is a numeric constant or a string containing an integer constant.
200+
201+
Note that a two element array is unambiguously "type and value", whereas a one-element array is
202+
unambiguously "indirect value".
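
The disambiguation by array length can be sketched in Python as follows. The helper names `parse_value` and `parse_compact_global`, and the returned dictionary shape, are hypothetical and chosen only for this example.

```python
def parse_value(value):
    """Parse a VALUE: a JSON integer, or a decimal/hex integer in a string."""
    if isinstance(value, int):
        return value
    text = value.strip()
    unsigned = text.lstrip("+-")
    # Hex constants carry a 0x or 0X prefix; everything else is decimal.
    base = 16 if unsigned.lower().startswith("0x") else 10
    return int(text, base)


def parse_compact_global(entry):
    """Decode one compact global entry (hypothetical helper)."""
    type_name = None
    if isinstance(entry, list) and len(entry) == 2:
        # Two elements: [VALUE | [int], "type name"].
        entry, type_name = entry
    if isinstance(entry, list):
        # One element: an index into the auxiliary array (indirect value).
        (index,) = entry
        return {"type": type_name, "indirect": index}
    return {"type": type_name, "value": parse_value(entry)}
```

In this sketch an indirect value is reported as an index only; looking it up in the auxiliary array is left to the caller, since that array is not part of the data descriptor itself.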
203+
204+
**Both formats**:
205+
167206
For pointer and nuint globals, the value may be assumed to fit in a 64-bit unsigned integer. For
168207
nint globals, the value may be assumed to fit in a 64-bit signed integer.
169208

209+
Note that the logical descriptor does not contain "unknown" values: it is expected that the
210+
in-memory data descriptor will augment the baseline with a known value for all globals in the
211+
baseline.
212+
170213
If the value is given as a single-element array `[ int ]` then the value is stored in an auxiliary
171214
array that is part of the data contract descriptor. Only in-memory data descriptors may have
172215
indirect values; baseline data descriptors may not have indirect values.
@@ -179,10 +222,14 @@ The indirection array is not part of the data descriptor spec. It is expected t
179222
contract descriptor will include it. (The data contract descriptor must contain: the data
180223
descriptor, the set of compatible algorithmic contracts, the aux array of globals).
181224

225+
226+
182227
## Example
183228

184229
This is an example of a baseline descriptor for a 64-bit architecture. Suppose it has the name `"example-64"`.
185230

231+
The baseline is given in the "regular" format.
232+
186233
```jsonc
187234
{
188235
"version": 0,
@@ -219,33 +266,22 @@ This is an example of a baseline descriptor for a 64-bit architecture. Suppose i
219266
}
220267
```
221268

222-
The following is an example of an in-memory descriptor that references the above baseline:
269+
The following is an example of an in-memory descriptor that references the above baseline. The in-memory descriptor is in the "compact" format:
223270

224271
```jsonc
225272
{
226273
"version": "0",
227274
"baseline": "example-64",
228-
"types": [
229-
{
230-
"name": "Thread",
231-
"fields": [
232-
{ "name": "ThreadId", "offset": 32 },
233-
{ "name": "ThreadState", "offset": 0 },
234-
{ "name": "Next", "offset": 128 }
235-
]
236-
},
237-
{
238-
"name": "ThreadStore",
239-
"fields": [
240-
{ "name": "ThreadCount", "offset": 32 }
241-
{ "name": "ThreadList", "offset": 8 }
242-
]
243-
}
244-
],
245-
"globals": [
246-
{ "name": "FEATURE_COMINTEROP", "value": "0"},
247-
{ "name": "s_pThreadStore", "value": [ 0 ] } // indirect from aux data offset 0
248-
]
275+
"types":
276+
{
277+
"Thread": { "ThreadId": 32, "ThreadState": 0, "Next": 128 },
278+
"ThreadStore": { "ThreadCount": 32, "ThreadList": 8 }
279+
},
280+
"globals":
281+
{
282+
"FEATURE_COMINTEROP": 0,
283+
"s_pThreadStore": [ 0 ] // indirect from aux data offset 0
284+
}
249285
}
250286
```
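
One way a reader could combine a regular-format baseline with a compact in-memory descriptor is sketched below in Python. The function name `overlay_types` and the merged output shape are assumptions for illustration, and the baseline used in the usage note is a made-up fragment, not the actual `"example-64"` baseline. The sketch relies on the rule that later definitions take precedence.

```python
def overlay_types(baseline_types, compact_types):
    """Overlay compact in-memory types onto regular-format baseline types.

    Hypothetical helper: returns {type_name: {field_name: {"type", "offset"}}}.
    """
    merged = {}
    for type_desc in baseline_types:
        fields = {}
        for field in type_desc.get("fields", []):
            fields[field["name"]] = {
                "type": field.get("type"),
                # An omitted offset is the same as "unknown".
                "offset": field.get("offset", "unknown"),
            }
        merged[type_desc["name"]] = fields
    for type_name, field_dict in compact_types.items():
        fields = merged.setdefault(type_name, {})
        for name, value in field_dict.items():
            if name == "!":
                continue  # total struct size; not tracked in this sketch
            field = fields.setdefault(name, {"type": None, "offset": "unknown"})
            if isinstance(value, list):
                field["offset"], field["type"] = value
            else:
                field["offset"] = value  # type stays as given by the baseline
    return merged
```

For example, with a hypothetical baseline entry `{"name": "Thread", "fields": [{"name": "ThreadId", "type": "uint32"}]}` and the compact entry `"Thread": { "ThreadId": 32 }`, the merged field keeps the baseline type `"uint32"` and gains the offset `32`.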
251287
