Authors: Rashmi Rao, John Cox, Salman Malik, Google Privacy Sandbox
Protected App Signals provides adtechs a way to egress data from inside the privacy boundary to their own servers for model training. See the Protected App Signals Explainer, particularly the Reporting section, for background. Refer to the design for Protected App Signals with B&A here.
This document describes the following:
- The set of feature types available to egress information to adtech servers for model training for Protected App Signals.
- The wire format of each feature.
- The wire format of the egressPayload itself.
The wire representation of the egressPayload will be noised per feature type. More details will be provided in a future explainer update.
Based on the information in this document, adtechs can write parsers to prepare the egressPayload and transform the values it contains into features for use in their model training systems.
We support two kinds of feature types: primitives, which can contain a single feature value; and collections, which can contain multiple primitives.
Nullability
(Nullability will be a fast-follow feature; it will not be available immediately upon the release of egress vector. Please see this addendum describing a schema modification so your de-serializer can be prepared for nullability support.)
Every feature type can optionally be made nullable, in which case the value null can be represented. By default, features are not nullable. A null value for a type is indicated by adding a signaling bit, 0 (for null) or 1 (for non-null), as the least-significant bit of the serialized representation. Consequently, the wire representation for a feature with nullable = true is always 1 bit larger than with nullable = false.
The contents of collections can be nullable and are handled as above. Collections themselves can also be nullable and are handled the same way: a null collection's wire representation is 0s for all value bits, with a 0 signaling bit.
Because values are not nullable by default, values mentioned in the examples below are not nullable unless stated.
Primitive types represent single values: booleans, unsigned integers, and signed integers.
The boolean-feature-type represents a single boolean value.
Expected value: true or false, or optionally null
Parameters:
- nullable: boolean indicating whether the boolean-feature-type can represent a null state or not. Defaults to false.
Wire format
Non-nullable: 0 (false) or 1 (true)
Nullable: 01 (false) or 11 (true), or 00 (null)
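The boolean wire format above can be sketched in a few lines. This is an illustrative sketch only: the function name `encode_boolean` and the choice to model wire bits as a string (most-significant bit on the left) are assumptions made for readability, not part of the platform API.

```python
from typing import Optional


def encode_boolean(value: Optional[bool], nullable: bool = False) -> str:
    """Return the wire bits for one boolean, most-significant bit on the left."""
    if value is None:
        if not nullable:
            raise ValueError("null requires nullable = true")
        return "00"  # value bit 0, signaling bit 0 (null)
    value_bit = "1" if value else "0"
    # A nullable feature carries one extra signaling bit (1 = non-null)
    # in the least-significant (rightmost) position.
    return (value_bit + "1") if nullable else value_bit


print(encode_boolean(True))                  # 1
print(encode_boolean(False, nullable=True))  # 01
print(encode_boolean(None, nullable=True))   # 00
```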
The unsigned-integer-feature-type represents a single non-negative integer value.
Parameters:
- size: unsigned integer indicating the number of bits based on the range of values.
  - Expected value: non-negative integer value in the range [0, 2^size - 1]
  - For example, consider an unsigned-integer-feature-type of size = 3. It can have a value in the range [0, 7] and will occupy 3 bits on the wire (or 4 bits on the wire if nullable, see examples below).
- nullable: boolean indicating whether the unsigned-integer-feature-type can represent a null state or not. Defaults to false.
Wire format
Binary representation of the unsigned integer. The integer will be converted to wire format following little endian byte order. If nullable, the signaling bit is added as the least-significant bit (the rightmost bit in the examples below).
Examples:
Non-nullable:
If value = 5 and size = 3, wire format will be 101.
Nullable:
If value = 5 and size = 3, wire format will be 1011.
If value = 0 and size = 3, wire format will be 0001.
If value = null and size = 3, wire format will be 0000.
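The unsigned examples above can be reproduced with a short sketch. The helper name `encode_unsigned` is hypothetical, and the wire bits are modeled as a string with the most-significant bit on the left, which is an assumption of this sketch rather than a statement about how the platform stores the bits.

```python
from typing import Optional


def encode_unsigned(value: Optional[int], size: int, nullable: bool = False) -> str:
    """Return the wire bits for an unsigned integer occupying `size` bits."""
    if value is None:
        if not nullable:
            raise ValueError("null requires nullable = true")
        return "0" * (size + 1)  # all value bits 0, signaling bit 0
    if not 0 <= value <= (1 << size) - 1:
        raise ValueError(f"{value} does not fit in {size} unsigned bits")
    bits = format(value, f"0{size}b")  # fixed-width binary, MSB first
    # Nullable features append the signaling bit (1 = non-null) on the right.
    return (bits + "1") if nullable else bits


print(encode_unsigned(5, 3))                    # 101
print(encode_unsigned(5, 3, nullable=True))     # 1011
print(encode_unsigned(0, 3, nullable=True))     # 0001
print(encode_unsigned(None, 3, nullable=True))  # 0000
```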
The signed-integer-feature-type can be used to represent a single positive or negative integer value.
Parameters:
- size: unsigned integer indicating the number of bits based on the range of values.
  - Expected value: integer value in the range [-2^(size-1), 2^(size-1) - 1]
  - For example, consider a signed-integer-feature-type of size = 4. It can have a value in the range [-8, 7], and will occupy 4 bits on the wire (or 5 bits on the wire if nullable, see examples below).
- nullable: boolean indicating whether the signed-integer-feature-type can represent a null state or not. Defaults to false.
Wire format
2’s complement representation of the signed integer. The integer will be converted to wire format following little endian byte order.
Examples:
Non-nullable:
If value = -3 and size = 4, wire format will be 1101.
Nullable:
If value = -3 and size = 4, wire format will be 11011.
If value = 0 and size = 4, wire format will be 00001.
If value = null and size = 4, wire format will be 00000.
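A minimal sketch of the two's complement encoding, together with the matching decode step a parser would need. The names `encode_signed` and `decode_signed` are hypothetical, and bits are modeled as strings with the most-significant bit on the left.

```python
from typing import Optional


def encode_signed(value: Optional[int], size: int, nullable: bool = False) -> str:
    """Return the two's complement wire bits for a signed integer."""
    if value is None:
        if not nullable:
            raise ValueError("null requires nullable = true")
        return "0" * (size + 1)  # all value bits 0, signaling bit 0
    if not -(1 << (size - 1)) <= value <= (1 << (size - 1)) - 1:
        raise ValueError(f"{value} does not fit in {size} signed bits")
    # Masking with 2^size - 1 yields the two's complement bit pattern.
    bits = format(value & ((1 << size) - 1), f"0{size}b")
    return (bits + "1") if nullable else bits


def decode_signed(bits: str, size: int, nullable: bool = False) -> Optional[int]:
    """Inverse of encode_signed for a single feature's bits."""
    if nullable:
        bits, signal = bits[:-1], bits[-1]
        if signal == "0":
            return None
    raw = int(bits, 2)
    # Patterns with the top bit set represent negative values.
    return raw - (1 << size) if raw >= (1 << (size - 1)) else raw


print(encode_signed(-3, 4))                  # 1101
print(encode_signed(-3, 4, nullable=True))   # 11011
print(decode_signed("11011", 4, nullable=True))  # -3
```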
Collection types represent a collection of homogeneous or heterogeneous values.
The wire representation of the values is in right-to-left order: the first value occupies the rightmost (least-significant) bits.
The bucket-feature-type can be used to represent an ordered list of boolean-feature-type values.
Expected value: list of boolean (true or false, or optionally null) values, or optionally null
Parameters
- allow-multiple: indicates whether the bucket can contain multiple true values.
- size: number of values in the bucket.
- nullable: boolean indicating whether the bucket-feature-type as a whole can represent a null state or not. Defaults to false.
Wire format
Sequential bit representation of boolean-feature-type values.
Examples:
Non-nullable collection:
Consider a bucket-feature-type of size = 4. If the values are [true, false, true, false], this will occupy 4 bits on the wire. Wire format will be 0101.
Again for size = 4, now consider the values all being nullable = true. If the values are [true, false, true, null], this will occupy 8 bits on the wire. Wire format will be 00 11 01 11 (spaces for readability).
Nullable collection:
Consider a bucket-feature-type of size = 4. If the values are [true, false, true, false], this will occupy 5 bits on the wire. Wire format will be 01011.
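The three bucket examples above can be reproduced with the sketch below. The function `encode_bucket` and its `element_nullable` parameter are hypothetical names; validation of allow-multiple is left out, and bits are modeled as a string with the most-significant bit on the left.

```python
from typing import List, Optional


def encode_bucket(
    values: Optional[List[Optional[bool]]],
    size: int,
    element_nullable: bool = False,
    nullable: bool = False,
) -> str:
    """Return the wire bits for a bucket of `size` boolean values."""
    element_bits = 2 if element_nullable else 1
    if values is None:
        if not nullable:
            raise ValueError("null bucket requires nullable = true")
        return "0" * (size * element_bits + 1)  # all 0s plus signaling bit 0
    if len(values) != size:
        raise ValueError("bucket must contain exactly `size` values")
    encoded = []
    for v in values:
        if v is None:
            if not element_nullable:
                raise ValueError("null element requires nullable elements")
            encoded.append("00")
        else:
            bit = "1" if v else "0"
            encoded.append((bit + "1") if element_nullable else bit)
    # Right-to-left order: the first value ends up in the rightmost bits.
    bits = "".join(reversed(encoded))
    return (bits + "1") if nullable else bits


print(encode_bucket([True, False, True, False], 4))                        # 0101
print(encode_bucket([True, False, True, None], 4, element_nullable=True))  # 00110111
print(encode_bucket([True, False, True, False], 4, nullable=True))         # 01011
```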
The histogram-feature-type can be used to represent an ordered, heterogeneous list of unsigned-integer-feature-type and signed-integer-feature-type values.
Parameters
- size: unsigned integer indicating the number of fixed-size values in the histogram.
Expected values: list of unsigned-integer-feature-type and signed-integer-feature-type values
Wire format
In general the wire format for the histogram is the wire format of each contained value, right-to-left.
Examples:
Non-nullable histogram:
If the histogram contains 2 elements, the first of which is an unsigned 3-bit integer with the value 5, and the second is a signed 4-bit integer with the value -3, then the wire format would be 1101 101 (spaces for readability).
Consider the same example, except the second value is now nullable. Now the wire format requires eight bits instead of seven, and becomes 11011 101.
Consider the same example again, except now the second value is nullable and indeed null. The wire format still requires eight bits and is 00000 101.
Nullable histogram:
Consider the original example, except the histogram is now itself nullable. Now the representation requires eight bits, becoming 1101 101 1.
Now consider if both the histogram itself is nullable, and the second value is nullable and indeed null. The wire format now requires nine bits and is 00000 101 1.
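The histogram examples can be checked with a short sketch. The `Element` tuple layout and the function names are hypothetical conventions chosen for this illustration; bits are modeled as a string with the most-significant bit on the left.

```python
from typing import List, Optional, Tuple

# Hypothetical element description: (is_signed, size_in_bits, nullable, value)
Element = Tuple[bool, int, bool, Optional[int]]


def encode_int(signed: bool, size: int, nullable: bool, value: Optional[int]) -> str:
    """Encode one signed or unsigned integer element."""
    if value is None:
        if not nullable:
            raise ValueError("null requires nullable = true")
        return "0" * (size + 1)
    lo = -(1 << (size - 1)) if signed else 0
    hi = (1 << (size - 1)) - 1 if signed else (1 << size) - 1
    if not lo <= value <= hi:
        raise ValueError(f"{value} out of range for {size} bits")
    bits = format(value & ((1 << size) - 1), f"0{size}b")
    return (bits + "1") if nullable else bits


def encode_histogram(elements: List[Element], nullable: bool = False,
                     is_null: bool = False) -> str:
    """Encode a histogram; elements are laid out right-to-left."""
    if is_null:
        if not nullable:
            raise ValueError("null histogram requires nullable = true")
        width = sum(size + (1 if n else 0) for _, size, n, _ in elements)
        return "0" * (width + 1)
    bits = "".join(reversed([encode_int(*e) for e in elements]))
    return (bits + "1") if nullable else bits


print(encode_histogram([(False, 3, False, 5), (True, 4, False, -3)]))  # 1101101
print(encode_histogram([(False, 3, False, 5), (True, 4, True, None)],
                       nullable=True))                                 # 000001011
```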
The definition of the wire format of a payload is called its protocol. Below we describe the first wire format, or protocol version 1. The protocol version included in the payload will be set by the platform.
A payload is made up of two parts: a header containing metadata information used for serialization and deserialization; and a body containing serialized feature values.
The header itself has two parts:
- Protocol version: unsigned 5-bit int indicating the version of the wire format specification used to encode the payload.
- Schema version: unsigned 3-bit int. Version identifier for the schema that defines the payload.
The wire format of the header is the protocol version, then the schema version, right-to-left. For example, if the protocol version is 1 (00001 on the wire) and the schema version is 2 (010 on the wire), the header will be 01000001.
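As a rough sketch of the header layout, the helpers below build and split the 8-bit header. The names `encode_header` and `decode_header` are hypothetical, and the header is modeled as a bit string with the most-significant bit on the left.

```python
from typing import Tuple


def encode_header(protocol_version: int, schema_version: int) -> str:
    """Build the 8-bit header: 3-bit schema version, then 5-bit protocol version."""
    if not 0 <= protocol_version < 32:
        raise ValueError("protocol version must fit in 5 bits")
    if not 0 <= schema_version < 8:
        raise ValueError("schema version must fit in 3 bits")
    # Right-to-left order: the protocol version occupies the 5 rightmost bits.
    return format(schema_version, "03b") + format(protocol_version, "05b")


def decode_header(header: str) -> Tuple[int, int]:
    """Return (protocol_version, schema_version) from an 8-bit header string."""
    if len(header) != 8:
        raise ValueError("header must be exactly 8 bits")
    return int(header[-5:], 2), int(header[:-5], 2)


print(encode_header(1, 2))        # 01000001
print(decode_header("01000001"))  # (1, 2)
```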
The body contains serialized feature values, with values as defined in each feature type above. The order of the features is the same as their order in the provided schema for the payload, right-to-left.
The body is 0-padded. Details of the padding are slightly different for egressPayload and temporaryUnlimitedEgressPayload:
- egressPayload is first 0-padded to its maximum size in bits, then 0-padded to the nearest byte.
- temporaryUnlimitedEgressPayload is 0-padded to the nearest byte.
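The two padding rules differ only in the optional first step. The sketch below computes the padded body size in bits; it assumes "nearest byte" means rounding up to the next byte boundary (padding can only add bits), and the function name `padded_size_bits` and the 15-bit and 20-bit figures are illustrative. It does not say where the padding bits are placed.

```python
from typing import Optional


def padded_size_bits(body_bits: int, max_bits: Optional[int] = None) -> int:
    """Return the body size in bits after 0-padding.

    Pass max_bits for egressPayload; omit it for temporaryUnlimitedEgressPayload.
    """
    if max_bits is not None:
        if body_bits > max_bits:
            raise ValueError("body exceeds the maximum wire size")
        body_bits = max_bits  # first pad up to the maximum size in bits
    # Then round up to a whole number of bytes.
    return (body_bits + 7) // 8 * 8


print(padded_size_bits(15, max_bits=20))  # 24
print(padded_size_bits(15))               # 16
```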
- Protocol version: 1
- Example schema version: 2
- Example max wire size for egressPayload: 20 bits
Consider this example schema, feature values, and the corresponding wire format for each feature type specified in the schema:
| Feature type (defined in the schema) | Feature type in collection (defined in the schema) | Parameters (defined in the schema) | Corresponding value in JSON | Wire representation |
| --- | --- | --- | --- | --- |
| histogram-feature-type with size = 2 | unsigned-int-feature-type | size = 3 | 5 | 101 |
| | signed-int-feature-type | size = 4 | -3 | 1101 |
| boolean-feature-type | | | false | 0 |
| bucket-feature-type | boolean-feature-type | size = 4, allow-multiple = true, nullable = true | [true, false, true, false] | 01011 |
| boolean-feature-type | | nullable = true | null | 00 |
Wire format of the feature values would be: [diagram not reproduced]
Wire format of the feature values + padding would be: [diagram not reproduced]
Wire format of the feature values + padding + header would be: [diagram not reproduced]
Wire format of the feature values + padding would be: [diagram not reproduced]
Wire format of the feature values + padding + header would be: [diagram not reproduced]
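The feature-value bits for this example follow from the per-feature wire representations in the table and the right-to-left ordering rule. A minimal sketch, with the wire strings copied from the table and illustrative variable names; padding and header are not included because their placement is not restated here.

```python
# Per-feature wire strings in schema order, copied from the example table.
feature_wire = [
    "1101" + "101",  # histogram: second element (-3), then first element (5)
    "0",             # boolean-feature-type: false
    "01011",         # nullable bucket: [true, false, true, false]
    "00",            # nullable boolean: null
]

# Right-to-left order: the first feature occupies the rightmost bits,
# so the written form lists the features in reverse schema order.
body = "".join(reversed(feature_wire))

print(body)       # 000101101101101
print(len(body))  # 15
```

With the example maximum wire size of 20 bits, these 15 bits leave 5 bits of 0-padding before the body is padded to a byte boundary.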
While nullability will not be supported immediately upon the release of egress vector support, AdTechs can write their schemas in such a way that the serialized representation of the egress vector is identical before and after nullability is supported.
Recall that a null value for a type is indicated by adding a signaling bit to the least-significant bit of the serialized representation. AdTechs can simply add a boolean-feature-type before each value which they intend to make nullable, and set it to indicate whether the following value is null: 0 (for null) or 1 (for non-null).
Because this approach is identical to the way nullability support will be implemented, it allows AdTechs to write de-serialization code now. Upon the release of nullability support, AdTechs will remove the boolean nullability flags and mark the values to which they correspond as nullable in the schema, and update their code for building the egress vector accordingly.
Consider the example schema from above, modified to show nullability flags in bold before nullability support:
| Feature type (defined in the schema) | Feature type in collection (defined in the schema) | Parameters (defined in the schema) | Corresponding value in JSON | Wire representation |
| --- | --- | --- | --- | --- |
| histogram-feature-type with size = 2 | unsigned-int-feature-type | size = 3 | 5 | 101 |
| | signed-int-feature-type | size = 4 | -3 | 1101 |
| boolean-feature-type | | | false | 0 |
| **boolean-feature-type** | | | **true** | **1** |
| bucket-feature-type | boolean-feature-type | size = 4, allow-multiple = true | [true, false, true, false] | 0101 |
| **boolean-feature-type** | | | **false** | **0** |
| boolean-feature-type | | | false | 0 |
The wire format displayed in the diagram above still applies, which is why this workaround allows de-serialization code to be written before nullability support is released.
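A de-serializer for this example schema might look like the sketch below. It reads the feature-value bits from the least-significant end and treats each nullability flag as the signaling bit for the value that follows, so the same code would apply once those values are marked nullable in the schema. The `BitReader` class and `decode_example` function are hypothetical, and the input is assumed to be the body bits only, with padding and header already removed.

```python
from typing import List, Optional, Tuple


class BitReader:
    """Consumes a bit string from its least-significant (rightmost) end."""

    def __init__(self, bits: str) -> None:
        self._bits = bits
        self._end = len(bits)

    def read(self, n: int) -> str:
        if n > self._end:
            raise ValueError("not enough bits left")
        chunk = self._bits[self._end - n:self._end]
        self._end -= n
        return chunk


def decode_example(body: str) -> Tuple[int, int, bool, Optional[List[bool]], Optional[bool]]:
    r = BitReader(body)
    first = int(r.read(3), 2)                 # unsigned, size = 3
    raw = int(r.read(4), 2)                   # signed, size = 4
    second = raw - 16 if raw >= 8 else raw    # undo two's complement
    plain = r.read(1) == "1"                  # non-nullable boolean
    bucket_present = r.read(1) == "1"         # flag (or signaling bit) for the bucket
    bucket_bits = r.read(4)
    # The first bucket value is the rightmost bit, so reverse the bits.
    bucket = [b == "1" for b in reversed(bucket_bits)] if bucket_present else None
    last_present = r.read(1) == "1"           # flag (or signaling bit) for the last boolean
    last_bit = r.read(1)
    last = (last_bit == "1") if last_present else None
    return first, second, plain, bucket, last


print(decode_example("000101101101101"))
# (5, -3, False, [True, False, True, False], None)
```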