Writing Bops: The Bebop Schema Language
Bebop schemas are written in the custom Bebop Schema Language, which this page documents.
A Bebop schema consists of a series of definitions, each introduced by a keyword. Here is an example schema, demonstrating the kinds of definition Bebop accepts:
const int32 PianoKeys = 88;
const guid ImportantProductID = "a3628ec7-28d4-4546-ad4a-f6ebf5375c96";

enum Instrument {
    Sax = 0;
    Trumpet = 1;
    Clarinet = 2;
}

struct Performer {
    string name;
    Instrument plays;
}

message Song {
    1 -> string title;
    2 -> uint16 year;
    3 -> Performer[] performers;
}

union Album {
    1 -> struct StudioAlbum {
        Song[] tracks;
    }
    2 -> message LiveAlbum {
        1 -> Song[] tracks;
        2 -> string venueName;
        3 -> date concertDate;
    }
}
Let's go over each of these:
A const definition defines a single typed value. You can't refer to these values in the rest of the Bebop schema, but they come in handy when the parties using your schemas also need to agree on certain application-wide parameters.
Currently, valid const types are: boolean, integers, floats (including inf, -inf, nan), strings, and GUIDs.
An enum defines a type that acts as a wrapper around an integer type (defaults to uint32), with certain named constants, each having a corresponding underlying integer value. It is used much like an enum in C.
The syntax is:
enum Flavor: uint8 { Vanilla = 1; Chocolate = 2; Mint = 3; }
Unlike in C, all constants must be explicitly given an integer literal value.
You should never remove a constant from an enum definition. Instead, put [deprecated("reason here")] in front of the name.
You're free to add new constants to an enum at any point in the future.
By default, a Bebop enum type is not supposed to represent any underlying values outside of the ones listed.
If you want a more C-like, bitflags-like behavior, add a [flags] attribute before the enum:
[flags]
enum Permissions {
    Read = 0x01;
    Write = 0x02;
    Comment = 0x04;
}
Defined this way, Permissions values like 0 (no permissions) and 3 (Read + Write) are valid too.
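To illustrate the bit arithmetic behind those values, here is a minimal Python sketch that mirrors the Permissions enum using the standard library's enum.IntFlag. It is not code generated by Bebop, and the helper name has is made up for this example.

```python
from enum import IntFlag


class Permissions(IntFlag):
    # Same names and values as the [flags] enum in the schema above.
    Read = 0x01
    Write = 0x02
    Comment = 0x04


def has(value: Permissions, flag: Permissions) -> bool:
    """Return True when every bit of `flag` is set in `value` (hypothetical helper)."""
    return (value & flag) == flag


read_write = Permissions.Read | Permissions.Write  # underlying value 3
no_permissions = Permissions(0)                    # underlying value 0
```

Because each constant occupies its own bit, any combination of them maps to a distinct integer, which is why values outside the listed constants are meaningful for a flags enum.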
A struct defines an aggregation of "fields", containing typed values in a fixed order. All values are always present. It is used much like a struct in C.
The syntax is:
struct Point { int32 x; int32 y; }
The binary representation of a struct is simply that of all field values in order. This means it's more compact and efficient than a message.
When you define a struct, you're promising to never add or remove fields from it. (If this turns out to be necessary, you'll have to define a struct MyStructV2 and deprecate the old struct MyStruct.)
When you define a struct with the readonly modifier, the Bebop compiler guarantees that its values cannot be modified or updated after decoding takes place. Use this to ensure data integrity when marshalling between language domains.
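As a rough illustration of "all field values in order", the following Python sketch encodes the Point struct from the syntax example by hand. It assumes little-endian integers (consistent with the 03 00 00 00 length shown in the opcode example on this page); the function names are invented for this sketch and are not what the Bebop compiler generates.

```python
import struct


def encode_point(x: int, y: int) -> bytes:
    # Two int32 fields written back to back in declaration order:
    # no length prefix, no field indices, no padding.
    # "<" = little-endian (an assumption of this sketch), "i" = int32.
    return struct.pack("<ii", x, y)


def decode_point(data: bytes) -> tuple:
    # The layout is fixed, so decoding just reads the same 8 bytes back.
    return struct.unpack("<ii", data[:8])
```

A Point always occupies exactly 8 bytes here, which is also why adding or removing a field would break every existing reader.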
A message defines an indexed aggregation of fields containing typed values, each of which may be absent. It might correspond to something like a class in Java, or a JSON object.
The syntax is:
message Song { 1 -> string title; 2 -> uint16 year; }
Note the indices before each field.
In the binary representation of a message, the message is prefixed with its length, and each field is prefixed with its index.
It's okay to add fields to a message with new indices later; in fact, this is the whole point of message. (When an unrecognized field index is encountered in the process of decoding a message, it is skipped over. This allows for compatibility with versions of your app that use an older version of the schema.)
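The Python sketch below shows one way a length-prefixed, index-tagged layout like this can be read and written for the two-field Song message. Several details are assumptions of the sketch rather than statements from this page: integers are little-endian, each field index is a single byte, a 0 byte marks the end of the body, and an unknown index is handled by jumping to the end of the message using the length prefix. The function names are hypothetical.

```python
import struct


def encode_song(title=None, year=None) -> bytes:
    body = bytearray()
    if title is not None:
        raw = title.encode("utf-8")
        # index 1, then a length-prefixed UTF-8 string
        body += bytes([1]) + struct.pack("<I", len(raw)) + raw
    if year is not None:
        # index 2, then a uint16
        body += bytes([2]) + struct.pack("<H", year)
    body.append(0)  # assumed end-of-message marker
    # the whole message is prefixed with the length of its body
    return struct.pack("<I", len(body)) + bytes(body)


def decode_song(data: bytes) -> dict:
    (length,) = struct.unpack_from("<I", data, 0)
    pos, end = 4, 4 + length
    song = {}
    while pos < end:
        index = data[pos]
        pos += 1
        if index == 0:
            break
        if index == 1:
            (n,) = struct.unpack_from("<I", data, pos)
            pos += 4
            song["title"] = data[pos:pos + n].decode("utf-8")
            pos += n
        elif index == 2:
            (year,) = struct.unpack_from("<H", data, pos)
            pos += 2
            song["year"] = year
        else:
            # Field added by a newer schema: its size is unknown to this
            # reader, so skip to the end of the message body.
            pos = end
    return song
```

Absent fields simply never appear in the body, so `decode_song(encode_song(title="So What"))` yields a dictionary with no "year" key.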
A union defines a tagged union of one or more inline struct or message definitions. Each is preceded by a "discriminator" or "tag" value. This defines a type whose values may assume any one of the aggregate layouts defined inside. It corresponds to something like C++'s std::variant.
The syntax is:
union U { 1 -> message A { ... }; 2 -> struct B { ... } }
The binary representation of a U value is then: a length prefix, followed by either (a) a 01 byte followed by an encoding of an A message, or (b) a 02 byte followed by an encoding of a B struct.
Just like with messages, new branches may be added to a union later. When an unrecognized discriminator value is encountered, the length prefix is used to skip over the body, and decoding fails in a way your program may catch.
Nested types are not available globally but do reserve the identifier globally. E.g. in the above you cannot define struct Other { A x; } because A is private to U, but you also cannot define struct A { ... } because A is reserved globally.
Note: Unions are not yet supported when targeting the Dart programming language.
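The "skip the body, then fail in a catchable way" behavior for unknown discriminators can be sketched in Python as follows. This is an illustration under assumptions, not the generated decoder: the length prefix is taken to be a little-endian uint32 counting only the bytes after the one-byte discriminator, and UnknownBranchError, decode_union, and the decoders table are names invented for the example.

```python
import struct


class UnknownBranchError(Exception):
    """Raised for a discriminator this reader does not know (hypothetical type)."""

    def __init__(self, tag: int, next_offset: int):
        super().__init__(f"unknown union branch {tag}")
        self.tag = tag
        # Where decoding can resume, computed from the length prefix.
        self.next_offset = next_offset


def decode_union(data: bytes, offset: int, decoders: dict):
    """Return (tag, decoded value, offset of the byte after the union)."""
    (length,) = struct.unpack_from("<I", data, offset)
    tag = data[offset + 4]
    start = offset + 5
    end = start + length  # assumption: length covers the body only
    if tag not in decoders:
        # The body cannot be interpreted, but the length prefix still
        # tells us how far to skip.
        raise UnknownBranchError(tag, end)
    return tag, decoders[tag](data[start:end]), end
```

A caller that catches UnknownBranchError can use next_offset to keep reading whatever follows the union in the same buffer.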
When talking about Bebop, the word "record" is used to mean "either a struct, message, or union".
The word "aggregate" is used to mean "either a struct or a message", as these are direct aggregations of fields (data).
In an aggregate definition, each field is specified by giving the name of the type of the field, followed by the name of the field, followed by ;.
The following types are built-ins:
Name | Description |
---|---|
bool | A Boolean value, true or false. |
byte | An unsigned 8-bit integer. uint8 is an alias. |
uint16 | An unsigned 16-bit integer. |
int16 | A signed 16-bit integer. |
uint32 | An unsigned 32-bit integer. |
int32 | A signed 32-bit integer. |
uint64 | An unsigned 64-bit integer. |
int64 | A signed 64-bit integer. |
float32 | A 32-bit IEEE single-precision floating point number. |
float64 | A 64-bit IEEE double-precision floating point number. |
string | A length-prefixed UTF-8-encoded string. |
guid | A GUID. |
date | A UTC date / timestamp. |
T[] | A length-prefixed array of T values. array[T] is an alias. |
map[T1, T2] | A map, as a length-prefixed array of (T1, T2) association pairs. |
You may also use user-defined types (enums and other records) as field types.
A string is stored as a length-prefixed array of bytes. All length-prefixes are 32-bit unsigned integers, which means the maximum number of bytes in a string, or entries in an array or map, is about 4 billion (2^32).
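As a small illustration of the length-prefix rule, this Python sketch encodes a string and a uint16[] array. It assumes little-endian byte order; the function names are made up for the example.

```python
import struct


def encode_string(text: str) -> bytes:
    raw = text.encode("utf-8")
    # For strings the 32-bit prefix counts bytes, not characters.
    return struct.pack("<I", len(raw)) + raw


def encode_uint16_array(values) -> bytes:
    # For arrays the 32-bit prefix counts entries.
    return struct.pack("<I", len(values)) + b"".join(
        struct.pack("<H", v) for v in values
    )
```

Note that a five-character string containing one two-byte UTF-8 character gets a prefix of 6, since the prefix is a byte count.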
A guid is stored as 16 bytes, in Guid.ToByteArray order.
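Assuming Guid.ToByteArray refers to the .NET method, that order stores the first three groups of the GUID little-endian and the remaining eight bytes as written. Python's uuid module exposes the same layout as bytes_le, so a sketch of the conversion (with hypothetical function names) looks like this:

```python
import uuid


def encode_guid(text: str) -> bytes:
    # bytes_le: first three groups little-endian, last 8 bytes unchanged.
    return uuid.UUID(text).bytes_le


def decode_guid(raw: bytes) -> str:
    return str(uuid.UUID(bytes_le=bytes(raw)))
```

For the ImportantProductID constant from the example schema, the group a3628ec7 is stored as the bytes c7 8e 62 a3, while the trailing ad4a-f6ebf5375c96 part keeps its written order.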
A date is stored as a 64-bit integer amount of “ticks” since 00:00:00 UTC on January 1 of year 1 A.D. in the Gregorian calendar, where a “tick” is 100 nanoseconds.
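The tick arithmetic can be sketched in Python as below. The function names are invented for this example, naive datetime values are treated as UTC, and Python's microsecond resolution means the last decimal digit of a tick count is lost on the way back.

```python
from datetime import datetime, timedelta

# 00:00:00 on January 1 of year 1, the zero point for ticks.
TICK_EPOCH = datetime(1, 1, 1)
TICKS_PER_MICROSECOND = 10  # one tick is 100 ns


def to_ticks(moment: datetime) -> int:
    # Exact integer division of the elapsed time into microseconds.
    microseconds = (moment - TICK_EPOCH) // timedelta(microseconds=1)
    return microseconds * TICKS_PER_MICROSECOND


def from_ticks(ticks: int) -> datetime:
    return TICK_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)
```

For example, `to_ticks(datetime(1970, 1, 1))` gives 621355968000000000, the tick count of the Unix epoch.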
Use [deprecated("We no longer use this")] before a field to mark it as deprecated. When encoding a message, deprecated fields are skipped. A notice will also be copied into the generated code.
Use [opcode(0x12345678)] before a record definition to associate an identifying "opcode" with it. You can also use a 4-byte ASCII string as an opcode: [opcode("Ping")].
Strictly speaking, Bebop is not opinionated about what you do with these opcodes. But you may find it useful to send this kind of thing over the wire:
12 34 56 78 03 00 00 00 18 00 ...
[4-byte opcode] [Bebop-encoded data]
And use the 4-byte opcode to decide which decoder/handler to dispatch the rest of the packet to. For more information see Mirrors.
All the compiler does is check that no opcode is used twice, and add something like class Foo { const int Opcode = 0x12345678; ... } in the generated code for you to use in your dispatching code.
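Since the dispatching itself is left to you, here is one possible shape for it in Python. Everything in this sketch is hypothetical (the HANDLERS table, the handles decorator, and both handlers), and it keys on the raw four bytes exactly as drawn in the packet illustration above; how an integer opcode is laid out on the wire is a decision for your application.

```python
HANDLERS = {}


def handles(opcode: bytes):
    """Register a handler for a 4-byte opcode (hypothetical helper)."""
    if len(opcode) != 4:
        raise ValueError("opcode must be exactly 4 bytes")

    def register(fn):
        # Mirror the compiler's check: no opcode may be used twice.
        if opcode in HANDLERS:
            raise ValueError(f"opcode {opcode!r} already registered")
        HANDLERS[opcode] = fn
        return fn

    return register


@handles(bytes.fromhex("12345678"))
def handle_foo(payload: bytes):
    # A real handler would pass `payload` to the generated decoder for Foo.
    return ("foo", payload)


@handles(b"Ping")
def handle_ping(payload: bytes):
    return ("ping", payload)


def dispatch(packet: bytes):
    opcode, payload = packet[:4], packet[4:]
    handler = HANDLERS.get(opcode)
    if handler is None:
        raise KeyError(f"no handler for opcode {opcode!r}")
    return handler(payload)
```

With this in place, a packet starting with 12 34 56 78 is routed to handle_foo, and one starting with the ASCII bytes of "Ping" is routed to handle_ping.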
As in many C-like languages, // starts a comment until the end of the line, whereas /* and */ delimit a block comment.
If a comment is placed directly before a field specification (/* like so */ int32 x;) or before a definition (/* like so */ struct S { ... }), that comment will be copied over as "documentation" to the corresponding bit of generated code.
A Bebop file may include all the definitions of other files, by listing import statements at the top of the file.
Such a statement consists of import, followed by a quoted relative path.
import "../Schemas/jazz.bop"
import "more_stuff.bop"
import "./instruments.bop"
enum Whatever {
...