
Writing Bops: The Bebop Schema Language

Matthew Conover edited this page Jan 4, 2022 · 15 revisions

Bebop schemas are written in the custom Bebop Schema Language, which this page documents.

Definition syntax

A Bebop schema consists of a series of definitions, each introduced by a keyword. Here is an example schema, demonstrating the kinds of definition Bebop accepts:

const int32 PianoKeys = 88;
const guid ImportantProductID = "a3628ec7-28d4-4546-ad4a-f6ebf5375c96";

enum Instrument {
    Sax = 0;
    Trumpet = 1;
    Clarinet = 2;
}

struct Performer {
    string name;
    Instrument plays;
}

message Song {
    1 -> string title;
    2 -> uint16 year;
    3 -> Performer[] performers;
}

union Album {
    1 -> struct StudioAlbum {
        Song[] tracks;
    }
    2 -> message LiveAlbum {
        1 -> Song[] tracks;
        2 -> string venueName;
        3 -> date concertDate;
    }
}

Let's go over each of these:

Const

A const definition defines a single typed value. You can't refer to these values in the rest of the Bebop schema, but they come in handy when the parties using your schemas also need to agree on certain application-wide parameters.

Currently, valid const types are: booleans, integers, floats (including inf, -inf, nan), strings, and GUIDs.

Enum

An enum defines a type that acts as a wrapper around an integer type (defaults to uint32), with certain named constants, each having a corresponding underlying integer value. It is used much like an enum in C.

The syntax is: enum Flavor: uint8 { Vanilla = 1; Chocolate = 2; Mint = 3; }.

  • Unlike in C, all constants must be explicitly given an integer literal value.

  • You should never remove a constant from an enum definition. Instead, put [deprecated("reason here")] in front of the name.

  • You're free to add new constants to an enum at any point in the future.
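As a rough illustration of how an enum value travels on the wire, here is a minimal Python sketch. It is not Bebop's generated code. It assumes that an enum is written as its underlying integer in little-endian byte order, and it reuses Flavor (declared uint8 in the syntax line above) and Instrument (no declared type, so uint32) from this page. The helper names encode_flavor, encode_instrument and decode_instrument are hypothetical.

```python
import struct
from enum import IntEnum


class Flavor(IntEnum):  # enum Flavor: uint8 { ... }
    Vanilla = 1
    Chocolate = 2
    Mint = 3


class Instrument(IntEnum):  # no explicit type given, so uint32
    Sax = 0
    Trumpet = 1
    Clarinet = 2


def encode_flavor(flavor):
    # uint8 underlying type: a single byte
    return struct.pack("<B", flavor)


def encode_instrument(instrument):
    # uint32 underlying type: four bytes, assumed little-endian
    return struct.pack("<I", instrument)


def decode_instrument(buf, offset=0):
    (raw,) = struct.unpack_from("<I", buf, offset)
    # A non-flags enum only covers its listed constants, so this sketch
    # raises ValueError for anything else (one possible policy).
    return Instrument(raw)
```

Rejecting unlisted values on decode is just one reasonable choice for a non-flags enum; generated code for a particular target language may handle them differently.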

Flags enum

By default, a Bebop enum type is not supposed to represent any underlying values outside of the ones listed.

If you want a more C-like, bitflags-like behavior, add a [flags] attribute before the enum:

[flags]
enum Permissions {
    Read = 0x01;
    Write = 0x02;
    Comment = 0x04;
}

Defined this way, Permissions values like 0 (no permissions) and 3 (Read + Write) are valid too.
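Python's enum.IntFlag behaves much like a [flags] enum, so it makes a convenient sketch of the idea. This is only an analogy for the semantics described above, not how any Bebop code generator represents flags.

```python
from enum import IntFlag


class Permissions(IntFlag):
    Read = 0x01
    Write = 0x02
    Comment = 0x04


# Combinations of constants are valid values, as is 0 (no permissions).
read_write = Permissions.Read | Permissions.Write
no_permissions = Permissions(0)
```

Here int(read_write) is 3, matching the "Read + Write" value mentioned above.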

Struct

A struct defines an aggregation of "fields", containing typed values in a fixed order. All values are always present. It is used much like a struct in C.

The syntax is: struct Point { int32 x; int32 y; }.

  • The binary representation of a struct is simply that of all field values in order.
    This means it's more compact and efficient than a message.

  • When you define a struct, you're promising to never add or remove fields from it.
    (If this turns out to be necessary, you'll have to define a struct MyStructV2 and deprecate the old struct MyStruct.)

  • When you define a struct with the readonly modifier, the Bebop compiler guarantees that its values cannot be modified or updated after decoding takes place. Use this to ensure data integrity when marshalling between language domains.
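To make "all field values in order" concrete, here is a minimal Python sketch of encoding and decoding struct Point { int32 x; int32 y; }. It assumes little-endian integers, and the helper names are hypothetical. The frozen dataclass is only a loose Python analogy for a readonly struct, not what the compiler emits.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)  # loosely mirrors a readonly struct: no mutation after decoding
class Point:
    x: int
    y: int


def encode_point(p):
    # Two int32 fields in declaration order: no length prefix, no field indices.
    return struct.pack("<ii", p.x, p.y)


def decode_point(buf, offset=0):
    x, y = struct.unpack_from("<ii", buf, offset)
    # Every Point is exactly 8 bytes, so the next value starts at offset + 8.
    return Point(x, y), offset + 8
```

Because the layout has no tags or lengths, adding or removing a field would shift every byte after it, which is why a struct's fields must never change.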

Message

A message defines an indexed aggregation of fields containing typed values, each of which may be absent. It might correspond to something like a class in Java, or a JSON object.

The syntax is: message Song { 1 -> string title; 2 -> uint16 year; } — note the indices before each field.

  • In the binary representation of a message, the message is prefixed with its length, and each field is prefixed with its index.

  • It's okay to add fields to a message with new indices later — in fact, this is the whole point of message. (When an unrecognized field index is encountered in the process of decoding a message, it is skipped over. This allows for compatibility with versions of your app that use an older version of the schema.)
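Here is a minimal Python sketch of the message layout for the title and year fields of Song. It is not Bebop's generated code, and it makes three assumptions this page does not spell out. Integers are little-endian. A 0 byte ends the list of fields, and the uint32 length prefix counts every byte after itself, including that terminator. Field values carry no length of their own, so when the decoder meets an unknown index it uses the message length to jump to the end of the message. The names encode_song and decode_song are hypothetical.

```python
import struct


def encode_string(s):
    data = s.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def encode_song(title=None, year=None):
    body = bytearray()
    if title is not None:  # absent fields are simply not written
        body += b"\x01" + encode_string(title)
    if year is not None:
        body += b"\x02" + struct.pack("<H", year)
    body += b"\x00"  # assumed end-of-message marker
    return struct.pack("<I", len(body)) + bytes(body)


def decode_song(buf, offset=0):
    (length,) = struct.unpack_from("<I", buf, offset)
    pos = offset + 4
    end = pos + length  # first byte after this message
    song = {}
    while pos < end:
        index = buf[pos]
        pos += 1
        if index == 0:  # terminator: no more fields
            break
        elif index == 1:  # 1 -> string title
            (n,) = struct.unpack_from("<I", buf, pos)
            song["title"] = buf[pos + 4:pos + 4 + n].decode("utf-8")
            pos += 4 + n
        elif index == 2:  # 2 -> uint16 year
            (song["year"],) = struct.unpack_from("<H", buf, pos)
            pos += 2
        else:
            # Unknown index from a newer schema: skip the rest of the message.
            break
    return song, end
```

Returning end lets a caller carry on reading whatever follows the message, even when the message contained fields this decoder did not understand.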

Union

A union defines a tagged union of one or more inline struct or message definitions. Each is preceded by a "discriminator" or "tag" value. This defines a type whose values may assume any one of the aggregate layouts defined inside. It corresponds to something like C++'s std::variant.

The syntax is: union U { 1 -> message A { ... } 2 -> struct B { ... } }.

  • The binary representation of a U value is then: a length prefix, followed by either (a) a 01 byte followed by an encoding of an A message, or (b) a 02 byte followed by an encoding of a B struct.

  • Just like with messages, new branches may be added to a union later. When an unrecognized discriminator value is encountered, the length prefix is used to skip over the body, and decoding fails in a way your program may catch.

  • Nested types are not available globally, but their names are still reserved globally. For example, given the union above, you cannot define struct Other { A x; }, because A is private to U; nor can you define a separate struct A { ... }, because the name A is already taken.

Note: Unions are not yet supported when targeting the Dart programming language.
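The union layout above (length prefix, discriminator byte, branch body) can be sketched in Python as follows. This is a hypothetical helper, not generated code. It assumes that the uint32 length is little-endian and counts only the branch body, not the discriminator byte. The decoders dictionary, mapping discriminators to branch decoders, is an illustrative stand-in for the per-branch code a compiler would produce.

```python
import struct


class UnknownDiscriminator(Exception):
    """Raised for a branch this decoder does not know; .end lets callers skip it."""

    def __init__(self, discriminator, end):
        super().__init__(f"unrecognized union discriminator {discriminator}")
        self.discriminator = discriminator
        self.end = end  # offset just past the skipped branch body


def encode_union(discriminator, body):
    # "<IB": uint32 length of the body, then the 1-byte discriminator, no padding
    return struct.pack("<IB", len(body), discriminator) + body


def decode_union(buf, decoders, offset=0):
    length, discriminator = struct.unpack_from("<IB", buf, offset)
    start = offset + 5
    end = start + length
    decoder = decoders.get(discriminator)
    if decoder is None:
        # The length prefix tells us where the unknown body ends.
        raise UnknownDiscriminator(discriminator, end)
    return discriminator, decoder(buf[start:end]), end
```

For example, with a branch 1 whose body is a single int32, decode_union(encode_union(1, struct.pack("<i", 42)), {1: lambda b: struct.unpack("<i", b)[0]}) returns (1, 42, 9).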

Notes

When talking about Bebop, the word "record" is used to mean "either a struct, message, or union".

The word "aggregate" is used to mean "either a struct or a message" — as these are direct aggregations of fields (data).

In an aggregate definition, each field is specified by giving the name of the field's type, followed by the field's name, followed by ;. In a message, each field is additionally preceded by its index and ->.

Types

The following types are built-ins:

Name          Description
bool          A Boolean value, true or false.
byte          An unsigned 8-bit integer. uint8 is an alias.
uint16        An unsigned 16-bit integer.
int16         A signed 16-bit integer.
uint32        An unsigned 32-bit integer.
int32         A signed 32-bit integer.
uint64        An unsigned 64-bit integer.
int64         A signed 64-bit integer.
float32       A 32-bit IEEE single-precision floating point number.
float64       A 64-bit IEEE double-precision floating point number.
string        A length-prefixed UTF-8-encoded string.
guid          A GUID.
date          A UTC date / timestamp.
T[]           A length-prefixed array of T values. array[T] is an alias.
map[T1, T2]   A map, as a length-prefixed array of (T1, T2) association pairs.

You may also use user-defined types (enums and other records) as field types.

A string is stored as a length-prefixed array of bytes. All length prefixes are 32-bit unsigned integers, so a string can hold at most 2^32 - 1 bytes, and an array or map at most 2^32 - 1 entries (about 4.29 billion).
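A minimal Python sketch of these length-prefixed encodings follows. It assumes little-endian prefixes, and the helper names are hypothetical. Note the difference the paragraph above implies: a string's prefix counts bytes, while an array's or map's prefix counts entries.

```python
import struct


def encode_length(n):
    # Every length prefix is a uint32; struct.error is raised if n >= 2**32.
    return struct.pack("<I", n)


def encode_string(s):
    data = s.encode("utf-8")
    return encode_length(len(data)) + data  # prefix counts UTF-8 bytes, not characters


def encode_array(items, encode_item):
    # prefix counts elements; each element is then encoded in turn
    return encode_length(len(items)) + b"".join(encode_item(x) for x in items)


def encode_map(mapping, encode_key, encode_value):
    # prefix counts (key, value) pairs, written back to back
    out = bytearray(encode_length(len(mapping)))
    for key, value in mapping.items():
        out += encode_key(key) + encode_value(value)
    return bytes(out)
```

For instance, encode_string("hé") yields a prefix of 3, not 2, because "é" takes two bytes in UTF-8.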

A guid is stored as 16 bytes, in the byte order produced by .NET's Guid.ToByteArray.
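Outside .NET, Python's standard uuid module happens to expose the same layout as UUID.bytes_le: the first three groups are little-endian and the last eight bytes are left as they are. A small sketch with hypothetical helper names, using the GUID from the example schema:

```python
import uuid


def encode_guid(text):
    # bytes_le: time_low, time_mid, time_hi_version little-endian,
    # remaining 8 bytes unchanged (the Guid.ToByteArray layout)
    return uuid.UUID(text).bytes_le


def decode_guid(data):
    return str(uuid.UUID(bytes_le=data))
```

Encoding "a3628ec7-28d4-4546-ad4a-f6ebf5375c96" this way starts with the bytes c7 8e 62 a3, the reverse of its first group.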

A date is stored as a 64-bit integer amount of “ticks” since 00:00:00 UTC on January 1 of year 1 A.D. in the Gregorian calendar, where a “tick” is 100 nanoseconds.
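Converting between a Python datetime and this tick count is a matter of dividing by 100 ns. The sketch below is hypothetical and writes the plain tick count exactly as described above. It assumes little-endian byte order, and it assumes that no extra flag bits are packed into the 64-bit value. Python datetimes only have microsecond resolution, so each microsecond is 10 ticks.

```python
import struct
from datetime import datetime, timedelta, timezone

TICKS_PER_MICROSECOND = 10  # one tick is 100 ns
BEBOP_EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)  # 00:00:00 UTC, January 1, year 1


def to_ticks(dt):
    # dt must be timezone-aware; integer division keeps full precision
    return ((dt - BEBOP_EPOCH) // timedelta(microseconds=1)) * TICKS_PER_MICROSECOND


def from_ticks(ticks):
    # sub-microsecond ticks are truncated
    return BEBOP_EPOCH + timedelta(microseconds=ticks // TICKS_PER_MICROSECOND)


def encode_date(dt):
    # assumption: plain little-endian tick count, no flag bits
    return struct.pack("<Q", to_ticks(dt))
```

As a sanity check, the Unix epoch (1970-01-01 UTC) comes out as 621355968000000000 ticks.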

Attributes

The “deprecated” attribute

Use [deprecated("We no longer use this")] before a field. When a message is encoded, deprecated fields are skipped. The deprecation notice is also copied into the generated code.

Opcodes

Use [opcode(0x12345678)] before a record definition to associate an identifying "opcode" with it. You can also use a 4-byte ASCII string as an opcode: [opcode("Ping")].

Strictly speaking, Bebop is not opinionated about what you do with these opcodes. But you may find it useful to send this kind of thing over the wire:

12 34 56 78     03 00 00 00 18 00 ...
[4-byte opcode] [Bebop-encoded data]

And use the 4-byte opcode to decide which decoder/handler to dispatch the rest of the packet to. For more information see Mirrors.

All the compiler does is check that no opcode is used twice, and add something like class Foo { const int Opcode = 0x12345678; ... } in the generated code for you to use in your dispatching code.
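A dispatcher for the framing illustrated above might look like the Python sketch below. Bebop leaves this part to you, so everything here is hypothetical: the handler table, PING_OPCODE and dispatch. It also reads the opcode as a big-endian uint32 so that 0x12345678 matches the 12 34 56 78 bytes in the illustration; your own framing could just as well choose little-endian.

```python
import struct

# Hypothetical opcode; real code would use the Opcode constant the compiler generates.
PING_OPCODE = 0x12345678


def handle_ping(payload):
    # A real handler would pass payload to the generated decoder for that record.
    return ("ping", payload)


HANDLERS = {PING_OPCODE: handle_ping}


def dispatch(packet):
    # First 4 bytes: opcode (big-endian to match the illustration); rest: Bebop data.
    (opcode,) = struct.unpack_from(">I", packet, 0)
    handler = HANDLERS.get(opcode)
    if handler is None:
        raise KeyError(f"no handler for opcode {opcode:#010x}")
    return handler(packet[4:])
```

Since the compiler already guarantees that no opcode is used twice, a plain dictionary keyed by opcode is enough to route each packet.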

Comments

As in many C-like languages, // starts a comment until the end of the line, whereas /* and */ delimit a block comment.

If a comment is placed directly before a field specification (/* like so */ int32 x;) or before a definition (/* like so */ struct S { ... }), that comment will be copied over as "documentation" to the corresponding bit of generated code.

Imports

A Bebop file may include all the definitions of other files by listing import statements at the top of the file.

Such a statement consists of import, followed by a quoted relative path.

import "../Schemas/jazz.bop"
import "more_stuff.bop"
import "./instruments.bop"

enum Whatever {
    ...