-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PEERDB_CLICKHOUSE_BINARY_FORMAT #2407
Conversation
)) | ||
} | ||
case peerdbenv.BinaryFormatHex: | ||
projection.WriteString(fmt.Sprintf("hex(base64Decode(JSONExtract(_peerdb_data, '%s', '%s'))) AS `%s`,", colName, clickHouseType, dstColName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be ideal to send hex to raw table in this case, but:
- that would mean passing down ctx/env somehow to json serialization
- things would get weird with raw tables being invalidated when changing PEERDB_BINARY_FORMAT
- base64 is actually a pretty good binary encoding when raw binary isn't available (relatively dense while being computationally straightforward compared to base85, generally available decoding functions on destination)
dd1a889
to
d175ab7
Compare
{ | ||
Name: "PEERDB_BINARY_FORMAT", | ||
Description: "Binary field encoding; either raw, hex, or base64", | ||
DefaultValue: "base64", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change to raw as a TODO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
by time we change default to raw we can probably remove this option
case peerdbenv.BinaryFormatHex: | ||
encoded = strings.ToUpper(hex.EncodeToString(byteData)) | ||
default: | ||
panic(fmt.Sprintf("unhandled binary format: %d", format)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a panic can cause issues with maintenance mode (relies on flow-worker being active to pause mirrors)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
panic can never happen unless a logic error is introduced in code, & bad code can always panic anyways
In #2181 we consolidated on transmitting bytea columns as base64 strings ClickHouse supports binary data in strings. They need not be valid utf8 This adds PEERDB_BINARY_FORMAT with 3 formats: raw, hex, & base64 Default to base64 to avoid breaking existing setups Also add caching logic for non-immediate dynamic settings. Helps here where we'd hit catalog for every bytea field processed
In #2181 we consolidated on transmitting bytea columns as base64 strings
ClickHouse supports binary data in strings. They need not be valid utf8
This adds PEERDB_CLICKHOUSE_BINARY_FORMAT with 3 formats: raw, hex, & base64
Default to base64 to avoid breaking existing setups
Also add caching logic for non-immediate dynamic settings. Helps here where we'd hit catalog for every bytea field processed
Included
hex
to demonstrate how other formats would fit in, but expect this feature to be removed once we move everyone toraw