-
Notifications
You must be signed in to change notification settings - Fork 468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add UTF-8 support in metric and label names #689
Conversation
Signed-off-by: Federico Torres <[email protected]>
Signed-off-by: Federico Torres <[email protected]>
Sorry, still fighting my backlog. I hope I get to review this tomorrow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for doing this.
A few ideas about how to precisely control whether to escape or not. I hope I got it right, but happy to learn where I'm wrong.
In different news, this needs documentation (which currently is just in the ever growing README.md – feel free to add it there, breaking down the documentation into "proper" documentation files would be a task for another PR).
handler/handler_test.go
Outdated
@@ -18,6 +18,7 @@ import ( | |||
"context" | |||
"errors" | |||
"fmt" | |||
"github.com/prometheus/common/model" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has to go into the next block (line 29).
main.go
Outdated
@@ -17,6 +17,7 @@ import ( | |||
"compress/gzip" | |||
"context" | |||
"fmt" | |||
"github.com/prometheus/common/model" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, has to go to line 39 or so.
main.go
Outdated
@@ -73,6 +74,7 @@ func main() { | |||
persistenceFile = app.Flag("persistence.file", "File to persist metrics. If empty, metrics are only kept in memory.").Default("").String() | |||
persistenceInterval = app.Flag("persistence.interval", "The minimum interval at which to write out the persistence file.").Default("5m").Duration() | |||
pushUnchecked = app.Flag("push.disable-consistency-check", "Do not check consistency of pushed metrics. DANGEROUS.").Default("false").Bool() | |||
pushEscaped = app.Flag("push.enable-escaped-labels", "Allow escaped characters in label names in URLs.").Default("false").Bool() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call this push.enable-utf8-names
to mirror the feature flag in Prometheus, and also to correctly describe what we are doing. (Without the flag, the PGW will reject non-legacy characters in names even in the payload of a push.)
Variable name and doc string has to be adjusted accordingly.
handler/push.go
Outdated
@@ -180,19 +180,21 @@ func splitLabels(labels string) (map[string]string, error) { | |||
for i := 0; i < len(components)-1; i += 2 { | |||
name, value := components[i], components[i+1] | |||
trimmedName := strings.TrimSuffix(name, Base64Suffix) | |||
unescapedName := model.UnescapeName(trimmedName, model.ValueEncodingEscaping) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't even want to try to unescape if the corresponding flag is not set. To not refer to the flag value directly here, maybe set a package-level variable of type model.EscapingScheme
to either model.ValueEncodingEscaping
or model.NoEscaping
and then use that variable here.
handler/push.go
Outdated
if !model.LabelNameRE.MatchString(trimmedName) || | ||
!model.LabelName(unescapedName).IsValid() || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this validity test, wo don't need the one in the previous line anymore, do wo?
ab2490d
to
83c45eb
Compare
Signed-off-by: Federico Torres <[email protected]>
@beorn7 Thanks for your feedback, the requested changes were addressed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code looks good (modulu doc comment on push.EscapingScheme
. But we also need documentation. (As suggested previously, I believe it is OK for now to add a section to the ever growing README.md, keeping in mind that eventually we would like to break it up into "proper" documentation.)
handler/push.go
Outdated
@@ -41,6 +41,8 @@ const ( | |||
Base64Suffix = "@base64" | |||
) | |||
|
|||
var EscapingScheme = model.NoEscaping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could benefit from a doc comment.
Signed-off-by: Federico Torres <[email protected]>
@beorn7 Sorry about the documentation, thanks for pointing it out, now it's added. Also added |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. I just couldn't resist and applied a lot of wordsmithing. Please see my suggestions.
README.md
Outdated
@@ -320,6 +320,28 @@ Examples: | |||
|
|||
/metrics/job/titan/name@base64/zqDPgc6_zrzOt864zrXPjc-C | |||
|
|||
### UTF-8 support |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's be a bit more specific here and call this "UTF-8 support for metric and label names".
(UTF-8 is supported for label values already.)
README.md
Outdated
UTF-8 characters in metric and label names are supported by enabling | ||
the `--push.enable-utf8-names` flag. The syntax is as follows: | ||
|
||
{"some.metric", "foo.bar"="baz"} | ||
|
||
For UTF-8 label names defined in the URL path, [an escaping syntax](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) is used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would not explain the syntax of the text exposition format here. (It also doesn't cover protobuf.)
With some wordsmithing included, I would propose the following:
UTF-8 characters in metric and label names are supported by enabling | |
the `--push.enable-utf8-names` flag. The syntax is as follows: | |
{"some.metric", "foo.bar"="baz"} | |
For UTF-8 label names defined in the URL path, [an escaping syntax](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping) is used. | |
Newer versions of the Prometheus exposition formats (text and protobuf) | |
support the full UTF-8 character set in metric and label names. The | |
Pushgateway only accepts special characters in names if the command line | |
flag `--push.enable-utf8-names` is set. | |
To allow special characters in label names that are part of the URL path, the flag also enables a | |
[specific encoding mechanism](https://github.com/prometheus/proposals/blob/main/proposals/2023-08-21-utf8.md#text-escaping). This is similar to the base64 encoding for label _values_ described above, | |
but works differently in detail for technical and historical reasons. As before, client libraries | |
should usually take care of the encoding. It works as follows: |
README.md
Outdated
* Prefix the string with `U__`. | ||
* All non-valid characters (i.e. characters other than letters, numbers, and underscores) will be encoded as underscores surrounding the Unicode value, like `_1F60A_`. | ||
* All pre-existing underscores will become doubled: `__`. | ||
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's stay consistent with regard to the form. You start with an imperative form and then use future tense.
Here my suggestion with a few things added:
* Prefix the string with `U__`. | |
* All non-valid characters (i.e. characters other than letters, numbers, and underscores) will be encoded as underscores surrounding the Unicode value, like `_1F60A_`. | |
* All pre-existing underscores will become doubled: `__`. | |
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | |
* A label name containing encoded characters starts with `U__`. | |
* All non-standard characters (i.e. characters other than letters, numbers, and underscores) are encoded as underscores surrounding their Unicode value, like `_1F60A_`. | |
* All pre-existing underscores are encoded as a double-underscore: `__`. | |
* If a label name starts with `U__` already, these characters have to be encoded as well, resulting in `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | |
* A label name starting with `U__` in it's encoded form, but containing invalid sequences (e.g. `U__in_xxx_valid`) is left unchanged. |
README.md
Outdated
* All pre-existing underscores will become doubled: `__`. | ||
* If a string should start with "U__" already, it will need to be escaped: `U___55_____`. (That's `U__` + `_55_` (for `U`) + `__` + `__`). | ||
|
||
For example, the label `"foo.bar"="baz"` would be escaped like: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's better use "encoded" rather than "escaped" to only use one word throughout this file.
README.md
Outdated
|
||
/metrics/job/example/U__foo_2e_bar/baz | ||
|
||
This escaping is compatible with the base64 encoding for label values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
escaping → encoding, see above.
This escaping is compatible with the base64 encoding for label values: | ||
|
||
/metrics/job/example/U__foo_2e_bar@base64/YmF6 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add something about the one edge case, e.g.
Note that this method has an unlikely edge case that is not handled properly: A pusher unaware of the encoding mechanism might use a label name that is also a valid encoded version of another label name. E.g. if a pusher intends to use the label name U__foo_2e_bar
, but doesn't encode it as U___55_____foo__2e__bar
, the Pushgateway will decode U__foo_2e_bar
to foo.bar
. This is the main reason why the decoding is opt-in via the --push.enable-utf8-names
flag.
Signed-off-by: Federico Torres <[email protected]>
@beorn7 Thanks for the docs suggestions, done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚢
As part of the initiative for adding UTF-8 support to Prometheus, this PR brings UTF-8 support for metric and label names as described in this proposal (for example:
{"metric.name", "label.name"="value"}
.For UTF-8 label names defined in the URL path, an escaping syntax is used. For example, the label
"label.name"="value"
would be escaped like:/metrics/job/example/U__label_2e_name/value
. This escaping is compatible with the base64 encoding for label values:/metrics/job/example/U__label_2e_name@base64/dmFsdWU=
UTF-8 support can be enabled by running the Pushgateway with the flag
--push.enable-utf8-names=true
Addresses #623