Add tests for invalid utf8 when formatting #473

smaye81 · 2025-05-28T18:01:09Z

According to docs:

The value is formatted as if string(value) was performed and any invalid UTF-8 sequences are replaced with \ufffd. Multiple adjacent invalid UTF-8 sequences must be replaced with a single \ufffd.

This adds two additional tests to verify:

- invalid UTF-8 sequences are each replaced with \ufffd.
- multiple adjacent invalid UTF-8 sequences are replaced with a single \ufffd.
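
To illustrate the two behaviors being tested, here is a minimal Python sketch. It is not the CEL implementation: the helper name `format_bytes` is hypothetical, and it assumes that "adjacent" means invalid bytes with no valid characters between them. It relies only on the standard `bytes.decode` method and the `start`/`end` attributes of `UnicodeDecodeError`.

```python
REPLACEMENT = "\ufffd"


def format_bytes(value: bytes) -> str:
    """Decode bytes as UTF-8, replacing each run of adjacent invalid
    sequences with a single U+FFFD (hypothetical helper, for illustration).
    """
    out = []
    pos = 0
    prev_invalid = False  # True while we are inside a run of invalid bytes
    while pos < len(value):
        chunk = value[pos:]
        try:
            out.append(chunk.decode("utf-8"))
            break
        except UnicodeDecodeError as err:
            # Bytes before err.start are valid; emit them and end any run.
            if err.start > 0:
                out.append(chunk[:err.start].decode("utf-8"))
                prev_invalid = False
            # Emit one placeholder per run, not one per invalid sequence.
            if not prev_invalid:
                out.append(REPLACEMENT)
            prev_invalid = True
            pos += err.end  # skip past the invalid sequence
    return "".join(out)


# Separated invalid bytes each get their own placeholder:
#   format_bytes(b"a\xffb\xfec") == "a\ufffdb\ufffdc"
# Adjacent invalid bytes collapse into one placeholder:
#   format_bytes(b"a\xff\xfeb") == "a\ufffdb"
```

Note that a naive `value.decode("utf-8", errors="replace")` followed by collapsing repeated U+FFFD characters would also merge replacement characters that were validly encoded in the input, which is why this sketch tracks invalid runs explicitly.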

google-cla · 2025-05-28T18:01:14Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

The current conformance tests for `string.format` are not exhaustive and do not account for all scenarios in the [docs](https://github.com/google/cel-spec/blob/master/doc/extensions/strings.md). One such gap is a test for invalid UTF-8. This adds the ability to specify supplemental conformance tests in the form of another textproto file. Its content is merged with the official CEL conformance tests and then run against our implementation, which lets us specify our own tests for cases the official conformance tests do not yet cover. As a result, this includes two tests for invalid UTF-8, which incidentally turned up a bug involving collapsing placeholders for contiguous invalid UTF-8 bytes. Note that a PR has been created [here](google/cel-spec#473) to add these tests to the spec. Once added and released, they can be removed from our supplemental tests. See bufbuild/protovalidate-java#294 for a similar PR in protovalidate-java. This also renames some functions to make the test implementation more consistent across protovalidate implementations.

smaye81 added 2 commits May 28, 2025 13:37

Add tests for invalid utf8 when formatting

88eba4d

Fix test

bbf7065

smaye81 mentioned this pull request May 29, 2025

Add supplemental format conformance tests bufbuild/protovalidate-java#294

Merged

smaye81 mentioned this pull request May 29, 2025

Add supplemental format conformance tests bufbuild/protovalidate-python#308

Merged

Escape invalid utf-8 bytes

375fe9d
