Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata: Header value validation #11679

Open
bananacocodrilo opened this issue Nov 8, 2024 · 6 comments
Open

Metadata: Header value validation #11679

bananacocodrilo opened this issue Nov 8, 2024 · 6 comments

Comments

@bananacocodrilo
Copy link

bananacocodrilo commented Nov 8, 2024

What version of gRPC-Java are you using?

1.41.x, but also observed in master branch

What is your environment?

Linux/MacOs, n/a

What did you expect to see?

Metadata validation is consistent between Java and Go implementations.

What did you see instead?

Java treats differently some invalid characters.

These are the rules for header values:

  1. ASCII-Value should not have leading or trailing whitespace.
  2. ASCII-Value → 1*( %x20-%x7E ) ; space and printable ASCII
  3. "Implementations must not error due to receiving an invalid ASCII-Value that's a valid field-value in HTTP, but the precise behaviour is not strictly defined: they may throw the value away or accept the value. If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata."

About rule 1 - ASCII-Value should not have leading or trailing whitespace.

We have not yet tested all the cases but it seems like neither Java nor Go are validating for this. Doesn't seem like there is a strong need to implement this.

About rule 2 - ASCII-Value → 1*( %x20-%x7E ) ; space and printable ASCII

The current metadata object in grpc-java relies on encoded.getBytes(US_ASCII); :

byte[] toBytes(T value) {
String encoded = Preconditions.checkNotNull(
marshaller.toAsciiString(value), "null marshaller.toAsciiString()");
return encoded.getBytes(US_ASCII);
}

public <T> void put(Key<T> key, T value) {
Preconditions.checkNotNull(key, "key");
Preconditions.checkNotNull(value, "value");
maybeExpand();
name(size, key.asciiName());
if (key.serializesToStreams()) {
value(size, LazyValue.create(key, value));
} else {
value(size, key.toBytes(value));
}
size++;
}

The effects of this are that depending on the type of invalid character, the headers have different behaviours:

  1. Characters outside of US_ASCII: The invalid character gets replaced by the replacement character (�), and after that substituted again for a question mark (?).
  2. Non printable characters inside of US_ASCII range (e.g: \t or \n) are not substituted. This leads to the header being removed from the request.

In grpc-go there is a straight forward validation of all the bytes before sending the request. If the validation fails the request call results in error.

About rule 3 - If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata.

Java uses the same validation as for request headers:

  • Characters outside of US_ASCII: The application code in the server will see the received header with the character substituted by the replacement character (�). When echoing back the header, the server will send the header with the character substituted by the question mark (?)
  • Non printable characters: The application code in the server will see those characters in the received headers but will drop the header from the response.

Go doesn't validate anything in the received headers nor in the response headers, so you are able to send back any header. You are also able to send any new headers that break all the rules in the response ( eg: "invalid-$":"Ĩñvá \t łið" )

Steps to reproduce the bug

Edit: Fixed misunderstanding from our side.

@kannanjgithub
Copy link
Contributor

To preserve non-ASCII characters in header value we could Unicode encode them before transmission by the gRPC client so é gets transformed to \u0049 in the string representation, gets transmitted and gets decoded back to Unicode on the gRPC server side, and do this in all languages. The question is whether we want to care about supporting non-ASCII characters. @ejona86

@ejona86
Copy link
Member

ejona86 commented Nov 19, 2024

gRPC does not support non-ASCII characters in header values, and I don't think this is asking they do. We have no desire to add non-ASCII support directly. Non-ASCII should be handled by the application using an encoding of the application's choice, e.g., percent encoding. HTTP is very irregular about such encoding, and often times is formatting-specific, e.g., Unicode in JSON can use \u, but only within strings and escapes in general have a lot of rules around them.

E.g.: test-invalid-2:a\ta will pass validation in Java, be sent to the Go server, and then Go will fail the validation when transmitting it either in the response or in requests to downstreams.

Independent of the Java behavior, that is clearly a bug in Go. From the spec:

Implementations must not error due to receiving an invalid ASCII-Value that's a valid field-value in HTTP, but the precise behavior is not strictly defined: they may throw the value away or accept the value. If accepted, care must be taken to make sure that the application is permitted to echo the value back as metadata. For example, if the metadata is provided to the application as a list in a request, the application should not trigger an error by providing that same list as the metadata in the response.

@dfawley
Copy link
Member

dfawley commented Nov 19, 2024

Independent of the Java behavior, that is clearly a bug in Go. From the spec:

That's grpc/grpc-go#5318

We should be dropping the headers on the incoming path, since there's no way to differentiate between headers received from some incoming call and headers originating at the application. Changing that behavior will break people, I'm sure.

@ejona86
Copy link
Member

ejona86 commented Nov 19, 2024

Netty should be preventing \n and control characters from being sent; I think we have header validation enabled. So I think the only character not handled is \t.

CoolTomatos added a commit to CoolTomatos/grpc-java that referenced this issue Nov 19, 2024
…acter the same way grpc-go validates the Metadata.

closes grpc#11679
@bananacocodrilo
Copy link
Author

Hi, thanks for taking the time to look into this.

There were a couple of problem with our test setup so we were getting some erroneous results.
I have updated the original comment to reflect this.

@ejona86

  1. Indeed, we do not think there is any need to support unicode.
  2. You are correct, non printable characters seen to be taken care of here
    /**
    * Returns {@code true} if {@code subject} contains only bytes that are spec-compliant ASCII
    * characters and space.
    */
    private static boolean isSpecCompliantAscii(byte[] subject) {
    for (byte b : subject) {
    if (b < 32 || b > 126) {
    return false;
    }
    }
    return true;
    }

The problems (from our point of view at least) are:

  • Java silently replaces non US ASCII characters by "?" but drops the header if there is a non-printable US ASCII. The behaviour should be the same for all non valid characters.
  • Java will fail to echo back headers that contain non-printable US ASCII.
  • Go only validates when sending request headers, allowing it to introduce any new invalid header in the response. E.g.: "invalid-$$":"Bá \t ð" . (this is probably pretty niche).

I'm unsure of what is the correct path to move from here.

For Java specifically: It should treat all the invalid characters the same way (dropping the header + log message). Go's strategy of failing the request instead of dropping the header makes it easier to debug problems, but introducing that in Java at this point would break people's code so I'm not sure if the trade off is worth it.

About accepting and echoing back invalid headers: The specification states that the implementations are allowed to drop the values. This is also non backwards-compatible. But I'd argue that is worth introducing the change.

  • This only affects systems that need values in header that they themselves aren't able to send.
  • It simplifies moving towards a complete implementation of the specification.

@ejona86
Copy link
Member

ejona86 commented Nov 20, 2024

Looking at the history, it seems converting to ASCII is very old (Jan 2015), before our first release 0.7.0 in May 2015. Before that point we used UTF-8. That same commit introduces the skipping behavior of control characters, for sending. That pre-dates the restriction in the gRPC spec (Nov 2015), although there had been related discussions starting in Jul 2015.

I'm not immediately finding code that filters out headers in the receiving direction. Only ASCII will be provided to AsciiMarshaller, excepting "�" replacements. It is possible to echo back received non-ASCII (because it is just stored as byte[] in Metadata), but TransportFrameUtil will filter those out. That means we could send \t (as Netty likely prevents other control characters).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants