
GRPC Demo #798

Open · wants to merge 1 commit into main
Conversation

corps
Member

@corps commented Jun 14, 2024

A fully working gRPC demo. Working on a Notion doc explaining all the parts, open questions, how to improve/expand on it, etc.

Basic idea is that this is a minimal working example of

  1. A Python package that automatically builds mypy-typed gRPC stubs from proto files.
  2. A custom codegen tool that generates type adaptors (demonstrating how writing a custom codegen plugin works, and how adaptors can help us migrate existing endpoints).
  3. A working HTTP/1.1 transport implementation in a basic Python app.
  4. A working envoy proxy configuration backing that.
  5. Mutual TLS authentication backing the gRPC server, along with the ability to inspect the identity of the caller (and thus enforce, for instance, ACLs).

https://www.notion.so/sentry/gRPC-Implementation-For-Seer-cb14fb94e90d4afaa4d80c664e0c9031?showMoveTo=true&saveParent=true

@@ -40,14 +40,6 @@ jobs:
- name: Build image
run: |
make update
# TODO: Re-enable this when json schemas are ready to come back
Member Author

Just removing some cruft, not relevant

@@ -1,4 +1,9 @@
FROM golang:1.20.0 as go-build
RUN mkdir go && cd go && export GOPATH=/go && \
Member Author

Tool that helps set up an mTLS key chain.

Member

With mTLS requiring more infrastructure and cert management, we should also consider having a solution for bearer token based authentication with support for no-downtime token rotation.

Member Author

But here's the thing -- token-based authentication will, I feel, ultimately mean even more infrastructure and management.
Once we start having multiple service-to-service sites, and once we want things like ACLs (for instance, seer shouldn't have access to hybrid cloud endpoints), you need identity management, which tokens don't have baked in. We'd have to /build/ all that out, and we'd /still/ end up needing infra tooling that allows no-downtime token rotation on top of that identity management -- which we'd have to implement in each place.

With mTLS, we have one workflow, with tools that already exist, and it's an industry standard.

Member

We can use JWTs to bake in identities and carry end-user context into downstream services. We could really get a lot of benefit from the added context in a multi-tenant environment and any time we cross some sort of domain trust boundary.

mTLS alone does not lend itself well to nuanced authorization decisions, at least in my opinion. A combination of mTLS for service-to-service authentication (PeerAuthentication) and JWTs to carry end-user context (RequestAuthentication) is more robust than just one or the other. It's a fairly common pattern from my research. AuthorizationPolicies can be used to introspect the JWT's claims and make authorization decisions at the service mesh level.

In terms of added infrastructure, we'd be looking at some sort of trusted Secure Token Service (STS) to support the minting of JWTs. Tokens are issued through standard OAuth flows, and services would just implement JWT verification (if we decided to not just let the service mesh do this) and logic to pass it on where necessary.

Of course, implementing one thing at a time is probably best. 🙂

PROTOS_OUT=protos/build
PROTOS_SOURCE=protos/src
.PHONY: protos
protos: ## Regenerate protobuf python files
Member Author

In this workflow, all it takes is installing (pip install) to get both the protos and the generated files together, making it relatively simple to both deploy and use in development.
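
For concreteness, a hedged sketch of what that generation step can look like when driven from Python with grpcio-tools and mypy-protobuf (the directories mirror PROTOS_SOURCE/PROTOS_OUT above; the actual recipe in this PR's Makefile may differ):

```python
# Sketch only: regenerate message modules, service stubs, and mypy .pyi stubs.
import glob

from grpc_tools import protoc

PROTOS_SOURCE = "protos/src"
PROTOS_OUT = "protos/build"

args = [
    "protoc",
    f"-I{PROTOS_SOURCE}",
    f"--python_out={PROTOS_OUT}",       # *_pb2.py message modules
    f"--grpc_python_out={PROTOS_OUT}",  # *_pb2_grpc.py service stubs
    f"--mypy_out={PROTOS_OUT}",         # .pyi stubs via the mypy-protobuf plugin
    *glob.glob(f"{PROTOS_SOURCE}/**/*.proto", recursive=True),
]
raise SystemExit(protoc.main(args))
```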

@@ -1,5 +1,16 @@
# seer

## gRPC experiment
Member Author

TODO: Will remove, I don't intend to merge this into master. I have some ideas how we can break this out into a workable workflow and separate concerns.

@@ -0,0 +1,70 @@
static_resources:
Member Author

I'm using envoy here because I know it has decent GKE support (https://cloud.google.com/kubernetes-engine/docs/tutorials/exposing-grpc-services-on-gke-using-envoy-proxy)

But to be honest, it wouldn't have to be GKE if we didn't want it to. Here, envoy handles the mTLS and the HTTP/2 -> HTTP/1.1 proxying, but I'm confident both are also configurable in k8s with many other filters. That said, envoy is pretty nice and has a lot of advantages, including decent xDS configuration. For service-to-service at scale, I feel inclined to bring this up.

Member

We have envoy in a few places in our infrastructure already. Using envoy for grpc service discovery and mtls termination should be ok. If we can avoid downgrading to HTTP1.1 I think it would be worth the extra effort.

@@ -0,0 +1,237 @@
import contextlib
Member Author

This is a minimal demo of creating a custom codegen tool. I didn't go out of my way to make it pretty; it could definitely be made nicer if we committed to this approach.
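
For readers who haven't written one before, a protoc plugin is just a program that reads a serialized CodeGeneratorRequest on stdin and writes a CodeGeneratorResponse on stdout. A minimal sketch of that mechanism (not this PR's actual tool, and the emitted adaptor stubs are only illustrative):

```python
#!/usr/bin/env python
# Minimal protoc plugin skeleton: protoc feeds CodeGeneratorRequest on stdin
# and expects CodeGeneratorResponse on stdout.
import sys

from google.protobuf.compiler import plugin_pb2


def main() -> None:
    request = plugin_pb2.CodeGeneratorRequest.FromString(sys.stdin.buffer.read())
    response = plugin_pb2.CodeGeneratorResponse()

    for proto_file in request.proto_file:
        if proto_file.name not in request.file_to_generate:
            continue  # dependencies are in the request but not generated for
        out = response.file.add()
        out.name = proto_file.name.replace(".proto", "_adaptors.py")
        lines = [f"# generated from {proto_file.name}"]
        for message in proto_file.message_type:
            lines.append(f"class From{message.name}Adaptor:")
            for field in message.field:
                lines.append(f"    def adapt_from_{field.name}(self, value): ...")
        out.content = "\n".join(lines) + "\n"

    sys.stdout.buffer.write(response.SerializeToString())


if __name__ == "__main__":
    main()
```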

@@ -0,0 +1,115 @@
"""
Member Author

NOTE: I do not actually intend to commit any of the generated files (everything in protos/py) in practice, but I did so in this demo to demonstrate the output.

self, proto_request: ScoreRequest, context: grpc.ServicerContext
) -> ScoreResponse:
identities = context.peer_identities()
if not identities or b"consumer" not in identities:
Member Author

Just a simple demo of how authorization can use peer information from mTLS.

Member

Having to do authentication checks in each RPC method will be error prone. gRPC has interceptors which seem like a good fit for authentication methods.
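
As a sketch of that suggestion (the interceptor itself is hypothetical, not part of this diff; it assumes the same mTLS setup and only wraps unary-unary handlers to stay short):

```python
import grpc


class MTLSAuthInterceptor(grpc.ServerInterceptor):
    """Reject calls whose mTLS peer identity is not in the allow list."""

    def __init__(self, allowed_identities: frozenset = frozenset({b"consumer"})):
        self._allowed = allowed_identities

    def intercept_service(self, continuation, handler_call_details):
        handler = continuation(handler_call_details)
        if handler is None or handler.unary_unary is None:
            return handler  # streaming handlers are left unwrapped in this sketch

        inner = handler.unary_unary

        def checked(request, context):
            identities = context.peer_identities()
            if not identities or not any(i in self._allowed for i in identities):
                context.abort(grpc.StatusCode.PERMISSION_DENIED, "caller not allowed")
            return inner(request, context)

        return grpc.unary_unary_rpc_method_handler(
            checked,
            request_deserializer=handler.request_deserializer,
            response_serializer=handler.response_serializer,
        )


# server = grpc.server(thread_pool, interceptors=[MTLSAuthInterceptor()])
```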

context.abort(grpc.StatusCode.PERMISSION_DENIED, "only consumer can access")

request = SeverityRequest()
request.adapt_from(proto_request)
Member Author

This is an example of a realistic rollout case: we have an existing endpoint, using pydantic classes, with production clients we want to keep working. We create and use a generated adaptor to convert between protos and the pydantic classes during the rollout, so we can incrementally change the client side to use protos natively. After that, the implementation itself could use the protos natively and possibly drop the JSON endpoints if they are no longer necessary.

It is also possible to use native proto <-> json format tools, but there are some caveats we should discuss. I think in either case these adaptors are safe tools for rolling out to existing endpoints.
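
For reference, the "native proto <-> json format tools" are along the lines of google.protobuf.json_format. A small sketch of that alternative and two of its caveats, assuming the ScoreRequest message from this PR's protos:

```python
from google.protobuf import json_format

from sentry_services.seer.severity_pb2 import ScoreRequest

proto = ScoreRequest(message="boom", has_stacktrace=1)

# Caveat 1: keys are camelCased by default ("hasStacktrace");
# preserving_proto_field_name=True keeps the snake_case names the existing
# pydantic models expect.
payload = json_format.MessageToDict(proto, preserving_proto_field_name=True)

# Caveat 2: unset fields are omitted by default, so the JSON shape can differ
# from what existing clients send today.
roundtripped = json_format.ParseDict(payload, ScoreRequest())
```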


from seer.severity.severity_inference import SeverityRequest


Member Author

In this case, I've implemented a custom Flask transport for gRPC. Because we are using an HTTP/2 -> HTTP/1.1 bridge, we end up needing to implement some gRPC server details here to support that. To use the native gRPC server, we'd have to support HTTP/2, and we'd likely have to run separate servers in situations where we are deploying it alongside an existing Django or Flask app.

This compromise demonstrates that if we really want to commit to one server for both gRPC and our existing HTTP/1.1 endpoints, it is possible, with a modest amount of legwork.
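
For context on what "some gRPC server details" means: the gRPC wire format length-prefixes every message (a 1-byte compressed flag, a 4-byte big-endian length, then the serialized protobuf), so a bridge endpoint has to frame and deframe request/response bodies itself. A sketch of just that framing piece (not the PR's actual Flask transport):

```python
import struct


def frame_grpc_message(payload: bytes, compressed: bool = False) -> bytes:
    """Wrap a serialized protobuf in gRPC's length-prefixed framing."""
    return struct.pack(">BI", 1 if compressed else 0, len(payload)) + payload


def unframe_grpc_message(body: bytes) -> bytes:
    """Extract the first message from a gRPC-framed body."""
    flag, length = struct.unpack(">BI", body[:5])
    if flag:
        raise NotImplementedError("compressed messages are out of scope for this sketch")
    return body[5 : 5 + length]
```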

Member

I think in production we would want separate deployments for RPC and web traffic. However, in local development having a single server would be nice to have.

# response.raise_for_status()
# print(response.json())

with grpc.secure_channel(
Member Author

Fortunately, gRPC clients will still natively use HTTP/2 and do not need to be aware of how we implement the HTTP/1.1 bridge.
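
For completeness, the client side of that channel looks roughly like this (the cert paths and stub name are placeholders, not necessarily what the demo uses):

```python
import grpc

# Hypothetical paths; the demo's key chain comes from the mTLS tooling above.
with open("certs/ca.pem", "rb") as f:
    ca = f.read()
with open("certs/consumer.key", "rb") as f:
    key = f.read()
with open("certs/consumer.pem", "rb") as f:
    chain = f.read()

credentials = grpc.ssl_channel_credentials(
    root_certificates=ca,
    private_key=key,          # presenting a client cert is what makes this mutual TLS
    certificate_chain=chain,
)

with grpc.secure_channel("localhost:8443", credentials) as channel:
    # e.g. stub = severity_pb2_grpc.SeverityServiceStub(channel); stub.Score(request)
    pass
```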

Member Author

Note: it's also possible to use envoy on the client side as well, meaning that application code would not need its own credentials files. In this sort of deployment, the envoy sidecar container holds the actual client TLS configuration, and the application code opens an insecure channel directly to the sidecar, which handles mTLS on its behalf.

pred = self.classifier.predict_proba(input_data)[0][1]

return SeverityResponse(severity=round(min(1.0, max(0.0, pred)), 2))
return SeverityResponse(severity=0.6)
Member Author

Just to make the demo simpler, disabling the actual inference.

if not identities or b"consumer" not in identities:
context.abort(grpc.StatusCode.PERMISSION_DENIED, "only consumer can access")

request = SeverityRequest()


Seems like you could almost make this a middleware or passthrough annotation that handles conversion for you if necessary
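
A rough sketch of what that passthrough annotation could look like, building on the generated adapt_from/apply_to methods (the decorator and the aggregate apply_to() call are hypothetical):

```python
import functools


def adapts(domain_request_cls, proto_response_cls):
    """Hypothetical decorator: build the domain model from the proto before the
    handler runs, then map the domain response back onto a fresh proto."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(self, proto_request, context):
            domain_request = domain_request_cls()
            domain_request.adapt_from(proto_request)   # generated adaptor mixin
            domain_response = fn(self, domain_request, context)
            proto_response = proto_response_cls()
            domain_response.apply_to(proto_response)   # assumed aggregate apply_to()
            return proto_response

        return wrapper

    return decorator


# Usage, with names from the surrounding diff:
# @adapts(SeverityRequest, ScoreResponse)
# def score(self, request, context): ...
```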

@@ -0,0 +1,115 @@
"""
@generated by mypy-protobuf. Do not edit manually!
Member

Aren't these classes generated by your code?

Comment on lines +96 to +100
self.apply_to_message(proto)
self.apply_to_has_stacktrace(proto)
self.apply_to_handled(proto)
self.apply_to_trigger_timeout(proto)
self.apply_to_trigger_error(proto)
Member

Should these method calls also be passing the val parameter?

Comment on lines +11 to +33
class FromScoreRequestAdaptor:
def adapt_from_message(self, value: builtins.str):
pass

def adapt_from_has_stacktrace(self, value: builtins.int):
pass

def adapt_from_handled(self, value: builtins.bool):
pass

def adapt_from_trigger_timeout(self, value: builtins.bool):
pass

def adapt_from_trigger_error(self, value: builtins.bool):
pass

def adapt_from(self, proto: sentry_services.seer.severity_pb2.ScoreRequest):
self.adapt_from_message(proto.message)
self.adapt_from_has_stacktrace(proto.has_stacktrace)
if proto.HasField("handled"):
self.adapt_from_handled(proto.handled)
self.adapt_from_trigger_timeout(proto.trigger_timeout)
self.adapt_from_trigger_error(proto.trigger_error)
Member

Did you consider generating 'mapping functions' that accept a target object and a proto request, and return an updated target object? Currently we'll be adding many methods to domain models. A mapping function/builder wouldn't add methods to existing objects and should have similar type safety.
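
To make that alternative concrete, a generated mapping function might look something like this (hypothetical output, not what the PR generates today; it assumes the pydantic .copy(update=...) API on the domain model):

```python
def score_request_to_severity_request(
    target: "SeverityRequest",
    proto: "ScoreRequest",
) -> "SeverityRequest":
    """Return an updated copy of `target` instead of adding methods to it."""
    updates = {
        "message": proto.message,
        "has_stacktrace": proto.has_stacktrace,
        "trigger_timeout": proto.trigger_timeout,
        "trigger_error": proto.trigger_error,
    }
    if proto.HasField("handled"):
        updates["handled"] = proto.handled
    return target.copy(update=updates)
```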

Member Author

🤷 Sure, I mean, there's tons of things we can do. My point with the demo here is merely to demonstrate that working solutions can be derived from a ~100 line script that makes the domain <-> wire transformation easier. When we get to a point of actual implementation, I figure it's something that we iterate on more closely before we declare it standardized.

Member

> there's tons of things we can do. My point with the demo here is merely to demonstrate that working solutions can be derived from a ~100 line script that makes the domain <-> wire transformation easier. When we get to a point of actual implementation, I figure it's something that we iterate on more closely before we declare it standardized.

That's fair. I wasn't sure if this was the design you were planning to move forward with.

Member Author

I will say this. There are 2 main gotchas we have to consider that drove my current design, whether or not I stick to it:

  1. It is super important that adding new fields to protos and generating a new adaptor cannot break existing code. Because I imagine protos being generated in repos separately from their consumers, there can be version drift by default, which is actually good. But this also means the default generated code cannot presume any behavior if and when domain model and proto versions drift (either can be updated before the other). In this case, the adaptor's default behavior is a no-op unless specifically wired (see the sketch below).

  2. It won't always be the case that straight assignment works; types may need further, deeper transformation, so a simple assignment sometimes won't be enough. That's probably fine, as the domain model can write setters to receive attributes, and I suspect that is still the exception rather than the rule.
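
A small sketch of gotcha (1) in practice, assuming the pydantic SeverityRequest mixes in the generated adaptor (the import path and class layout here are guesses, purely illustrative):

```python
from pydantic import BaseModel

from protos.py.sentry_services.seer.severity_adaptors import FromScoreRequestAdaptor  # path is a guess


class SeverityRequest(BaseModel, FromScoreRequestAdaptor):
    message: str = ""
    has_stacktrace: int = 0

    def adapt_from_message(self, value: str) -> None:
        self.message = value                 # explicitly wired

    def adapt_from_has_stacktrace(self, value: int) -> None:
        self.has_stacktrace = value          # explicitly wired

    # adapt_from_handled / adapt_from_trigger_* stay as the inherited no-ops, so a
    # proto generated from a newer .proto revision still adapts without breaking.
```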
