-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Agent-api changes for Dekaf #1793
Conversation
c19b6c2
to
01eaf34
Compare
…equest. Specifically, it's reasonable to request a token with both LIST and READ, but the existing logic only supported getting one or the other.
…access tokens Only validate that the token is signed by the data-plane in question, and that the task is running in the issuing data-plane. It's not meaningful to further validate access to a particular collection. In Dekaf's case, that check would defeat the purpose as this control-plane access token is needed in order to fetch the collections in the first place.
01eaf34
to
31b4243
Compare
crates/models/src/authorizations.rs
Outdated
@@ -148,3 +154,29 @@ pub struct UserTaskAuthorization { | |||
#[serde(default, skip_serializing_if = "String::is_empty")] | |||
pub shard_id_prefix: String, | |||
} | |||
|
|||
#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)] | |||
pub enum AllowedRole { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, role is restricted through deserialization.
This feels a bit weird to me, because I cannot conceive of a likely scenario where we would want to extend this, and thus don't need this generalization.
Stepping back for a moment to look at this more holistically: the runtime gets a bunch of metadata through Etcd-stored ShardSpecs, like the build ID to use which unambiguously resolves / allows for loading the task spec, and also the name of the ops logs & stats partition to use, which are encoded as shard spec labels, plus some other stuff.
Dekaf tasks don't have shards, we need an alternative for this discovery, so a new API is warranted. Somehow we need to:
- Give Dekaf a JWT which allows it to work with the AVRO schemas table
- Let Dekaf know the exact materialization spec to use.
- Tell Dekaf what the ops logs / stats partitions are to use.
- Probably other stuff we may not have thought of.
I think my central feedback is that the /authorize/role
abstraction is painting us in a corner, and instead we want an /authorize/dekaf
which embraces the differences between how Dekaf bootstraps (without shards) and how the runtime does.
It could directly load / return the current MaterializationSpec, for example, which removes the RLS question for live_specs from the dekaf
role JWT and allows us to narrow it's access. It could return the build ID and other metadata for associated logging within Dekaf. It can tell dekaf of the ops logs / stats partitions to use, which only needs to be resolved once and not for every task binding (which is part of why it's odd for it to be included in /authorize/task
). It would be less gross to extend if we discover other bits that Dekaf can benefit from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I went back and forth on this and ended up going with the more general /authorize/role
because /authorize/dekaf
felt like a deviation from the level of generality that everything surrounding it exists in. That is, the rest of the agent API is fairly general and applicable to the system as a whole (at least the /authorize/*
endpoints are), whereas /authorize/dekaf
would be highly specific to Dekaf and nothing else.
Of course, going down the less general/more specific path has the very nice benefit of not having to worry about how to fit the things you need the API to return into the abstraction, but aren't you worried about this moving the agent API towards becoming a sort of "kitchen sink" where we add new endpoints whenever we need a new one, as opposed to presenting a more coherent interface?
I'm certainly happy to take the /authorize/dekaf
path here, as like you say it'll enable returning a bunch more related data in one go, and also reduce the security-relevant surface area (as the JWT will only need access to the AVRO schemas, not all specs).
fd4a103
to
1ed5461
Compare
3fd2d16
to
d5ab497
Compare
In addition to avoiding premature generalization, we can now also return relevant metadata in one request, such as ops catalog names and task spec.
d5ab497
to
52e4e71
Compare
Okay @jgraettinger the changes to make this into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep LGTM too, thanks
This got left out of #1793 and is needed for Dekaf materialization endpoints
This got left out of #1793 and is needed for Dekaf materialization endpoints
Description:
This was originally part of #1791 but I pulled it out in order to get it merged sooner, as the changes to support Dekaf materializations require these changes.
/authorize/dekaf
, an endpoint designed to give Dekaf all of the pieces it needs to serve requests. These are:A short-lived control-plane access token to authorize requests under the
dekaf
role which has grants to thepublic.registered_avro_schemas
table. This works by taking advantage of Postgrest's JWT auth, where a properly signed JWT with arole
claim will cause all API requests using that token to execute under the corresponding Postgres role.The
dekaf
role is defined as:The
models::MaterializationDef
for the materialization being requested as identified by thesub
JWT claim. This will both allow Dekaf to validate that the request being made has the proper bearer token, and will provide information about which topics (bindings) to expose.The ops logs and stats journal names for the materialization. This will allow Dekaf to write ops logs and stats for operations as well as billing.
/authorize/task
to support requesting a token containing bothREAD
andLIST
capabilitiesflow-client
to :fetch_task_authorization()
(which actually calls/authorize/user/task
) tofetch_user_task_authorization()
fetch_task_authorization()
which calls/authorize/task
This change is