[EOEPCA/IAM] Evaluate Delegated Access Approaches #84

Open
w-scho opened this issue Feb 11, 2025 · 5 comments
Collaborator

w-scho commented Feb 11, 2025

Delegated access is needed in several scenarios in EOEPCA, typically in conjunction with long-lived processes.

For example, a user may issue a processing request to the Processing BB. The Processing BB initially queues the request for a while, waiting for resources to become available. It then executes the processing job, which may also take a while (hours, days, ...). Finally the Processing BB delegates to the Workspace BB in order to store the processing result in the user's workspace.

Assumptions and Challenges:

  • Both processing and storing the result are done using the requesting user's identity, so there is no delegation of rights from one user to another or to a dedicated Processing BB account. Hence there does not seem to be a need to use UMA.
  • While the processing job is waiting in the queue, no user session should be actively kept open.
  • Processing may take a while. The processing job should be able to open a user session (or create access tokens) as needed, but it should not be required to keep it open all the time.
  • During processing, the Processing BB may have to access further services or resources on the user's behalf.
  • The access to the Workspace BB is assumed to be short-lived and to fit into a single user session or access token lifetime. So the Workspace BB should have no need to manage user sessions itself in this scenario.
@w-scho w-scho self-assigned this Feb 11, 2025
@w-scho w-scho added this to the Q4 - Release 2.0 milestone Feb 11, 2025
Collaborator Author

w-scho commented Feb 12, 2025

A possible way to go could be the use of offline tokens. Some thoughts about them:

Properties of offline tokens:

  • Offline tokens are designed to be long-lived and to be persisted by the client that obtains them.
  • Offline tokens should remain valid after a restart of Keycloak (tbc)
  • Offline tokens can be revoked via the Admin console and (if configured) via the Account console
  • By default, offline tokens have an unlimited lifetime, but are revoked automatically if not used for 30 days.
  • Otherwise offline tokens behave like ordinary refresh tokens.

Limitations:

  • Offline tokens can only be obtained during the authentication process. However, this does not necessarily mean that user interaction is required.
  • Like refresh tokens, offline tokens are bound to the client that requests them and cannot be used by other clients.
  • It seems to be possible to revoke offline tokens via the revoke token endpoint. However, according to keycloak/keycloak#26532 ("calling Revoke token on an offline refresh token is invalidating all offline sessions"), at least in Keycloak 21 this always seems to revoke all offline tokens of the user, including those obtained by other clients. Not verified for Keycloak 24 yet.
  • Also according to the ticket mentioned above, an explicit logout via the logout endpoint may also revoke all offline tokens or at least those associated with the affected session. However, the ticket refers to Keycloak 21, and (fortunately) I was not able to reproduce this with Keycloak 24.
  • Offline tokens have not been designed for delegated access. Only the client that obtained the offline token is able to use it to obtain access tokens (JWT). In order for delegation to work, it must be made sure that the scope of the access token is wide enough. E.g., its audience should include the whole realm or at least all services that may have to be called. Furthermore it should be made sure that only the service that obtained the offline token actually needs to bridge larger time gaps and that all authorization required by each of its individual subactivities can be performed within the lifetime of an access token.
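
The audience requirement in the last point can be checked before an access token is handed to other services. The following is a minimal sketch, not part of the evaluated setup: the function names and the service names used as audiences are hypothetical, and the payload is decoded without signature verification, which is only acceptable for inspecting a token the service has just received from Keycloak itself.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded without padding; restore the padding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def missing_audiences(token: str, required: set[str]) -> set[str]:
    """Return the required audiences that are absent from the token's `aud` claim."""
    aud = decode_jwt_claims(token).get("aud", [])
    if isinstance(aud, str):  # `aud` may be a single string or a list of strings
        aud = [aud]
    return required - set(aud)
```

If `missing_audiences` returns a non-empty set, the client scopes or audience mappers in Keycloak would need to be widened before delegation can work for those services.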

Recommendations:

  • Offline tokens should only be obtained if required. A client should hold at most one offline token per user at a time.
  • Offline tokens should be kept secret and never be disclosed.
  • Offline tokens should only be used by private (confidential) clients.
  • If possible (tbc), a client should revoke its offline token if it is no longer needed, instead of just dropping it.
  • Token rotation can be used to mitigate token leakage at the cost of some administrative overhead.
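
Explicit revocation could look roughly like the sketch below. It assumes Keycloak's standard token revocation endpoint (RFC 7009) under `/realms/<realm>/protocol/openid-connect/revoke` and a confidential client that authenticates with `client_id` and `client_secret` in the form body; the function names are hypothetical. Keep in mind the caveat from the limitations above: in Keycloak 21 this call reportedly revoked all offline tokens of the user, and the behaviour in Keycloak 24 is not verified yet.

```python
import urllib.parse
import urllib.request


def build_revocation_request(base_url, realm, client_id, client_secret, offline_token):
    """Build (but do not send) a token revocation request for an offline token."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/revoke"
    body = urllib.parse.urlencode({
        "token": offline_token,
        "token_type_hint": "refresh_token",  # offline tokens are a kind of refresh token
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def revoke_offline_token(request, opener=urllib.request.urlopen) -> bool:
    """Send the request; urlopen raises HTTPError for 4xx/5xx responses."""
    with opener(request) as response:
        return response.status == 200
```

Separating request construction from sending keeps the sketch testable without a running Keycloak instance.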

Collaborator Author

w-scho commented Feb 12, 2025

Further considerations and evaluation results regarding offline tokens:

  • The openid-connect plugin of APISIX obtains an offline token instead of a refresh token if the scope offline_access is contained in the scope parameter.
  • The openid-connect plugin passes the offline token to the backend in the X-Refresh-Token header if the set_refresh_token_header parameter is set to true. This can be observed on https://iam-test3.apx.develop.eoepca.org/ The offline token contained in the X-Refresh-Token header can be passed to the token endpoint as a refresh token in order to obtain an access token.
  • A new offline token is generated upon each login. Therefore it is a bad idea to simply add the offline_access scope globally for a complete service. Instead, a dedicated endpoint with a separate subroute should be defined for this, and only this endpoint should request the offline_access scope. This endpoint should only be accessed when a new offline token must be obtained. Note that the endpoint may still allow an attacker to create lots of useless offline tokens.
  • If an access token is obtained through an offline token, this does not constitute a session. Thus no session can be found in the Keycloak Admin UI in this case. Only the offline token itself is shown as a session.
  • No separate refresh token is generated when obtaining an access token through an offline token. Instead, the offline token itself (same content, but maybe with a new signature) is returned as the refresh token. In case of token rotation (token is revoked and replaced when it is used), the returned token is a new one, which means that it must be stored in place of the original one. If token rotation is not configured, I assume that the original token can be kept and used infinitely (tbc).
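
A dedicated route for offline token retrieval, as suggested above, might be configured along the following lines. This is only a sketch of a route body for the APISIX Admin API, expressed as a Python dictionary; the helper name, URI, client credentials and upstream node are placeholders, and only the plugin attributes `scope` and `set_refresh_token_header` are taken from the observations above.

```python
import json


def offline_token_route(uri, discovery_url, client_id, client_secret, upstream_node):
    """Return an APISIX route body whose openid-connect plugin requests offline_access."""
    return {
        "uri": uri,  # dedicated subroute, e.g. "/offline-token" (placeholder)
        "plugins": {
            "openid-connect": {
                "discovery": discovery_url,
                "client_id": client_id,
                "client_secret": client_secret,
                # offline_access is requested ONLY on this route, not service-wide
                "scope": "openid offline_access",
                "bearer_only": False,
                # pass the offline token to the backend in X-Refresh-Token
                "set_refresh_token_header": True,
            }
        },
        "upstream": {"type": "roundrobin", "nodes": {upstream_node: 1}},
    }
```

Serializing the result with `json.dumps(...)` yields a body that could be sent to the Admin API; all other routes of the service would keep their usual scope without `offline_access`.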

@w-scho w-scho changed the title [EOEPCA/IAM] Evaluate Delegated Authorization Approaches [EOEPCA/IAM] Evaluate Delegated Access Approaches Feb 13, 2025
Collaborator Author

w-scho commented Feb 13, 2025

Proposed way to obtain and manage an offline token (may still be inaccurate or incomplete!):

  • Each service that requires offline tokens should provide a dedicated endpoint for token retrieval. The route to this endpoint should be configured to obtain an offline token through the openid-connect plugin. The endpoint itself should take the offline token from the X-Refresh-Token header, store it and then take appropriate action (whatever needs to be done next from the service's perspective) directly or through a redirect.
  • If the service needs to retrieve an offline token for a user, it should notify the user about this (explaining in detail what happens) and ask them for confirmation and consent. Then it should redirect the user's client to the above-mentioned endpoint in order to retrieve the token. The service now "legally" possesses an offline token, and the user knows about this.
  • If the service already possesses an offline token for a user, it can check if the token is still valid by creating an access token through it. If this fails, the offline token is invalid, and a new offline token needs to be retrieved.
  • Services should not perform any regular keep-alive activities on offline tokens. Offline tokens should only be touched when they are actually needed.
  • If the offline token endpoint is accessed outside the control of the service (e.g., by an attacker who tries to flood Keycloak with offline tokens), the service should immediately revoke the generated offline token. In case of repeated attempts, the service should generate a security alert.
  • If the service knows that it does not need the offline token any more, it should explicitly revoke it and delete it from its database. This may be the case, e.g., if the service is notified that the user account has been disabled or if the user explicitly requests deletion of their data from the service.
  • A service that holds an offline token never passes it to other services, because they could not use it anyway. Instead, it requests an access token (JWT) and passes it to upstream services, which in turn may pass it to their upstream services. This implies that there is a master service that holds the offline token and may use it to bridge time gaps, whereas all other (slave) services only get an access token and are therefore not able to bridge time gaps. This may be a serious limitation in some scenarios (tbc).
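
The bookkeeping described in these points can be sketched in a few lines. Everything here is hypothetical and in-memory: the class and method names, the `revoke` hook (which would call Keycloak in a real service) and the alert threshold are illustrative assumptions. Header lookup is simplified to an exact-case match.

```python
from dataclasses import dataclass, field
from typing import Callable, Mapping


@dataclass
class OfflineTokenStore:
    revoke: Callable[[str], None]  # hypothetical hook that revokes a token in Keycloak
    pending_consent: set[str] = field(default_factory=set)
    tokens: dict[str, str] = field(default_factory=dict)
    unsolicited: dict[str, int] = field(default_factory=dict)
    alert_threshold: int = 3  # assumed value, not from the discussion

    def prepare_offline_access(self, user_id: str) -> None:
        """Record that the user was informed and consented to offline access."""
        self.pending_consent.add(user_id)

    def handle_token_endpoint(self, user_id: str, headers: Mapping[str, str]) -> str:
        """Process a request that reached the dedicated offline token endpoint."""
        token = headers.get("X-Refresh-Token")
        if token is None:
            return "missing"
        if user_id not in self.pending_consent:
            # Endpoint was accessed outside the service's control: revoke at once.
            self.revoke(token)
            count = self.unsolicited.get(user_id, 0) + 1
            self.unsolicited[user_id] = count
            return "alert" if count >= self.alert_threshold else "revoked"
        self.pending_consent.discard(user_id)
        # Hold at most one offline token per user: the new one replaces the old one.
        self.tokens[user_id] = token
        return "stored"

    def forget(self, user_id: str) -> None:
        """Revoke and delete the token once it is no longer needed."""
        token = self.tokens.pop(user_id, None)
        if token is not None:
            self.revoke(token)
```

The sketch deliberately does not revoke a replaced token, because revocation may affect all offline tokens of the user in some Keycloak versions, as noted earlier in this thread.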

Collaborator Author

w-scho commented Feb 13, 2025

Conclusion:
The sketched approach is suitable if only a single (master) service needs to bridge time gaps. Upstream (slave) services only get an access token, which can only be used for a limited timespan (currently 5min). If a slave service only validates the token initially and then does not use it any more, this is not a limitation. However, for a slave service that needs to call other subservices, this limits the maximum time from the start of an operation till the last external call it performs to the lifetime of an access token.
Regarding the scenarios described in https://docs.google.com/document/d/1oQ5qoIgAD8I5D7X_T8UCu1S72jv-CVHaQ8HLkUvydYA/edit?tab=t.0, for the Processing BB and the Resource Health BB, the limitations should not be a problem.
However, the limitations may affect harvesting activities within a registration workflow, where time gaps may need to be bridged on both sides. So we may need some further discussions here. A possible solution (or rather workaround) could be to let both the Registration API and the Harvester obtain an offline token upfront. This would probably solve the problem, but it would also break the authorization chain in a way, because instead of using the passed access token, the upstream service would generate access tokens from its own offline token.

Collaborator Author

w-scho commented Feb 14, 2025

This is a simplified sequence diagram for offline token retrieval. It is assumed that the user has interacted with the master service before and is already authenticated. At some point, the master service recognizes that an offline token will be required, notifies the user about this and asks them for confirmation. This confirmation, represented by the prepareOfflineAccess call, triggers the offline token retrieval.
The user is redirected to the offline token retrieval endpoint (represented by getOfflineToken). This triggers an authentication flow with scope offline_access requested, which may involve interaction with the user's client (not shown), but usually not with the user directly. The authentication flow results in an offline token, which is passed to the master service.
The master service stores the token for later use and then redirects to another page (e.g. one that tells the user that the offline token was successfully obtained).
Note that the master service may use other mechanisms than redirection to access the offline token retrieval endpoint. E.g., a form could be submitted to the endpoint, and the endpoint itself could present a confirmation page.
Note that the diagram is simplified (i.e., inaccurate) in the sense that it omits the user's client, treats the IAM as a single service and neglects that actually all communication between the user and the master service passes through APISIX.

[Image: simplified sequence diagram of offline token retrieval]

The following diagram sketches the use of the offline token. The user is not involved here any more. Instead, the process is initiated by the master service. It loads the offline token it previously stored and exchanges it for an access token via Keycloak's token endpoint. It then calls some action on the slave service with the access token attached. The slave service validates the token via Keycloak, does something useful and sends a response to the master service.

[Image: diagram of offline token use by the master service]
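
The token usage flow can be approximated in code as follows. This is a sketch under assumptions: it uses Keycloak's standard token endpoint with the `refresh_token` grant, a confidential client authenticating via form parameters, and hypothetical function names and URLs. The requests are only built, not sent, so that the logic can be checked without a running IAM.

```python
import json
import urllib.parse
import urllib.request


def build_refresh_request(base_url, realm, client_id, client_secret, offline_token):
    """Build the token endpoint request that exchanges an offline token for an access token."""
    url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "refresh_token": offline_token,
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode()
    return urllib.request.Request(url, data=body, method="POST")


def parse_token_response(raw: bytes, current_offline_token: str):
    """Return (access_token, offline_token_to_store) from the token endpoint response."""
    payload = json.loads(raw)
    # With token rotation the returned refresh token is a new one and must
    # replace the stored one; without rotation keeping either should work (tbc).
    new_offline_token = payload.get("refresh_token", current_offline_token)
    return payload["access_token"], new_offline_token


def build_service_call(url: str, access_token: str):
    """Build the call to the slave service with the access token attached."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})
```

A master service would send the first request with `urllib.request.urlopen`, pass the response body to `parse_token_response`, persist the returned offline token, and then send the service call. If the token endpoint rejects the request, the stored offline token is no longer valid and a new one has to be retrieved with the user's consent.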
