RBAC Access denied from Pipeline Run pod #2794

Closed
4 of 7 tasks
kimwnasptd opened this issue Jul 10, 2024 · 14 comments · Fixed by #2795

Comments

@kimwnasptd
Member

Validation Checklist

Version

1.9

Describe your issue

I am using Kubeflow RC2 and trying to create a pipeline run. The pipeline fails, and I see the following in the logs of the driver pod of the first step:

time="2024-07-10T10:58:15.813Z" level=info msg="capturing logs" argo=true
I0710 10:58:15.905160      33 main.go:108] input ComponentSpec:{
  "executorLabel": "exec-preprocess",
  "inputDefinitions": {
...
F0710 10:58:16.006554      33 main.go:79] KFP driver: driver.Container(pipelineName=tutorial-data-passing, runID=486a5fd7-39fe-4ea7-82f0-3162fcd1e421, task="preprocess", component="comp-preprocess", dagExecutionID=2, componentSpec) failed: failure while getting executionCache: failed to list tasks: rpc error: code = PermissionDenied desc = RBAC: access denied
time="2024-07-10T10:58:16.814Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-07-10T10:58:16.814Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-07-10T10:58:16.814Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-07-10T10:58:16.814Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true
Error: exit status 1

It looks like something (most probably Istio) is denying the driver's request to get the executionCache.

Steps to reproduce the issue

  1. Deploy from RC2
  2. Try to create a pipeline run

Put here any screenshots or videos (optional)

No response

@kimwnasptd
Member Author

Not sure if it's related, but looking at the wait container of the same Pod I see the following logs:

time="2024-07-10T10:58:17.645Z" level=warning msg="failed to patch task set, falling back to legacy/insecure pod patch, see https://argo-workflows.readthedocs.io/en/release-3.4/workflow-rbac/" error="workflowtaskresults.argoproj.io is forbidden: User \"system:serviceaccount:kubeflow-user-example-com:default-editor\" cannot create resource \"workflowtaskresults\" in API group \"argoproj.io\" in the namespace \"kubeflow-user-example-com\""

Playing around a bit with kubectl auth can-i, I see the following interesting things:

kubectl auth can-i \
    create \
    workflowtaskresults.argoproj.io \
    -n kubeflow-user-example-com \
    --as system:serviceaccount:kubeflow-user-example-com:default-editor
# no

kubectl auth can-i \
    create \
    workflows.argoproj.io \
    -n kubeflow-user-example-com \
    --as system:serviceaccount:kubeflow-user-example-com:default-editor
# yes
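The same two checks can be scripted, for example to run them against several resources at once. A minimal Python sketch, assuming `kubectl` is on the PATH and the current kubeconfig is allowed to impersonate the service account; the helper names `can_i_argv` and `can_i` are hypothetical:

```python
import subprocess


def can_i_argv(verb: str, resource: str, namespace: str,
               sa_namespace: str, sa_name: str) -> list[str]:
    """Build the argv for `kubectl auth can-i`, impersonating a service account."""
    return [
        "kubectl", "auth", "can-i", verb, resource,
        "-n", namespace,
        "--as", f"system:serviceaccount:{sa_namespace}:{sa_name}",
    ]


def can_i(verb: str, resource: str, namespace: str,
          sa_namespace: str, sa_name: str) -> bool:
    """True if kubectl answers "yes" (exit code 0).

    A non-zero exit code means "no", but would also cover errors such as
    an unreachable API server, so treat False as "not confirmed allowed".
    """
    argv = can_i_argv(verb, resource, namespace, sa_namespace, sa_name)
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0
```

Usage would be a loop such as `for r in ("workflowtaskresults.argoproj.io", "workflows.argoproj.io"): print(r, can_i("create", r, "kubeflow-user-example-com", "kubeflow-user-example-com", "default-editor"))`.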

@kimwnasptd
Member Author

Setting the ml-pipeline AuthorizationPolicy to accept requests from everywhere (adding rules: [{}]) resolves the issue.

So the next step is to understand whether the problem lies in the communication between the KFP components or in requests from pods in user namespaces to KFP pods.

@kimwnasptd
Member Author

Specifically, adding the following rule to the AuthorizationPolicy makes the runs succeed:

  - when:
    - key: request.headers[kubeflow-userid]
      notValues: ['*']
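One way to add this rule to an already deployed policy is an RFC 6902 JSON patch that appends it to `spec.rules`. A Python sketch that only builds the patch; it assumes the target is the `ml-pipeline` AuthorizationPolicy in the `kubeflow` namespace and that `spec.rules` already exists, because the `-` index can only append to an existing array:

```python
import json

# The rule above: match requests that do not carry a kubeflow-userid header.
NO_USERID_HEADER_RULE = {
    "when": [
        {"key": "request.headers[kubeflow-userid]", "notValues": ["*"]},
    ]
}


def append_rule_patch(rule: dict) -> str:
    """Return a JSON patch that appends `rule` to the end of spec.rules."""
    return json.dumps([{"op": "add", "path": "/spec/rules/-", "value": rule}])


print(append_rule_patch(NO_USERID_HEADER_RULE))
```

The printed string could then be passed to `kubectl patch authorizationpolicy ml-pipeline -n kubeflow --type=json -p '<patch>'`.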

So there's a good chance that this was broken because of #2747

cc @juliusvonkohout @kromanow94

@kimwnasptd
Member Author

My latest understanding is the following:

  1. The pipeline steps send requests from pods that don't have a sidecar
  2. The current rules in the ml-pipeline AuthorizationPolicy allow requests if:
    1. They come from other KFP components (first rule)
    2. The request has a JWT (second rule)

But the pipeline steps themselves don't send any JWT, which is why we see the error in the steps.
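This reasoning can be captured in a small model of how an Istio ALLOW policy is evaluated: rules are ORed, and a request passes if any rule matches. It is a simplified Python sketch, not Istio's implementation; the `Request` fields are hypothetical stand-ins for the mTLS peer identity (only present when the caller has a sidecar) and the validated JWT principal (only present when a token was sent), and the principal list is abbreviated from the one in the policy proposed below:

```python
from dataclasses import dataclass
from typing import Optional

# Abbreviated set of KFP service-account identities (mTLS principals).
KFP_PRINCIPALS = frozenset({
    "cluster.local/ns/kubeflow/sa/ml-pipeline",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-ui",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-persistenceagent",
})


@dataclass
class Request:
    # Hypothetical model fields, not Istio types.
    peer_principal: Optional[str] = None  # mTLS identity; None without a sidecar
    jwt_principal: Optional[str] = None   # set only if a valid JWT was presented


def current_policy_allows(req: Request) -> bool:
    """ALLOW policy: the request passes if any rule matches."""
    rule1 = req.peer_principal in KFP_PRINCIPALS  # from.source.principals
    rule2 = req.jwt_principal is not None         # from.source.requestPrincipals ['*']
    return rule1 or rule2


driver = Request()  # pipeline step driver: no sidecar, no JWT
print(current_policy_allows(driver))  # False, surfaced as "RBAC: access denied"
```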

@kimwnasptd
Member Author

My proposal to unblock is the following AuthorizationPolicy:

apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  labels:
    app.kubernetes.io/component: ml-pipeline
    app.kubernetes.io/name: kubeflow-pipelines
    application-crd-id: kubeflow-pipelines
  name: ml-pipeline
  namespace: kubeflow
spec:
  rules:
  - from:
    - source:
        principals:
        - cluster.local/ns/kubeflow/sa/ml-pipeline
        - cluster.local/ns/kubeflow/sa/ml-pipeline-ui
        - cluster.local/ns/kubeflow/sa/ml-pipeline-persistenceagent
        - cluster.local/ns/kubeflow/sa/ml-pipeline-scheduledworkflow
        - cluster.local/ns/kubeflow/sa/ml-pipeline-viewer-crd-service-account
        - cluster.local/ns/kubeflow/sa/kubeflow-pipelines-cache
  - from:
    - source:
        requestPrincipals:
        - '*'
  - when:
    - key: request.headers[kubeflow-userid]
      notValues: ['*']
  selector:
    matchLabels:
      app: ml-pipeline

This essentially allows requests if:

  • There is a JWT
  • The pod does not set the kubeflow-userid header
    • This allows requests from Pods without a sidecar
    • This blocks such pods from impersonating someone, unless they have a JWT
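Extending the same kind of simplified model with the third rule shows the intended behaviour: a sidecar-less step is allowed only as long as it does not claim an identity through the kubeflow-userid header. Again a Python sketch rather than Istio's implementation; the field names are hypothetical and header names are assumed to be lower-case:

```python
from dataclasses import dataclass, field
from typing import Optional

KFP_PRINCIPALS = frozenset({
    "cluster.local/ns/kubeflow/sa/ml-pipeline",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-ui",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-persistenceagent",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-scheduledworkflow",
    "cluster.local/ns/kubeflow/sa/ml-pipeline-viewer-crd-service-account",
    "cluster.local/ns/kubeflow/sa/kubeflow-pipelines-cache",
})


@dataclass
class Request:
    peer_principal: Optional[str] = None  # mTLS identity; None without a sidecar
    jwt_principal: Optional[str] = None   # set only if a valid JWT was presented
    headers: dict = field(default_factory=dict)


def proposed_policy_allows(req: Request) -> bool:
    """Rules are ORed: any single match allows the request."""
    return any((
        req.peer_principal in KFP_PRINCIPALS,  # rule 1: KFP service accounts
        req.jwt_principal is not None,         # rule 2: requestPrincipals ['*']
        "kubeflow-userid" not in req.headers,  # rule 3: header notValues ['*']
    ))


cases = {
    "sidecar-less step, no header": Request(),
    "sidecar-less step setting the header": Request(
        headers={"kubeflow-userid": "user@example.com"}),
    # RequestAuthentication sets the header from the JWT; rule 2 still matches.
    "user request with JWT": Request(
        jwt_principal="iss/sub", headers={"kubeflow-userid": "user@example.com"}),
}
for name, req in cases.items():
    print(f"{name}: {'ALLOW' if proposed_policy_allows(req) else 'DENY'}")
```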

Also note that our current RequestAuthentication sets the kubeflow-userid header from the JWT. This shouldn't be an issue, though, since such a request is accepted because it has a JWT (so the second rule matches).

@kimwnasptd
Member Author

@juliusvonkohout @kromanow94 do you see the above issue as well?

@juliusvonkohout
Member

juliusvonkohout commented Jul 10, 2024

I think RC.2 includes 43eec94 according to v1.9.0-rc.1...1.9.0-rc.2, and people report that it works in #2611 (comment).

I guess Argo 3.4 in KFP 2.2.0 brought in workflowtaskresults.argoproj.io. Maybe we are missing this permission in the roles. CC also @rimolive
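If the roles are indeed missing this permission, a namespaced Role and RoleBinding along the following lines would grant it. This Python sketch only builds the manifests as dicts and prints them as JSON (e.g. for `kubectl apply -f -`); the name `argo-executor` is a hypothetical placeholder, and the verbs `create` and `patch` are an assumption based on the wait-container warning above, which fails to patch the task set and cannot create `workflowtaskresults`. That warning is non-fatal (Argo falls back to the legacy pod patch), and the issue itself was fixed through the AuthorizationPolicy in #2795.

```python
import json

ROLE_NAME = "argo-executor"  # hypothetical placeholder name


def executor_role(namespace: str) -> dict:
    """Role letting the Argo wait container create/patch WorkflowTaskResults."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": ROLE_NAME, "namespace": namespace},
        "rules": [{
            "apiGroups": ["argoproj.io"],
            "resources": ["workflowtaskresults"],
            "verbs": ["create", "patch"],
        }],
    }


def executor_role_binding(namespace: str, service_account: str) -> dict:
    """Bind the Role above to the service account the pipeline pods run as."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{ROLE_NAME}-{service_account}", "namespace": namespace},
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": ROLE_NAME},
        "subjects": [{"kind": "ServiceAccount", "name": service_account, "namespace": namespace}],
    }


ns = "kubeflow-user-example-com"
manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [executor_role(ns), executor_role_binding(ns, "default-editor")],
}
print(json.dumps(manifest, indent=2))
```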

I am also fine with your authentication approach @kimwnasptd, but I am on vacation until July 20. I will cut the final release around July 20-21 and write the changelog for it, but I cannot help much with this topic until July 20.

@juliusvonkohout
Member

juliusvonkohout commented Jul 10, 2024

The tests might be partially broken, which could explain why this was not detected in https://github.com/kubeflow/manifests/actions/runs/9825018339/job/27124646100

We accept the FAILED status as well:

if status not in ["SUCCEEDED", "FAILED", "ERROR"]:

But the test should fail if the pipeline status is FAILED at

print(f"Run with id {run_id} finished with status: {status}.")

It should be something like: if status != "SUCCEEDED", exit 1.
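A minimal Python sketch of that stricter check; the helper name `exit_code_for_status` is hypothetical, and how the test script obtains `run_id` and `status` is not shown here:

```python
def exit_code_for_status(run_id: str, status: str) -> int:
    """Report the final run status and map it to a process exit code."""
    print(f"Run with id {run_id} finished with status: {status}.")
    # Only SUCCEEDED passes; FAILED and ERROR must fail the CI job.
    return 0 if status == "SUCCEEDED" else 1
```

The test script would then end with `sys.exit(exit_code_for_status(run_id, status))`, so a failed pipeline run turns the GitHub workflow red instead of passing silently.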

@yhwang
Member

yhwang commented Jul 10, 2024

@kimwnasptd, adding some details to my comment about 1.9.0-rc2 on IKS. In the IKS manifests repo's v1.9-branch, which is downstream of kubeflow/manifests, I added this patch to make pipelines work:

  - when:
    - key: request.headers[kubeflow-userid]
      notValues: ['*']

@kimwnasptd
Member Author

Thanks for confirming @yhwang!

Then I propose that we go ahead with this fix for 1.9. cc @rimolive

@kromanow94
Contributor

Hey, yes, I can see that on my end as well. I'll make a PR with the changes described by @kimwnasptd and also fix the gh-workflow test. I think it makes good sense.

This also makes me wonder whether we'd like to improve security at some point in the future and configure the steps to authenticate to the ml-pipeline endpoint with a ServiceAccount token.
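A rough Python sketch of the client side of that idea, which was not implemented in this thread: read a ServiceAccount token from a mounted file and send it as a bearer token. The helper names are hypothetical, and the token path is left to the caller because where such a token would be mounted (for example the standard `/var/run/secrets/kubernetes.io/serviceaccount/token`, or a projected token with a dedicated audience) is an assumption. On the server side, Istio would additionally need a RequestAuthentication that accepts Kubernetes-issued tokens, which is not shown.

```python
from pathlib import Path


def bearer_headers(token: str) -> dict[str, str]:
    """Build an Authorization header from a raw ServiceAccount token."""
    token = token.strip()  # token files may end with a newline
    if not token:
        raise ValueError("empty ServiceAccount token")
    return {"Authorization": f"Bearer {token}"}


def bearer_headers_from_file(token_path: str) -> dict[str, str]:
    """Read the token from a mounted file (path is caller-supplied)."""
    return bearer_headers(Path(token_path).read_text())
```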

@kromanow94
Contributor

PR is here: #2795

@juliusvonkohout linked a pull request Jul 11, 2024 that will close this issue
@juliusvonkohout
Member

Thank you for the PR @kromanow94. I adjusted the tests slightly to trigger all KFP tests, checked the outputs, and merged it.
I also pushed it to the 1.9 branch.

Feel free to reopen if that is not enough.

@papagala

I can confirm I had the exact same issue in the latest Kubeflow 1.9 rc2. Thanks @kromanow94

F0718 09:04:11.900602      20 main.go:79] KFP driver: driver.Container(pipelineName=pipeline, runID=96a02a1a-a08a-4fbf-8d5d-169bcc305326, task="cat-hosts", component="comp-kfp-busybox", dagExecutionID=4, componentSpec, KubernetesExecutorConfig) failed: failure while getting executionCache: failed to list tasks: rpc error: code = PermissionDenied desc = RBAC: access denied
time="2024-07-18T09:04:12.435Z" level=info msg="sub-process exited" argo=true error="<nil>"
time="2024-07-18T09:04:12.435Z" level=error msg="cannot save parameter /tmp/outputs/pod-spec-patch" argo=true error="open /tmp/outputs/pod-spec-patch: no such file or directory"
time="2024-07-18T09:04:12.435Z" level=error msg="cannot save parameter /tmp/outputs/cached-decision" argo=true error="open /tmp/outputs/cached-decision: no such file or directory"
time="2024-07-18T09:04:12.435Z" level=info msg="/tmp/outputs/condition -> /var/run/argo/outputs/parameters//tmp/outputs/condition" argo=true


5 participants