how to run the example of pytorch-mnist-ddp #2464

Closed
wangpf09 opened this issue May 31, 2023 · 2 comments

Comments

@wangpf09

I have Kubeflow deployed, but I hit a problem when running the official MNIST example. How should I solve it? The YAML of the PyTorchJob is as follows:

apiVersion: kubeflow.org/v1
kind: PyTorchJob
metadata:
  name: pytorch-mnist-ddp-gpu
  namespace: kubeflow-user-example-com
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - image: gcr.io/kubeflow-examples/pytorch-mnist-ddp-gpu
              name: pytorch
              resources:
                limits:
                  cpu: '1'
                  memory: 4Gi
                  nvidia.com/gpu: 1
              volumeMounts:
                - mountPath: /mnt/kubeflow-gcfs
                  name: kubeflow-gcfs
          volumes:
            - name: kubeflow-gcfs
              persistentVolumeClaim:
                claimName: kubeflow-gcfs
                readOnly: false
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - image: gcr.io/kubeflow-examples/pytorch-mnist-ddp-gpu
              name: pytorch
              resources:
                limits:
                  cpu: '1'
                  memory: 4Gi
                  nvidia.com/gpu: 1
              volumeMounts:
                - mountPath: /mnt/kubeflow-gcfs
                  name: kubeflow-gcfs
          volumes:
            - name: kubeflow-gcfs
              persistentVolumeClaim:
                claimName: kubeflow-gcfs
                readOnly: false

@juliusvonkohout
Member

Hello,

this support question can be asked on the slack channel.

/close

There has been no activity for a long time. Please reopen if necessary.

@google-oss-prow

@juliusvonkohout: Closing this issue.
