Create validate_docker_image.yml #1771

Merged: 53 commits, merged on Apr 10, 2024


@juliagmt-google (Contributor) commented Apr 4, 2024

Add a file to print a statement.
```yaml
strategy:
  matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
container:
  image: ghcr.io/pytorch/pytorch:2.2.2-cuda${{ matrix.cuda }}-cudnn${{ matrix.cudnn_version }}-${{ matrix.image_type }}
```
Contributor:

Please change image to matrix.docker, which should be available now that this PR is merged: pytorch/test-infra#5081
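For illustration, here is a minimal sketch of the suggested change. It assumes that each entry emitted by the generate-matrix job carries a docker key holding the full image reference, as this comment describes for pytorch/test-infra#5081. The job name, runner label, and smoke-test step are hypothetical, not the PR's actual workflow:

```yaml
jobs:
  validate-docker-image:            # hypothetical job name
    needs: generate-matrix
    runs-on: ubuntu-latest          # assumption: a GitHub-hosted CPU runner
    strategy:
      fail-fast: false              # let every image in the matrix report independently
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    container:
      # Assumption: matrix.docker is a complete reference such as
      # ghcr.io/pytorch/pytorch-nightly:<version>-cuda<x>-cudnn<y>-devel
      image: ${{ matrix.docker }}
    steps:
      - name: Print the torch version inside the image
        run: python -c "import torch; print(torch.__version__)"
```

Reading the image from the matrix means the workflow no longer hard-codes the 2.2.2 version or assembles the tag from matrix.cuda, matrix.cudnn_version, and matrix.image_type.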

Contributor Author:

Updated, and the workflow partially succeeded: https://github.com/juliagmt-google/builder/actions/runs/8635045506/job/23672552152

The failed job ran out of disk space while pulling the image:

```
failed to register layer: write /opt/conda/lib/python3.10/test/support/__init__.py: no space left on device
Warning: Docker pull failed with exit code 1, back off 3.699 seconds before retry.
/usr/bin/docker --config /home/runner/work/_temp/.docker_ff254e09-5ee1-4d53-9f39-52c9c2a6a945 pull ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel
```
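The error comes from the image pull itself. With a job-level container:, GitHub pulls the image while initializing the job, before any step runs, so a cleanup step cannot help in that layout. One possible workaround, sketched here under stated assumptions, is to free space first and then pull and run the image from a step. The job name is hypothetical. The deleted directories are ones commonly removed to reclaim space on GitHub-hosted Ubuntu runners and may differ between runner image versions. matrix.docker is assumed to hold the full image reference:

```yaml
jobs:
  validate-docker-image:            # hypothetical job name
    needs: generate-matrix
    runs-on: ubuntu-latest          # assumption: GitHub-hosted Ubuntu runner
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    # No job-level container: the image is pulled in a step, after cleanup.
    steps:
      - name: Free disk space
        run: |
          df -h /
          # Large preinstalled toolchains; rm -rf silently skips paths that do not exist
          sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
          # Remove all unused images, containers, and networks on the runner
          docker system prune -af
          df -h /
      - name: Pull and smoke-test the image
        run: |
          docker pull "${{ matrix.docker }}"
          docker run --rm "${{ matrix.docker }}" python -c "import torch; print(torch.__version__)"
```

The df -h calls before and after the cleanup make it easy to check in the job log how much space was actually reclaimed.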

@juliagmt-google (Contributor Author) left a comment:

Updated the code and triggered the workflow run.

@juliagmt-google (Contributor Author):

Added run-cpu-tests and run-gpu-tests to validate the Docker images; tested in https://github.com/juliagmt-google/builder/actions/runs/8636731822

  • run-cpu-tests: 3/4 passed; 1/4 failed in the fork with an insufficient-disk-space error.

  • run-gpu-tests: 4/4 failed because the fork does not have permission to use the linux.g5.4xlarge.nvidia.gpu runner. Error:
    Called workflows cannot be queued onto self-hosted runners across organizations/enterprises. Failed to queue this job. Labels: 'linux.g5.4xlarge.nvidia.gpu'.
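The GPU failures come from the fork (juliagmt-google/builder) trying to queue jobs onto self-hosted runners that belong to another organization. A common pattern is to guard such jobs so they run only in the upstream repository. The sketch below is hypothetical: it uses runs-on directly rather than the reusable workflow the PR calls. It assumes the runner has the NVIDIA container runtime installed (needed for --gpus all) and that matrix.docker holds the full image reference. The same if: condition also works on a job that calls a reusable workflow through uses:.

```yaml
jobs:
  run-gpu-tests:
    needs: generate-matrix
    # Assumption: skip in forks, which cannot queue onto the pytorch org's self-hosted GPU runners
    if: github.repository_owner == 'pytorch'
    runs-on: linux.g5.4xlarge.nvidia.gpu
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    container:
      image: ${{ matrix.docker }}
      options: --gpus all           # assumption: NVIDIA container runtime present on the runner
    steps:
      - name: Check that CUDA is visible inside the image
        run: python -c "import torch; assert torch.cuda.is_available(); print(torch.version.cuda)"
```

With this guard, the GPU jobs in a fork are reported as skipped instead of failing to queue, while run-cpu-tests can still be exercised on GitHub-hosted runners.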

@atalman merged commit e7948ec into pytorch:main on Apr 10, 2024
2 checks passed