Create validate_docker_image.yml #1771

juliagmt-google · 2024-04-04T20:35:01Z

Validations for: https://github.com/pytorch/pytorch/actions/runs/8606271166/job/23584360994

Validates the Docker images published at: https://github.com/orgs/pytorch/packages/container/package/pytorch

Add a file to print a statement.

atalman · 2024-04-10T15:38:07Z

.github/workflows/validate_docker_images.yml

+    strategy:
+      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
+    container:
+      image: ghcr.io/pytorch/pytorch:2.2.2-cuda${{ matrix.cuda }}-cudnn${{ matrix.cudnn_version }}-${{ matrix.image_type }}


Please change image to matrix.docker, which should be available now that this PR is merged: pytorch/test-infra#5081
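For readers unfamiliar with the pattern, here is a minimal sketch of how a matrix produced by one job can carry a complete image reference that a later job uses as its container. Only the `docker` field name and the image tag come from this thread; the job names, the hard-coded matrix entry, and the validation step are illustrative assumptions, not the workflow from this PR.

```yaml
# Sketch only: job names, the hard-coded matrix entry, and the final step are
# illustrative assumptions, not the workflow from this PR.
name: validate-docker-images-sketch
on: workflow_dispatch

jobs:
  generate-matrix:
    runs-on: ubuntu-latest
    outputs:
      matrix: ${{ steps.emit.outputs.matrix }}
    steps:
      - id: emit
        # Each entry carries a full image reference in a "docker" field,
        # so downstream jobs do not rebuild the tag from cuda/cudnn parts.
        run: |
          echo 'matrix={"include":[{"docker":"ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel"}]}' >> "$GITHUB_OUTPUT"

  validate:
    needs: generate-matrix
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix: ${{ fromJson(needs.generate-matrix.outputs.matrix) }}
    container:
      # One job per matrix entry, each running inside the image under test.
      image: ${{ matrix.docker }}
    steps:
      - run: python -c "import torch; print(torch.__version__)"
```

Taking the whole reference from the matrix keeps the tag format in one place, so the workflow does not need editing when the release version or tag scheme changes.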

juliagmt-google

Updated the code and triggered the workflow run.

juliagmt-google · 2024-04-10T17:23:17Z

Updated; the workflow partially succeeded: https://github.com/juliagmt-google/builder/actions/runs/8635045506/job/23672552152

The failed job ran out of disk space while pulling the image:
failed to register layer: write /opt/conda/lib/python3.10/test/support/__init__.py: no space left on device
Warning: Docker pull failed with exit code 1, back off 3.699 seconds before retry.
/usr/bin/docker --config /home/runner/work/_temp/.docker_ff254e09-5ee1-4d53-9f39-52c9c2a6a945 pull ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel
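One possible workaround, sketched below under stated assumptions, is to free disk space before the pull. When a job uses `container:`, the image is pulled before any step runs, so this sketch pulls the image explicitly in a step instead. The job name and the directories removed are assumptions about a GitHub-hosted Ubuntu runner and were not tried in this PR.

```yaml
# Sketch only: job name and the deleted directories are assumptions about a
# GitHub-hosted Ubuntu runner; this was not part of the PR.
name: pull-with-cleanup-sketch
on: workflow_dispatch

jobs:
  pull-image:
    runs-on: ubuntu-latest
    steps:
      - name: Free disk space before pulling
        run: |
          df -h /
          # Preinstalled toolchains that this job does not need (assumed present).
          sudo rm -rf /usr/share/dotnet /opt/ghc
          # Drop any cached images, containers, and build cache.
          docker system prune --all --force
          df -h /
      - name: Pull the image under test
        run: docker pull ghcr.io/pytorch/pytorch-nightly:2.4.0.dev20240410-cuda11.8-cudnn8-devel
```

The two `df -h /` calls make it easy to see in the log how much space the cleanup actually recovered.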

juliagmt-google · 2024-04-10T19:39:10Z

Added run-cpu-tests and run-gpu-tests to validate docker images; tested in https://github.com/juliagmt-google/builder/actions/runs/8636731822

run-cpu-tests: 3/4 passed; 1/4 failed because the runner ran out of disk space.
run-gpu-tests: 4/4 failed because runs from my fork are not permitted to use the linux.g5.4xlarge.nvidia.gpu runner. Error:
Called workflows cannot be queued onto self-hosted runners across organizations/enterprises. Failed to queue this job. Labels: 'linux.g5.4xlarge.nvidia.gpu'.
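A common way to avoid this failure in forks is to guard the GPU job so it only runs in the organization that owns the self-hosted runners. The sketch below assumes a hypothetical reusable workflow at `./.github/workflows/gpu_tests.yml` with a hypothetical `runner` input; only the job name and the runner label come from this thread.

```yaml
# Sketch only: the called workflow path and its "runner" input are hypothetical.
name: gpu-tests-guard-sketch
on: workflow_dispatch

jobs:
  run-gpu-tests:
    # Skip in forks: the self-hosted GPU runners are only reachable from
    # repositories in the owning organization.
    if: github.repository_owner == 'pytorch'
    uses: ./.github/workflows/gpu_tests.yml
    with:
      runner: linux.g5.4xlarge.nvidia.gpu
```

With the guard in place, a run in a fork reports the job as skipped instead of failing to queue.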

Create validate_docker_image.yml

bc0e315

Add a file to print a statement.

facebook-github-bot added the cla signed label Apr 4, 2024

juliagmt-google added 28 commits April 4, 2024 13:39

Update validate_docker_image.yml

5194442

Update validate_docker_image.yml

8c775c0

Add docker pull command

32911b3

Update validate_docker_image.yml

e630f17

Update validate_docker_image.yml

8645bc7

Add more steps in validate_docker_images.yml

674ea86

Update validate_docker_images.yml

ab428ce

Update validate_docker_images.yml

e7065ce

Update validate_docker_images.yml

671db73

Update validate_docker_images.yml

87b8ecd

Update validate_docker_images.yml

e52e864

Update validate_docker_images.yml

a65f44c

Update validate_docker_images.yml

5f78286

Update validate_docker_images.yml

37fea37

Update validate_docker_images.yml

d928903

Update validate_docker_images.yml

6b9e193

Update validate_docker_images.yml

91a2585

Update validate_docker_images.yml

521cdad

Update validate_docker_images.yml

1f7de12

Update validate_docker_images.yml

eea99f1

Update validate_docker_images.yml

31229e9

Update validate_docker_images.yml

6a5f1b5

Update validate_docker_images.yml

712dbc1

Update validate_docker_images.yml

3ecd72b

Update validate_docker_images.yml

4e7bbc7

Update validate_docker_images.yml

2dcff31

Update validate_docker_images.yml

e32a803

Update validate_docker_images.yml

d02f074

juliagmt-google added 11 commits April 8, 2024 16:04

Update validate_docker_images.yml

041d904

Update validate_docker_images.yml

67a56ac

Update validate_docker_images.yml

9ba6088

Update validate_docker_images.yml

0107593

Update validate_docker_images.yml

b605192

Update validate_docker_images.yml

86c4755

Update validate_docker_images.yml

46561c0

Merge branch 'pytorch:main' into patch-1

9a61f5e

Update validate_docker_images.yml

9cc71ad

Update validate_docker_images.yml

99d11d0

Update validate_docker_images.yml

0a99f1e

atalman mentioned this pull request Apr 10, 2024

Add docker image to docker release matrix pytorch/test-infra#5081

Merged

atalman reviewed Apr 10, 2024

juliagmt-google added 2 commits April 10, 2024 09:28

Update validate_docker_images.yml

a11467f

Update validate_docker_images.yml

58085f5

juliagmt-google commented Apr 10, 2024

juliagmt-google added 4 commits April 10, 2024 10:55

Remove trigger on pushes to finalize the logic

42e036a

Add gpu tests

c46f7be

Testing using push

d9ac7f3

Remove trigger on push

fcb32e0

atalman reviewed Apr 10, 2024

.github/workflows/validate_docker_images.yml (outdated)

juliagmt-google and others added 2 commits April 10, 2024 12:47

Remove run-cpu-tests

404cff8

Update job_name

30e49bd

Co-authored-by: Andrey Talman

atalman reviewed Apr 10, 2024

.github/workflows/validate_docker_images.yml (outdated)

Update .github/workflows/validate_docker_images.yml

1a49fe0

atalman reviewed Apr 10, 2024

.github/workflows/validate_docker_images.yml (outdated)

Update .github/workflows/validate_docker_images.yml

9089c5d

atalman approved these changes Apr 10, 2024

atalman merged commit e7948ec into pytorch:main Apr 10, 2024
2 checks passed
