[Docker] Update DGL version to 2.3 and torch to 2.3 #883

Merged (8 commits) on Jul 10, 2024


@jalencato jalencato commented Jun 18, 2024

Issue #, if available:

Description of changes:

Fix the dependency versions in the local Docker container.

Torch 2.0+ does not support numpy >= 2.0, as we are using numpy.int64 in type inference.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@thvasilo

What's the error we're getting here? Was numpy 2.x just released?

@jalencato

> What's the error we're getting here? Was numpy 2.x just released?

Currently our Docker container installs the wrong versions of some sub-dependencies, such as pyarrow and numpy. The default numpy version in the container is now 2.0, which throws a warning:

A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'. If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

This happens when importing dgl.

Also see issue #884: starting from version 2.2, dgl serves its wheels from a new location, https://data.dgl.ai/wheels/torch-${TORCH_MAJOR_MINOR}/cu${DGL_CUDA_VERSION}/repo.html; previously it was https://data.dgl.ai/wheels/cu${DGL_CUDA_VERSION}/repo.html. That is a separate fix.
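
For illustration, the two index layouts mentioned in this comment could be selected with a small helper like the one below. This is only a sketch: the function name `dgl_wheel_index_url` and its parameters are hypothetical (they are not part of the Dockerfile or of DGL), and the 2.2 cutoff is taken from the comment rather than from DGL documentation.

```python
def dgl_wheel_index_url(dgl_version: str, torch_major_minor: str, cuda_version: str) -> str:
    """Return the wheel index URL for a DGL version (hypothetical helper).

    cuda_version is the compact form used in the URL, e.g. "121" for CUDA 12.1.
    """
    # Compare on (major, minor) only; patch and local tags are ignored.
    major, minor = (int(part) for part in dgl_version.split(".")[:2])
    base = "https://data.dgl.ai/wheels"
    if (major, minor) >= (2, 2):
        # Newer layout: wheels are grouped by torch version first (per the comment).
        return f"{base}/torch-{torch_major_minor}/cu{cuda_version}/repo.html"
    # Older layout: wheels are grouped by CUDA version only.
    return f"{base}/cu{cuda_version}/repo.html"
```

The resulting URL would be the one passed to pip as the extra wheel index when installing dgl.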

This PR is currently on hold pending the performance regression results. I want to make sure performance looks good before asking for review.


@jalencato jalencato left a comment


Update:

The bug only happens with torch versions < 2.3, so we may not need to change the Dockerfile now and can just leave a comment about it.
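
As a rough sketch of the condition described in this update, the check below reports whether a given torch/numpy pair falls into the affected range (torch < 2.3 together with numpy >= 2.0). The function names are hypothetical and the thresholds simply restate the comment; they have not been verified against torch release notes.

```python
def _major_minor(version: str) -> tuple[int, int]:
    # Keep only the leading "X.Y"; drops the patch part and local tags like "+cu121".
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def numpy2_warning_expected(torch_version: str, numpy_version: str) -> bool:
    """True when the pair matches the affected range described in the comment."""
    return _major_minor(torch_version) < (2, 3) and _major_minor(numpy_version) >= (2, 0)
```

In a running container, the two arguments could come from `torch.__version__` and `numpy.__version__`.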

@jalencato jalencato marked this pull request as ready for review June 25, 2024 19:59
@jalencato jalencato changed the title from "[WIP] [Docker Bug] Update local dockerfile" to "[Docker] Update DGL version to 2.3 and torch to 2.3" on Jun 28, 2024

thvasilo commented Jun 29, 2024

We should pin numpy to version 1.26.4 then, to avoid such issues. Generally, we want all the direct dependencies of GSF to be pinned (and perhaps all direct dependencies of DGL), and then pin others too if we run into issues.
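
One way to confirm that pins like this actually hold inside a built image is a small check run at the end of the build. The sketch below is an assumption, not something from this PR: the `PINS` table and `find_pin_violations` are hypothetical names, and only the numpy==1.26.4 pin comes from the discussion. It relies on the standard-library `importlib.metadata.version` lookup.

```python
from importlib import metadata

# Illustrative pin table; only numpy==1.26.4 is taken from the discussion.
PINS = {"numpy": "1.26.4"}


def find_pin_violations(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for every package that misses its pin.

    found is None when the package is not installed at all.
    """
    violations = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            violations[name] = (wanted, found)
    return violations
```

Calling `find_pin_violations(PINS)` in the container and failing the build on a non-empty result would surface a drifting sub-dependency before dgl is ever imported.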

In the future we can look into generating requirements files from a pyproject.toml, using poetry as we do in GSProcessing; we could also take a look at https://github.com/astral-sh/uv

@jalencato jalencato merged commit 224bf1b into awslabs:main Jul 10, 2024
7 checks passed