
Simplified pip env setup #27

Closed · wants to merge 1 commit

Conversation

@sadamov (Collaborator) commented May 4, 2024

This PR simplifies the project setup and updates the requirements file to ensure compatibility and ease of installation. The following changes have been made:

Updated requirements.txt:

  • Specified minimum versions for all dependencies to ensure compatibility.
  • Fixed the Cartopy version to resolve the libgeos issue when installing via pip.
  • Included torch-geometric directly in the requirements file for straightforward installation.
  • Tested the setup on both CPU and GPU environments to ensure proper functionality.
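As a sketch, the updated requirements.txt entries might look like the following (the exact version numbers here are illustrative assumptions, not the values from this PR):

```
# requirements.txt (sketch; version numbers are illustrative)
numpy>=1.24.0
Cartopy>=0.22.0        # newer releases ship wheels that avoid the libgeos issue
torch>=2.0.1
torch-geometric>=2.3.1 # included directly for straightforward installation
```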

Updated README.md:

  • Revised the installation instructions to reflect the simplified setup process.
  • Provided clear steps for setting up the project environment using the updated requirements.txt.

This should simplify installing the environment. Dependencies are now fully flexible rather than pinned; for papers and similar reproducibility needs, a pinned environment can of course still be exported.

@sadamov sadamov requested a review from joeloskarsson May 4, 2024 19:31
@sadamov (Collaborator, Author) commented May 13, 2024

A better solution has been proposed by @leifdenby: https://docs.google.com/document/d/1csgMjyuE1Up2m262e-N7qMPmNJ-zHLguGyrgvnjTuBE/edit

Handle package dependencies with pdm rather than pip.
What: put all dependencies in pyproject.toml with pdm, and sync pyproject.toml to requirements.txt for pip-based installs.
Why: keeping requirements.txt updated by hand is a tedious and error-prone task, and a requirements.txt does not fully describe a package (version, description, etc.), so requirements files are no longer the recommended way of defining Python packages.
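As a sketch, that proposal would declare dependencies in the standard [project] table of pyproject.toml (the names and versions below are illustrative assumptions):

```
# pyproject.toml (sketch; entries are illustrative)
[project]
name = "neural-lam"
dependencies = [
    "torch>=2.0.1",
    "pyproj>=3.4.1",
]
```

A requirements.txt for pip users could then be regenerated with something like pdm export -f requirements -o requirements.txt (hedged; see the pdm documentation for the exact flags).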

@sadamov sadamov requested a review from leifdenby May 14, 2024 05:31
@sadamov (Collaborator, Author) commented May 14, 2024

@leifdenby I added you as a reviewer here so that you can take what is useful from this PR, but then please implement the more robust variant without a requirements.txt file, as mentioned above. You can either use this branch or supersede this PR after your work is complete. I will give you access to the MeteoSwiss fork, since that is where this PR branch lives.

@joeloskarsson (Collaborator) commented:

I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

I agree that requirements handling can be somewhat tedious, but I don't think the project is at a stage where this is a problem yet.

pyproj>=3.4.1
tueplots>=0.0.8
plotly>=5.15.0
torch>=2.0.1
--find-links https://data.pyg.org/whl/torch-${TORCH}.html
@leifdenby (Member) commented on requirements.txt:

@sadamov I don't quite understand how this is supposed to work if CUDA isn't installed. Doesn't this method require that the TORCH environment variable is set? Or is it just that, if TORCH isn't set, pip will try to fetch a page that doesn't exist and then default to the CPU version available on PyPI?
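For context, the --find-links line relies on pip's environment-variable expansion in requirements files: pip substitutes ${TORCH} from the shell environment when reading the file, so the variable has to be exported beforehand (the version tag below is just an example):

```
# requirements.txt fragment (sketch)
# pip expands ${TORCH} from the environment, so first e.g.:
#   export TORCH=2.0.1+cu118
--find-links https://data.pyg.org/whl/torch-${TORCH}.html
torch-geometric>=2.3.1
```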

@sadamov (Collaborator, Author) commented May 17, 2024:

I think you are right, Leif; this doesn't work on CPU, only on GPU. I must have gotten confused, because I always install PyTorch with conda beforehand (then it works, but --find-links is useless). Short answer: my solution sucks, and we should find a working solution for the soon-to-be-packaged repo. Did you already find a way to install torch with pip that is both flexible and robust across various machines?

@leifdenby (Member) replied:

Ah, I see :) OK, thanks for the clarification. I will think about this a bit more and come back with a proposal.

@leifdenby (Member) commented:

> I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

Yes, I agree that adding another tool isn't something we should do just for the sake of it, and I think we can actually make it optional. By this I mean that the package dependencies will be defined in pyproject.toml (rather than requirements.txt); this could in theory still be updated by hand, and we can then sync it automatically to a requirements.txt file if needed (this is done by a pre-commit hook). Newer versions of pip can install directly from pyproject.toml, so using pdm to manage and install dependencies would be optional.

For someone not using pdm the process of setting up a new development environment would be:

  1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
  2. Optionally create your virtualenv with your favourite tool, maybe mkvirtualenv ..., and then activate ...
  3. Install neural-lam as editable with pip so that changes you make are reflected in the site-packages version (actually this is simply a link): pip install -e .
  4. Edit and run neural-lam: python -m neural_lam.train_model --config-file config.yaml
  5. Manually install a dependency with pip and edit pyproject.toml by hand, i.e. python -m pip install ... and then edit the pyproject.toml file
  6. Stage, commit and push!

I like using a package manager like pdm (other options are pipenv, poetry, uv) because it makes the development process easier by 1) automatically updating package versions in pyproject.toml to ensure versions work together when I add a new package, 2) allowing me to "dry run" any dependency modification without clobbering my current env (this has hurt me quite a few times with pip), and 3) handling the creation and activation of virtual envs.

So with pdm the development process would be:

  1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
  2. Create a virtual env with pdm venv create, and either activate it with pdm venv activate ... or make it the default when using pdm run python ... with pdm use ...
  3. Install the package with pdm install
  4. Add a dependency with pdm add ... (this is where you can do pdm add --dry-run ... to see what would change before you install a package), or remove one with pdm remove .... You can also add dev dependencies separately (which won't be included in wheels) with pdm add --dev ..., or dependency groups with pdm add --group visualisation matplotlib, for example (if we didn't want visualisation tools installed by default)
  5. Edit and run neural-lam: python -m neural_lam.train_model --config-file config.yaml (if using an activated virtualenv) or pdm run python -m neural_lam.train_model --config-file config.yaml
  6. Stage, commit and push!

@joeloskarsson (Collaborator) commented:

Thanks for clarifying and outlining that, Leif. I think that all sounds good, and the actual change is just to move dependencies to pyproject.toml. Then users can use whatever tool they want to handle their environment. We could keep a synced requirements.txt as well, but I would not mind just requiring people to have a new enough version of pip to install directly from pyproject.toml.

Will see what I end up doing myself going forward. The pdm workflow seems convenient as well.

@joeloskarsson (Collaborator) commented:

Can we close this if it is fully superseded by #37 ?

@sadamov (Collaborator, Author) commented May 25, 2024

Superseded by #37

@sadamov sadamov closed this May 25, 2024