
Simplified pip env setup #27

Closed · wants to merge 1 commit

Conversation

@sadamov (Collaborator) commented May 4, 2024

This PR simplifies the project setup and updates the requirements file to ensure compatibility and ease of installation. The following changes have been made:

Updated requirements.txt:

  • Specified minimum versions for all dependencies to ensure compatibility.
  • Fixed the Cartopy version to resolve the libgeos issue when installing via pip.
  • Included torch-geometric directly in the requirements file for straightforward installation.
  • Tested the setup on both CPU and GPU environments to ensure proper functionality.
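As a sketch, the updated requirements.txt entries might look like the following (the exact version numbers here are illustrative assumptions, not the values from this PR):

```
# requirements.txt (sketch; version numbers are illustrative)
numpy>=1.24.0
Cartopy>=0.22.0        # newer releases ship wheels that avoid the libgeos issue
torch>=2.0.1
torch-geometric>=2.3.1 # included directly for straightforward installation
```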

Updated README.md:

  • Revised the installation instructions to reflect the simplified setup process.
  • Provided clear steps for setting up the project environment using the updated requirements.txt.

This should simplify installing the environment. Dependencies are now fully flexible rather than pinned; for papers and similar reproducibility needs, a pinned environment can of course still be exported.

@sadamov sadamov requested a review from joeloskarsson May 4, 2024 19:31
@sadamov (Collaborator, Author) commented May 13, 2024

A better solution has been proposed by @leifdenby: https://docs.google.com/document/d/1csgMjyuE1Up2m262e-N7qMPmNJ-zHLguGyrgvnjTuBE/edit

Handle package dependencies with pdm rather than pip.
What: put all dependencies in pyproject.toml with pdm, and sync pyproject.toml to requirements.txt for pip-based installs.
Why: keeping requirements.txt updated by hand is a tedious and error-prone task, and a requirements.txt does not fully describe a package (version, description, etc.), so requirements files are no longer the recommended way of defining Python packages.
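As a sketch, that proposal would declare dependencies in the standard [project] table of pyproject.toml (the names and versions below are illustrative assumptions):

```
# pyproject.toml (sketch; entries are illustrative)
[project]
name = "neural-lam"
dependencies = [
    "torch>=2.0.1",
    "pyproj>=3.4.1",
]
```

A requirements.txt for pip users could then be regenerated with something like pdm export -f requirements -o requirements.txt (hedged; see the pdm documentation for the exact flags).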

@sadamov sadamov requested a review from leifdenby May 14, 2024 05:31
@sadamov (Collaborator, Author) commented May 14, 2024

@leifdenby I added you as a reviewer here so that you can take what is useful from this PR, but then please implement the more robust variant without a requirements.txt file, as mentioned above. You can either use this branch or supersede this PR after your work is complete. I will give you access to the MeteoSwiss fork, since that is where this PR branch lives.

@joeloskarsson (Collaborator) commented:

I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

I agree that requirements handling can be somewhat tedious, but I don't think the project is at a stage where this is a problem yet.

pyproj>=3.4.1
tueplots>=0.0.8
plotly>=5.15.0
torch>=2.0.1
--find-links https://data.pyg.org/whl/torch-${TORCH}.html
@leifdenby (Member) commented on requirements.txt:

@sadamov I don't quite understand how this is supposed to work if CUDA isn't installed. Doesn't this method require that the TORCH environment variable is set? Or is it just that, if TORCH isn't set, pip will try to fetch a page that doesn't exist and then default to the CPU version available on PyPI?
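For context, the --find-links line relies on pip's environment-variable expansion in requirements files: pip substitutes ${TORCH} from the shell environment when reading the file, so the variable has to be exported beforehand (the version tag below is just an example):

```
# requirements.txt fragment (sketch)
# pip expands ${TORCH} from the environment, so first e.g.:
#   export TORCH=2.0.1+cu118
--find-links https://data.pyg.org/whl/torch-${TORCH}.html
torch-geometric>=2.3.1
```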

@sadamov (Collaborator, Author) commented May 17, 2024:

I think you are right, Leif; this doesn't work on CPU, only on GPU. I must have gotten confused, because I always install PyTorch with conda beforehand (then it works, but --find-links is useless). Short answer: my solution sucks, and we should find a working solution for the soon-to-be-packaged repo. Did you already find a way to install torch with pip that is both flexible and robust across various machines?

@leifdenby (Member) replied:

Ah, I see :) OK, thanks for the clarification. I will think about this a bit more and come back with a proposal.

@leifdenby (Member) commented:

> I agree that moving dependencies from requirements.txt into pyproject.toml is good. However, I do not understand why one could not just install these using pip? I am not familiar with pdm, but is my understanding correct that this would introduce an additional tool that people have to use for installing requirements? It seems unnecessary to me to introduce additional tools, especially ones that ML practitioners are unlikely to be familiar with.

Yes, I agree that adding another tool isn't something we should do just for the sake of it, and I think we can actually make it optional. By this I mean that the package dependencies will be defined in pyproject.toml (rather than requirements.txt); this could in theory still be updated by hand, and we can then sync it automatically to a requirements.txt file if needed (this is done by a pre-commit hook). Newer versions of pip can install directly from pyproject.toml, so using pdm to manage and install dependencies would be optional.

For someone not using pdm the process of setting up a new development environment would be:

  1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
  2. Optionally create your virtualenv with your favourite tool, maybe mkvirtualenv ..., and then activate ...
  3. Install neural-lam as editable with pip so that changes you make are reflected in the site-packages version (actually this is simply a link): pip install -e .
  4. Edit and run neural-lam: python -m neural_lam.train_model --config-file config.yaml
  5. Manually install a dependency with pip and edit pyproject.toml by hand, i.e. python -m pip install ... and then edit the pyproject.toml file
  6. Stage, commit and push!

I like using a package manager like pdm (other options are pipenv, poetry, uv) because it makes the development process easier by 1) automatically updating package versions in pyproject.toml to ensure versions work together when I add a new package, 2) allowing me to "dry run" any dependency modification without clobbering my current env (this has hurt me quite a few times with pip), and 3) handling the creation and activation of virtual envs.

So with pdm the development process would be:

  1. Clone the repo with git clone http://github.com/<username>/neural-lam and change directory with cd neural-lam
  2. Create a virtual env with pdm venv create, and either activate it with pdm venv activate ... or make it the default when using pdm run python ... with pdm use ...
  3. Install the package with pdm install
  4. Add a dependency with pdm add ... (this is where you can do pdm add --dry-run ... to see what would change before you install a package), or remove one with pdm remove .... You can also add dev dependencies separately (which won't be included in wheels) with pdm add --dev ..., or dependency groups with pdm add --group visualisation matplotlib, for example (if we didn't want visualisation tools installed by default)
  5. Edit and run neural-lam: python -m neural_lam.train_model --config-file config.yaml (if using an activated virtualenv) or pdm run python -m neural_lam.train_model --config-file config.yaml
  6. Stage, commit and push!

@joeloskarsson (Collaborator) commented:

Thanks for clarifying and outlining that, Leif. I think that all sounds good, and the actual change is just to move dependencies to pyproject.toml. Then users can use whatever tool they want to handle their environment. We could keep a synced requirements.txt as well, but I would not mind just requiring people to have a new enough version of pip to install directly from pyproject.toml.

Will see what I end up doing myself going forward. The pdm workflow seems convenient as well.

@joeloskarsson (Collaborator) commented:

Can we close this if it is fully superseded by #37 ?

@sadamov (Collaborator, Author) commented May 25, 2024

Superseded by #37

@sadamov sadamov closed this May 25, 2024