Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenFF and AmberTools #384

Closed
lohedges opened this issue Sep 22, 2022 · 14 comments
Closed

OpenFF and AmberTools #384

lohedges opened this issue Sep 22, 2022 · 14 comments

Comments

@lohedges
Copy link
Member

Recent changes mean that we can no longer provide support for certain OpenFF functionality without also depending on the openff-interchange package, which pulls in the conda-forge ambertools package as a dependency. Previously, we only depended on openff-toolkit-base, which does not require ambertools. This allows the user to provide ambertools via an external installation, which will likely be the case. We added our own check of the version number, since OpenFF requires a recent enough antechamber to work properly.

I can understand the reasons for this change, i.e. it makes sense for OpenFF to depend on a known and tested version of ambertools. Going forward, we will need to be careful to ensure that we can correctly handle dual installation of ambertools, i.e. letting AMBERHOME take precedence, but correctly finding the binaries in the conda environment as a fallback. I'll need to check if/how the conda ambertools sets AMBERHOME, since it might be necessary for the user to directly set this in BIoSimSpace scripts, or when running them. Thankfully recent conda ambertools packages seem to be compatible with Sire/BioSimSpace, which wasn't the case a few years ago, e.g. limited Python variant support, breaking library compatibility, etc.

(Note that it's probably easiest if we just change our dependency to openff-toolkit, rather than openff-toolkit-base and openff-interchange.)

@kexul
Copy link
Contributor

kexul commented Sep 22, 2022

Dear BioSimSpace developers, is it possible to reduce the dependencies that BioSimSpace needs? I recently built docker images and found that a fresh installation of BioSimSpace takes about 6 G disk space, which is very large for me.

Many thanks!

@lohedges
Copy link
Member Author

It's very challenging to do this because of how the conda build system works and how dependencies are handled. A lot of the bloat comes from the switch to using the conda-forge version of OpenMM, which pulls in cudatoolkit. (Previously, the Omnia package assumed the user had their own toolkit installed and provided channel labels to match against that version.) Other dependencies (I'm not sure which yet) now also pull in cudnn, jax, etc., due to the addition of ML functionality, which are also very large packages. (This is a recent change.)

Ideally package maintainers would create minimal base packages, with additional sub-packages for extra functionality. However, this quickly becomes a maintenance nightmare and I don't begrudge small development teams choosing not to do this, since the bandwidth simply isn't there. I also understand the need for developers to build and test against a known set of dependencies, i.e. bundling dependencies that could be provided externally, e.g. ambertools, cudatoolkit.

With BioSimSpace, we have provided a ModuleStub class in our internal _Utils sub-package. This allows the user to import modules using try_import wrapper. Unlike import, this will not immediately raise an ImportError when a module is missing, but instead raises an exception at the point at which functionality requiring that module is used. This means that a user could create a streamlined BioSimSpace install for specific purposes by pruning unnecessary packages from their environment. (Working out which ones are unnecessary is another issue, though.)

Additionally, the whole of BioSimSpace depends on Sire, which is used everywhere, and it is Sire that pulls in the majority of the dependencies. (Other than RDKit, most of BioSimSpace's dependencies are pure Python.)

@jmichel80
Copy link
Contributor

Just to add that as you can see from Lester’s reply we are aware of this issue and are thinking about how to mitigate problems but i expect it will take some time to come up with a robust way to streamline distribution.

@lohedges
Copy link
Member Author

lohedges commented Sep 22, 2022

From a quick search, it looks like it's the new version of pymbar that is pulling in cudnn and jaxlib. I'm not actually sure that the SOMD analysis is compatible with this version (see the issue that I raised here). I've currently pinned to the old version, but this is only true for the 2023.0.0 release of Sire. Once I've merged in the BioSimSpace PR that builds against this (here) our package should be trimmed by at least a GB.

grep cudnn ~/.conda/pkgs/*/info/index.json
/home/lester/.conda/pkgs/cudnn-8.4.1.50-hed8a83a_0/info/index.json:  "name": "cudnn",
/home/lester/.conda/pkgs/jaxlib-0.3.15-cuda112py37ha54ad15_203/info/index.json:    "cudnn >=8.4.1.50,<9.0a0",
/home/lester/.conda/pkgs/jaxlib-0.3.15-cuda112py39h06785b1_203/info/index.json:    "cudnn >=8.4.1.50,<9.0a0",

grep jaxlib ~/.conda/pkgs/*/info/index.json
/home/lester/.conda/pkgs/jax-0.3.17-pyhd8ed1ab_0/info/index.json:    "jaxlib >=0.3.14",
/home/lester/.conda/pkgs/jaxlib-0.3.15-cuda112py37ha54ad15_203/info/index.json:  "name": "jaxlib",
/home/lester/.conda/pkgs/jaxlib-0.3.15-cuda112py39h06785b1_203/info/index.json:  "name": "jaxlib",
/home/lester/.conda/pkgs/pymbar-4.0.1-py37hda87dfa_0/info/index.json:    "jaxlib",
/home/lester/.conda/pkgs/pymbar-4.0.1-py39hd257fcd_0/info/index.json:    "jaxlib",

Yes, as @jmichel80 says, this is something that we are aware of and would like to develop a good long-term solution. Ultimately I think this will require collaboration between various projects, perhaps using a conda sub-channel, building against a mininal and fixed (or slowly evolving) set of dependencies.

@lohedges
Copy link
Member Author

For BioSimSpace, we may even be able to remove the pymbar dependency, since the analysis will eventually be done by alchemlyb, which itself depends on pymbar (currently versions less than 4.0). However, if they update the API to work with pymbar 4.0, then we'll once again pull in cudnn.

A lot of work has gone into minimising the size of the Sire binaries (currently less than 40MB), yet you can depend on one package that pulls in a GB of stuff 🤷‍♂️

@kexul
Copy link
Contributor

kexul commented Sep 22, 2022

Many thanks for your efforts! It's good to know that we will have a lighter version in the future. 😄

@lohedges
Copy link
Member Author

Just to note that I have now discovered that we can avoid the ambertools dependency by depending on openff-toolkit-base and openff-interchange-base. (I hadn't realised that there was a base package for the interchange package.) This is actually useful, since ambertools doesn't (yet) exist on Windows, so depending on the full openff-toolkit (which pulls in ambetools) breaks the Windows build of Sire. (We could use selectors, but this is a bit tricky given the way the build system currently works.)

@lohedges
Copy link
Member Author

lohedges commented Oct 4, 2022

Closing this for now since we have figured out how to remove the ambertools dependency again.

@lohedges lohedges closed this as completed Oct 4, 2022
@lohedges lohedges reopened this Nov 1, 2022
@lohedges
Copy link
Member Author

lohedges commented Nov 1, 2022

Okay, it turns out that we can't drop the ambertools dependency as there is no way to create a self-consistent environment for users that do require the conda-forge ambertools in their environment. For example, doing the following:

conda create -n test -c conda-forge -c michellab/label/dev biosimspace ambertools python=3.8

will give:

ambertools                22.0             py38hf8a91bc_3    conda-forge
sire                      2023.0.0        py38h3fd9d12_16    michellab/label/dev

This results in the latest version of ambertools, but an outdated version of sire.

If you specify the latest version of sire, i.e.:

conda create -n test -c conda-forge -c michellab/label/dev biosimspace ambertools python=3.8 sire=2023=py38h3fd9d12_20

you'll end up with:

ambertools                18.0                          0    omnia
sire                      2023.0.0        py38h3fd9d12_20    michellab/label/dev

This results in the latest version of sire, but an oudated version on ambertools. (I'm actually amazed how outdated this is! If conda worked properly, I would have assumed that it would only allow versions that were compatible with the functionality in openff, which ambertools 18 isn't.)

As a solution we could either depend on openff-interchange, which pulls in the ambertools package, although we'd need to figure out how to fix the Windows build, where ambertools doesn't exist. We could instead add ambertools as a host requirement for sire, i.e. it's not actually required at runtime, but needs to exist in the environment. Users should then be able to install ambertools alongside BioSimSpace. (We might need to be careful to ensure that the latest version of ambertools is used, but this should be resolved by the openff-interchange-base dependency.) We could also just suck it up and depend on the entire openff suite so we are (hopefully) sure to avoid any inconsistencies going forward.

With re-adding, we'll also need to check for issues with AMBERHOME, i.e. which package takes precedence if the user has an external (non-conda) AMBER/AmberTools installation.

@lohedges
Copy link
Member Author

lohedges commented Nov 1, 2022

Having looked at the two sire conda packages listed above, i.e. py38h3fd9d12_16 and py38h3fd9d12_20, it turns out that both correspond to commits after the switch to using openff-interchange-base, i.e. dropping the ambertools dependency. It is very confusing to me why the ambertools package resolution is so wildly different for the two of them. I can only assume that one of the other dependencies has changed, which is causing the inconsistency.

@lohedges
Copy link
Member Author

lohedges commented Nov 1, 2022

Here's the diff between the two recipes:

3c3
< # /home/runner/work/Sire/Sire/Sire/recipes/sire, last modified Mon Oct 17 13:41:40 2022
---
> # /home/runner/work/Sire/Sire/Sire/recipes/sire, last modified Fri Oct 28 13:06:32 2022
13,14c13,14
<   number: '16'
<   string: py38h3fd9d12_16
---
>   number: '20'
>   string: py38h3fd9d12_20
20,21c20,21
<     - binutils_impl_linux-64 2.36.1 h193b22a_2
<     - binutils_linux-64 2.36 hf3e587d_10
---
>     - binutils_impl_linux-64 2.39 h6ceecb4_0
>     - binutils_linux-64 2.39 h5fc0e48_11
26,29c26,29
<     - curl 7.85.0 h2283fc2_0
<     - expat 2.4.9 h27087fc_0
<     - gcc_impl_linux-64 12.1.0 hea43390_17
<     - gcc_linux-64 12.1.0 h3bb4806_10
---
>     - curl 7.86.0 h2283fc2_0
>     - expat 2.5.0 h27087fc_0
>     - gcc_impl_linux-64 12.2.0 hcc96c02_19
>     - gcc_linux-64 12.2.0 h4798a0e_11
31,33c31,33
<     - git 2.38.0 pl5321had83b7b_0
<     - gxx_impl_linux-64 12.1.0 hea43390_17
<     - gxx_linux-64 12.1.0 h1f501c1_10
---
>     - git 2.38.1 pl5321had83b7b_0
>     - gxx_impl_linux-64 12.2.0 hcc96c02_19
>     - gxx_linux-64 12.2.0 hb41e900_11
37,38c37,38
<     - ld_impl_linux-64 2.36.1 hea4e1c9_2
<     - libcurl 7.85.0 h2283fc2_0
---
>     - ld_impl_linux-64 2.39 hc81fddc_0
>     - libcurl 7.86.0 h2283fc2_0
41,43c41,43
<     - libgcc-devel_linux-64 12.1.0 h1ec3361_17
<     - libgcc-ng 12.2.0 h65d4601_18
<     - libgomp 12.2.0 h65d4601_18
---
>     - libgcc-devel_linux-64 12.2.0 h3b97bd3_19
>     - libgcc-ng 12.2.0 h65d4601_19
>     - libgomp 12.2.0 h65d4601_19
47c47
<     - libsanitizer 12.1.0 ha89aaad_17
---
>     - libsanitizer 12.2.0 h46fd767_19
49,50c49,50
<     - libstdcxx-devel_linux-64 12.1.0 h1ec3361_17
<     - libstdcxx-ng 12.2.0 h46fd767_18
---
>     - libstdcxx-devel_linux-64 12.2.0 h3b97bd3_19
>     - libstdcxx-ng 12.2.0 h46fd767_19
68c68
<     - anyio 3.6.1 pyhd8ed1ab_1
---
>     - anyio 3.6.2 pyhd8ed1ab_0
70c70
<     - argon2-cffi-bindings 21.2.0 py38h0a891b7_2
---
>     - argon2-cffi-bindings 21.2.0 py38h0a891b7_3
80c80
<     - biopython 1.79 py38h0a891b7_2
---
>     - biopython 1.79 py38h0a891b7_3
83,87c83,87
<     - boost 1.74.0 py38h2b96118_5
<     - boost-cpp 1.74.0 h75c5d50_8
<     - brotli 1.0.9 h166bdaf_7
<     - brotli-bin 1.0.9 h166bdaf_7
<     - brotlipy 0.7.0 py38h0a891b7_1004
---
>     - boost 1.78.0 py38h4e30db6_3
>     - boost-cpp 1.78.0 h75c5d50_1
>     - brotli 1.0.9 h166bdaf_8
>     - brotli-bin 1.0.9 h166bdaf_8
>     - brotlipy 0.7.0 py38h0a891b7_1005
96,97c96,97
<     - cffi 1.15.1 py38h4a40e3a_0
<     - cftime 1.6.2 py38h26c90d9_0
---
>     - cffi 1.15.1 py38h4a40e3a_2
>     - cftime 1.6.2 py38h26c90d9_1
99c99
<     - colorama 0.4.5 pyhd8ed1ab_0
---
>     - colorama 0.4.6 pyhd8ed1ab_0
101,102c101,102
<     - contourpy 1.0.5 py38h43d8883_0
<     - cryptography 38.0.2 py38h2b5fc30_0
---
>     - contourpy 1.0.5 py38h43d8883_1
>     - cryptography 38.0.2 py38h2b5fc30_2
104c104
<     - curl 7.85.0 h7bff187_0
---
>     - curl 7.86.0 h7bff187_0
107c107
<     - debugpy 1.6.3 py38hfa26641_0
---
>     - debugpy 1.6.3 py38hfa26641_1
112c112
<     - expat 2.4.9 h27087fc_0
---
>     - expat 2.5.0 h27087fc_0
120c120
<     - fontconfig 2.14.0 hc2a2eb6_1
---
>     - fontconfig 2.14.1 hc2a2eb6_0
123c123
<     - fonttools 4.37.4 py38h0a891b7_0
---
>     - fonttools 4.38.0 py38h0a891b7_1
129,130c129,130
<     - glib 2.74.0 h6239696_0
<     - glib-tools 2.74.0 h6239696_0
---
>     - glib 2.74.1 h6239696_0
>     - glib-tools 2.74.1 h6239696_0
133c133
<     - greenlet 1.1.3 py38hfa26641_0
---
>     - greenlet 1.1.3.post0 py38hfa26641_0
141c141
<     - h5py 3.7.0 nompi_py38h045baee_101
---
>     - h5py 3.7.0 nompi_py38h7927eab_102
147c147
<     - importlib-metadata 4.11.4 py38h578d9bd_0
---
>     - importlib-metadata 5.0.0 pyha770c72_1
149c149
<     - ipykernel 6.16.0 pyh210e3f2_0
---
>     - ipykernel 6.16.2 pyh210e3f2_0
160c160
<     - jupyter_client 7.4.2 pyhd8ed1ab_0
---
>     - jupyter_client 7.4.4 pyhd8ed1ab_0
162c162
<     - jupyter_core 4.11.1 py38h578d9bd_0
---
>     - jupyter_core 4.11.1 py38h578d9bd_1
168c168
<     - kiwisolver 1.4.4 py38h43d8883_0
---
>     - kiwisolver 1.4.4 py38h43d8883_1
175,177c175,177
<     - libbrotlicommon 1.0.9 h166bdaf_7
<     - libbrotlidec 1.0.9 h166bdaf_7
<     - libbrotlienc 1.0.9 h166bdaf_7
---
>     - libbrotlicommon 1.0.9 h166bdaf_8
>     - libbrotlidec 1.0.9 h166bdaf_8
>     - libbrotlienc 1.0.9 h166bdaf_8
183c183
<     - libcurl 7.85.0 h7bff187_0
---
>     - libcurl 7.86.0 h7bff187_0
191c191
<     - libgcc-ng 12.2.0 h65d4601_18
---
>     - libgcc-ng 12.2.0 h65d4601_19
193,196c193,196
<     - libgfortran-ng 12.2.0 h69a702a_18
<     - libgfortran5 12.2.0 h337968e_18
<     - libglib 2.74.0 h7a41b64_0
<     - libgomp 12.2.0 h65d4601_18
---
>     - libgfortran-ng 12.2.0 h69a702a_19
>     - libgfortran5 12.2.0 h337968e_19
>     - libglib 2.74.1 h7a41b64_0
>     - libgomp 12.2.0 h65d4601_19
207c207
<     - libpq 14.5 hd77ab85_0
---
>     - libpq 14.5 hd77ab85_1
213c213
<     - libstdcxx-ng 12.2.0 h46fd767_18
---
>     - libstdcxx-ng 12.2.0 h46fd767_19
216c216
<     - libudev1 249 h166bdaf_4
---
>     - libudev1 251 h166bdaf_0
229c229
<     - markupsafe 2.1.1 py38h0a891b7_1
---
>     - markupsafe 2.1.1 py38h0a891b7_2
237c237
<     - msgpack-python 1.0.4 py38h43d8883_0
---
>     - msgpack-python 1.0.4 py38h43d8883_1
243,245c243,245
<     - nbconvert 7.2.1 pyhd8ed1ab_0
<     - nbconvert-core 7.2.1 pyhd8ed1ab_0
<     - nbconvert-pandoc 7.2.1 pyhd8ed1ab_0
---
>     - nbconvert 7.2.3 pyhd8ed1ab_0
>     - nbconvert-core 7.2.3 pyhd8ed1ab_0
>     - nbconvert-pandoc 7.2.3 pyhd8ed1ab_0
249c249
<     - netcdf4 1.6.1 nompi_py38h2250339_100
---
>     - netcdf4 1.6.1 nompi_py38h2250339_101
254c254
<     - notebook-shim 0.1.0 pyhd8ed1ab_0
---
>     - notebook-shim 0.2.0 pyhd8ed1ab_0
257,258c257,258
<     - numexpr 2.8.3 py38h97b1c41_100
<     - numpy 1.23.3 py38h3a7f9d9_0
---
>     - numexpr 2.8.3 py38h36ff5c2_101
>     - numpy 1.23.4 py38h7042d01_1
264,266c264,266
<     - openff-toolkit-base 0.11.1 pyhd8ed1ab_0
<     - openff-units 0.1.8 pyh1a96a4e_0
<     - openff-utilities 0.1.6 pyh1a96a4e_0
---
>     - openff-toolkit-base 0.11.2 pyhd8ed1ab_2
>     - openff-units 0.1.8 pyh1a96a4e_1
>     - openff-utilities 0.1.7 pyh1a96a4e_0
269c269
<     - openssl 1.1.1q h166bdaf_0
---
>     - openssl 1.1.1q h166bdaf_1
271,272c271,272
<     - pandas 1.5.0 py38h8f669ce_0
<     - pandoc 2.19.2 ha770c72_0
---
>     - pandas 1.5.1 py38h8f669ce_1
>     - pandoc 2.19.2 h32600fe_1
281c281
<     - pillow 9.2.0 py38ha3b2c9c_2
---
>     - pillow 9.2.0 py38h9eb91d8_3
291c291
<     - psutil 5.9.2 py38h0a891b7_0
---
>     - psutil 5.9.3 py38h0a891b7_1
297c297
<     - pycairo 1.21.0 py38h9c00e7a_1
---
>     - pycairo 1.21.0 py38h190342e_2
299,300c299,300
<     - pydantic 1.10.2 py38h0a891b7_0
<     - pydot 1.4.2 py38h578d9bd_2
---
>     - pydantic 1.10.2 py38h0a891b7_1
>     - pydot 1.4.2 py38h578d9bd_3
309c309
<     - pyrsistent 0.18.1 py38h0a891b7_1
---
>     - pyrsistent 0.18.1 py38h0a891b7_2
311c311
<     - pytables 3.7.0 py38hf632491_2
---
>     - pytables 3.7.0 py38hf134f34_3
317,319c317,319
<     - pytz 2022.4 pyhd8ed1ab_0
<     - pyyaml 6.0 py38h0a891b7_4
<     - pyzmq 24.0.1 py38hfc09fa9_0
---
>     - pytz 2022.5 pyhd8ed1ab_0
>     - pyyaml 6.0 py38h0a891b7_5
>     - pyzmq 24.0.1 py38hfc09fa9_1
326c326
<     - rdkit 2022.03.5 py38ha829ea6_0
---
>     - rdkit 2022.09.1 py38h5acb366_1
330,331c330,331
<     - scikit-learn 1.1.2 py38h0b08f9b_0
<     - scipy 1.9.1 py38hea3f02b_0
---
>     - scikit-learn 1.1.3 py38h4c4ba11_1
>     - scipy 1.9.3 py38h8ce737c_1
342c342
<     - sqlalchemy 1.4.42 py38h0a891b7_0
---
>     - sqlalchemy 1.4.42 py38h0a891b7_1
346,348c346,348
<     - tbb 2021.6.0 h924138e_0
<     - tbb-devel 2021.6.0 h924138e_0
<     - terminado 0.16.0 pyh41d4057_0
---
>     - tbb 2021.6.0 h924138e_1
>     - tbb-devel 2021.6.0 h924138e_1
>     - terminado 0.17.0 pyh41d4057_0
351c351
<     - tinycss2 1.1.1 pyhd8ed1ab_0
---
>     - tinycss2 1.2.1 pyhd8ed1ab_0
354c354
<     - tornado 6.2 py38h0a891b7_0
---
>     - tornado 6.2 py38h0a891b7_1
356c356
<     - traitlets 5.4.0 pyhd8ed1ab_0
---
>     - traitlets 5.5.0 pyhd8ed1ab_0
359c359
<     - unicodedata2 14.0.0 py38h0a891b7_1
---
>     - unicodedata2 14.0.0 py38h0a891b7_2
361c361
<     - watchdog 2.1.9 py38h578d9bd_0
---
>     - watchdog 2.1.9 py38h578d9bd_1
387c387
<     - zipp 3.9.0 pyhd8ed1ab_0
---
>     - zipp 3.10.0 pyhd8ed1ab_0
391c391
<     - boost >=1.74.0,<1.75.0a0
---
>     - boost >=1.78.0,<1.79.0a0

The major change is that boost has gone from version 1.74.0 to version 1.78.0. Looking at the recipe for ambertools:

ambertools 22.0 py38hf8a91bc_3 conda-forge

  run:
    - _openmp_mutex >=4.5
    - arpack >=3.7.0,<3.7.1.0a0
    - boost-cpp >=1.74.0,<1.74.1.0a0
    - bzip2 >=1.0.8,<2.0a0
...

it is evident that ambetools is pinned against a very strict (probably unnecessarily so) version of boost. As a fall back we are installing a crazy old version of ambertools, which (presumably) doesn't require boost, or has incorrectly defined the requirement range.

The problem would be resolved if the ambertools maintainers simply rebuilt their package against the latest version of boost. Alternatively, re-adding as a dependency of BioSImSpace (however we do this) would also solve the issue.

@lohedges
Copy link
Member Author

lohedges commented Nov 1, 2022

It looks like ambertools on conda-forge has a bunch of pending updates. Presumably this issue will be solved when they get around to rolling out those. I am guessing that there is no way that we can't include ambertools as a dependency if we don't want to have more issues like this in future. (It's annoying that packages aren't automatically rebuilt when their dependencies update.)

I really don't want to keep adding/removing or changing dependencies as a temporary workaround as this will cause environment resolution issues going forward, e.g. when the one with fewer downloads/changes may be preferred.

@lohedges
Copy link
Member Author

lohedges commented Nov 2, 2022

I've now resolved this by adding ambertools as a host requirement in the Sire conda recipe. This means that Sire is built in an environment in which the package is already installed, so dependency resolution is correct. Since the package isn't an explicit runtime requirement for Sire/BioSimSpace, this means that users can create environments with or without ambertools installed.

This approach seems to work well and could be expand for any other problematic packages, e.g. common packages that users might want to have in their environment, but which aren't needed by BioSimSpace. We could even provide a method for users to build their own Sire conda package with a pre-specified environment, so that they can guarantee that it will work correctly.

@lohedges lohedges closed this as completed Nov 2, 2022
@mattwthompson
Copy link
Collaborator

Feel free to reach out to myself or @j-wags if you encounter deployment issues with our stack in the future. I can't guarantee we'll be able to change things, but we're also generally interested in lowering the friction with installing our tools.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants