Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python version in docker: circle-plots.py #323

Open
ashotmarg opened this issue Oct 18, 2023 · 25 comments
Open

Python version in docker: circle-plots.py #323

ashotmarg opened this issue Oct 18, 2023 · 25 comments

Comments

@ashotmarg
Copy link

Hi @psathyrella, I am getting a python issue when using the --plotdir in the docker container.
Namely: circle-plots.py requires python3 (#!/usr/bin/env python3) but the docker has only python2.7. Do you think the circle-plots.py will work if I change the python version from 3 to 2.7 in the circle-plots.py script file?
Thanks,
Ashot

@psathyrella
Copy link
Owner

Whoops, sorry about that. It's possible that would work, although also possible circlify required python3. If that doesn't work, we should add python3 to the docker container, probably with apt install python3-dev. Let me know what happens.

@ashotmarg
Copy link
Author

Thanks for the prompt reply! Yeah, I will try with 2.7 and let you know. But I guess in general it would be good to move to python3, i.e., change the anaconda base image, but not sure if the rest of the python2.7-dependent scripts will complain :)

@psathyrella
Copy link
Owner

Coincidentally, I think I'm finally starting the 2 to 3 conversion this week, and oh my goodness there will be complaining by probably mostly from me.

@ashotmarg
Copy link
Author

Hope things go smoothly with the transition :)

@ashotmarg
Copy link
Author

Hi again @psathyrella, apart from the python version issue it doesn't seem that the circlify (which is required for the circle-plots.py) is installed in the docker. When I tried to install it with a very old version of circlify==0.9.1, which should be compatible with python2.7 I got some versioning errors.
Given that you are switching to the python3, I don't think it makes sense to troubleshoot this, and it's probably easier for the rest of us (apart from you :) ) to just switch to that new version, right?

@psathyrella
Copy link
Owner

Yeah that sounds good, I'll just have a look at this to make sure it's fixed after everything is on 3.

@ashotmarg
Copy link
Author

Hi @psathyrella, I was wondering if you have any updates re the python 3 version of partis.

@psathyrella
Copy link
Owner

Part way done, you can view progress here. It's running all the tests fine (i.e. basics are ok), but I still have some more checks to do.

@ashotmarg
Copy link
Author

Hi @psathyrella , I can see from the progress that you're still working on the new version. Just wondering if you have estimates when you will finish it? The very general command I want to run might even be ok with the current version, but I need to run it in the docker and I guess you will make it when you've finished everything, right ?

@psathyrella
Copy link
Owner

Whoops, was thinking of pinging you. I'm buried in a couple other things so I won't be able to finish the extra 2 to 3 checks I need to do til later in February, but yeah the current version should be fine for you. I just merged 2to3 to main, but unfortunately the docker file needs some modifications to work with python 3, and my first few tries aren't working (i can't run docker on my machine atm, i need to free up some space). On the off chance you're familiar with docker and you could suggest what I need to change here I can easily make the changes and push, otherwise it'll probably take me a few days (i've just swapped =2.7 for =3.10, which didn't work): https://github.com/psathyrella/partis/blob/main/Dockerfile

@ashotmarg
Copy link
Author

Sounds good, hope those extra few things won't create much headache. With my limited docker experience I managed to have a working (hopefully) version of the dockerfile.
It took some fight with docker/partis, mostly due to my (i) company firewall with private SSL certificate issues and (ii) importantly the arm64 architecture on my mac. In the end I used another machine. In any case this worked on my x86_64 Mac OS Darwin.
I had to add the .txt, to the Dockerfile's filename to attach it here, hope this is useful.
Dockerfile.txt

@psathyrella
Copy link
Owner

Oh my goodness, wow, thanks! Wow, you certainly had to make a lot of modifications. It'll unfortunately I think still take me a bit to figure out which of those modifications are specific to your environment vs applicable to the github/quay build, but it seems like if you've got that built, you can then use the resulting local docker image to run what you need?

@ashotmarg
Copy link
Author

Yep, I haven't actually fully tested it, only saw that Partis run from the docker container itself, which was a good start :) So, assuming that (a) the docker tests run smoothly and (b) your last couple of remaining checks don't create issues, then I am good :) Thanks for the python 2 -> 3 transition.

@psathyrella
Copy link
Owner

Great. I should get to the docker file pretty soon, but it might take me a week or two to wrap up a couple other things first.

@ashotmarg
Copy link
Author

Hi @psathyrella , a quick update from my end.

When I run a basic partition command on the example dataset (that you provided) I get an error. This singularity file is based on the new docker that I made.
singularity exec ../code/partis.sif /partis/bin/partis partition --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq

...
File "/partis/python/utils.py", line 7130, in write_yaml_output
version_info = {'partis-yaml' : 0.1, 'partis-git' : '' if dont_write_git_info else get_version_info()}
File "/partis/python/utils.py", line 7030, in get_version_info
vinfo['commit'] = subprocess.check_output(['git', '--git-dir', git_dir, 'rev-parse', 'HEAD'], universal_newlines=True).strip()
File "/opt/conda/lib/python3.10/subprocess.py", line 421, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/opt/conda/lib/python3.10/subprocess.py", line 503, in run
with Popen(*popenargs, **kwargs) as process:
File "/opt/conda/lib/python3.10/subprocess.py", line 971, in init
self._execute_child(args, executable, preexec_fn, close_fds,
File "/opt/conda/lib/python3.10/subprocess.py", line 1863, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'git'

However, when I run the same command with the --dont-write-git-info flag it runs fine.
singularity exec ../code/partis.sif /partis/bin/partis partition --dont-write-git-info --infname ../code/partis/test/example.fa --outfname outExample.yaml --parameter-dir outExampleDir --get-selection-metrics --extra-annotation-columns consensus_seq
I am guessing since --dont-write-git-info does the trick you probably know where the issue is coming from, which is why I didn't paste the whole error trace.

@psathyrella
Copy link
Owner

Looks like git isn't installed in the singularity container, but yeah, as you discovered git isn't very necessary. It's just how it writes the partis version to the output file, which is important for reproducibility but not at all necessary.

@ashotmarg
Copy link
Author

Great, I will have a look at the docker soon. Since I mostly used your docker install commands, it might be our base images that are different in terms of whether git is already installed or not. In any case, the important part is that it's not affecting the outputs :)

@ashotmarg
Copy link
Author

Hi @psathyrella , when running our data with new (python3) and old (python2) Partis docker containers, we get somewhat different clustering results. We are using partis partition command with the same --random-seed value. Would you expect this outcome even if one uses the same --random-seed ? If not, then we'll try again based on a fasta file that we are allowed to share for troubleshooting, and add the details here for reference.

@psathyrella
Copy link
Owner

Yes, it's expected to get different results with different versions. Making even relatively minor code changes can alter random number generator state, and python 2 to 3 made tons of huge code changes. The most common source of the changes are that each clustering iteration apportions the sequences/clusters randomly among processes, which of course depends on random generator state.

They shouldn't be incompatibly different, but rather shuffled around within the clustering uncertainty. So if you look at the progression of partitions in the output (i.e. as hierarchical agglomeration proceeds) the difference between versions would typically correspond roughly to moving up or down a partition or two, depending on the log likelihood separating them (this is the same difference you'd expect from changing the random seed). This doesn't mean, though, that they'll have similar cluster sizes -- one uncertain merge can make a huge difference to cluster size, for instance.

@ashotmarg
Copy link
Author

Thanks for the details Duncan, I was naively assuming that giving the same seed might work even with different python versions, but as long as this behaviour is expected in compatible way, then all good :)

@psathyrella
Copy link
Owner

ok I think I've got the docker file working with the python 3 updates https://quay.io/repository/matsengrp/partis?tab=tags

@ashotmarg
Copy link
Author

Great, thanks for the update, are you by any chance also planning to create a biocontainers package along with the docker?

@psathyrella
Copy link
Owner

Sure, could do. After reading through their docs I'm a little unclear what I'd do beyond adding some comment and label lines to the docker file -- do the biocontainers people add it to the registry after we make a github issue, or do I add it?

@ashotmarg
Copy link
Author

Yeah, it is also confusing to me :). I guess the idea would be that partis could also be run from BioConda, for which one needs to build the conda package with a recipe (similar to the docker). If you want to stick with only docker (no conda) then I guess it would still be good to put it in biocontainers, in which case, it automatically should also be converted to a singularity image in the Galaxyproject as well: https://github.com/BioContainers/singularity-build-bot.

@ashotmarg
Copy link
Author

Re the docker, yeah, I think you need to open a github issue and take it from there: https://biocontainers-edu.readthedocs.io/en/master/contributing.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants