[conda] Unable to make a conda build #113
Comments
Is it mandatory to support conda? If so, maybe we can switch to another pdf-reader lib. |
Not mandatory, but this is a very common installation method for Python packages. We might investigate other options to replace the dependency, but we'll have to check for any performance drop first |
For reference, my initial issue on PyMuPDF (pymupdf/PyMuPDF#938) was moved to this discussion: pymupdf/PyMuPDF#1137 |
Why not simply do a conda run pip install <missing in conda package>? Especially since there's only one such package |
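For reference, that workaround could look roughly like this. It is a minimal Python sketch, assuming a conda environment named doctr-env (a hypothetical name) and PyMuPDF as the single package missing from conda. It calls `python -m pip` rather than bare `pip` so that the environment's own pip is the one invoked; `conda_run_pip_install` is a hypothetical helper.

```python
import shlex
import subprocess
from typing import List, Sequence


def conda_run_pip_install(env_name: str, packages: Sequence[str], dry_run: bool = True) -> List[str]:
    """Build, and optionally run, a `conda run ... pip install ...` command."""
    # `conda run -n ENV CMD...` runs CMD inside the named environment, so pip
    # installs into that environment rather than whichever env is active.
    cmd = ["conda", "run", "-n", env_name, "python", "-m", "pip", "install", *packages]
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Hypothetical env name; PyMuPDF is the dependency lacking a conda build.
    print(shlex.join(conda_run_pip_install("doctr-env", ["pymupdf"])))
```

Note that conda's solver does not manage packages installed this way, which is part of why a native conda build is still preferable.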
Hi @kchawla-pi, so actually since then, there is also Also please note that for now, the only important dependencies that would benefit from conda support (performance-wise) are PyTorch & TensorFlow 👍 Anyway, we'll provide some updates on this very topic soon! |
So now with #829 we are just missing |
Nope, pypdfium2 also lacks a conda package. But that could be fixed, I'll ping them about this! And I think we should seriously consider that: especially for HTML, it's more about people who need training data, so I would argue that most users don't benefit from weasyprint (which is also a problem for Mac users, see #815). For PDFs, it's more important, so if we can get a conda build, our best course of action would probably be to move html/weasyprint to an extra! What do you think? |
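Moving html/weasyprint to an extra means the core package has to cope with weasyprint being absent. Here is a minimal sketch of a lazy import guard. It assumes an extra called html and the distribution name python-doctr. `require_extra` and `read_html` are hypothetical helpers, not docTR's actual API.

```python
import importlib
from types import ModuleType


def require_extra(module_name: str, extra: str, dist: str = "python-doctr") -> ModuleType:
    """Import an optional dependency, or fail with a hint naming the extra.

    `dist` and `extra` are assumptions for illustration, not docTR's real config.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also catches ModuleNotFoundError
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "{dist}[{extra}]"'
        ) from exc


def read_html(url: str) -> bytes:
    """Hypothetical HTML entry point: weasyprint is only imported when called."""
    weasyprint = require_extra("weasyprint", extra="html")
    # With no target argument, write_pdf() returns the rendered PDF as bytes.
    return weasyprint.HTML(url=url).write_pdf()
```

As far as I know, conda recipes have no direct equivalent of pip extras. A conda package would therefore simply omit weasyprint, and a guard like this gives users a clear message instead of an import error at package load time.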
I just checked and weasyprint does have a conda build now 🙌 (But I still think we should move it to extra builds) |
Sorry about the conda build - I never used conda myself and currently don't have the time/interest to learn it. Due to platform-specific binaries, the setup infrastructure of |
@mara004 what do you mean by "any reason you can't use pip?" pip installation is already available 👍 |
I'm not familiar with the conda environment, so perhaps that was a silly question to ask.
I'd be curious to know how exactly conda builds are more specific. I have read the comparison of conda and pip on Wikipedia, but the problem described there can be solved with venv. pip allows dependency breakage, but very clearly warns about it, so I don't really see an issue in this regard... |
Well, pip does not do sophisticated dependency resolution, unlike Conda. It's the same reason Pipenv and Poetry are used for package installation, but unlike Conda, they use PyPI's index. Each of these has its own algorithm for dependency resolution, with Pipenv being rather slow. Conda is the de facto tool for data scientists in the Python ecosystem. Being able to seamlessly install Mindee packages with Conda would fix a big paper cut. |
Okay, thanks for pointing this out! |
I can definitely second @kchawla-pi on that: I always try to find a conda installation before using pip, because it's much more careful about your existing env compatibility 👍 |
I tried to craft a package with |
Wow, that must be so frustrating. I don't know about Conda packaging, but now I'm pissed at conda for making your job so difficult. I will try to take a gander at it in June. |
Well, I don't know, perhaps I was just doing it the wrong way, but all the same it hasn't been very obvious to me how to do it. |
In my experience conda build is always a long operation. Base conda is known to have a slow dep resolution procedure, so I personally use mamba (https://github.com/mamba-org/mamba) which is blazing fast for dep installation (multi-thread, rewritten in C++). I have to check if that extends to package building as well |
I think the main problem is that conda-build creates an isolated environment in which all dependencies are installed. If we want to craft more than one package, it would be essential to reuse that environment so the dependencies don't need to be installed each time. Is there any option to do this? |
@frgfm do you know an answer ? 😅 |
Even if we can get around the duration problem, I'll still need information about conda platform tags. We need an equivalent for each of the tags shown on https://pypi.org/project/pypdfium2/#files (section "Built Distributions"). |
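The question essentially comes down to how each wheel platform tag maps to a conda subdir, which is conda's name for a platform/architecture target such as linux-64 or osx-arm64. Below is a rough Python sketch of such a mapping. The tag patterns are typical wheel tags, not a copy of the pypdfium2 file list, and `wheel_platform_to_conda_subdir` is a hypothetical helper.

```python
import re
from typing import Optional

# Wheel CPU architecture -> conda subdir (assumed mapping, not exhaustive).
_LINUX_ARCH = {
    "x86_64": "linux-64",
    "aarch64": "linux-aarch64",
    "armv7l": "linux-armv7l",
    "ppc64le": "linux-ppc64le",
    "s390x": "linux-s390x",
}
_MACOS_ARCH = {"x86_64": "osx-64", "arm64": "osx-arm64"}
_WINDOWS = {"win_amd64": "win-64", "win32": "win-32", "win_arm64": "win-arm64"}


def wheel_platform_to_conda_subdir(tag: str) -> Optional[str]:
    """Map a wheel platform tag to a conda subdir, or None if there is no equivalent."""
    if tag in _WINDOWS:
        return _WINDOWS[tag]
    # musllinux wheels target musl libc; conda's Linux subdirs assume glibc.
    if tag.startswith("musllinux"):
        return None
    # manylinux1_x86_64, manylinux2014_aarch64, manylinux_2_17_armv7l, ...
    m = re.fullmatch(r"manylinux(?:1|2010|2014|_\d+_\d+)_(\w+)", tag)
    if m:
        return _LINUX_ARCH.get(m.group(1))
    # macosx_10_13_x86_64, macosx_11_0_arm64; universal2 has no single subdir.
    m = re.fullmatch(r"macosx_\d+_\d+_(\w+)", tag)
    if m:
        return _MACOS_ARCH.get(m.group(1))
    return None
```

Tags that map to None (musllinux, macOS universal2) have no direct conda counterpart and would have to be dropped or handled separately.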
For reference, these two pages sound interesting: |
Since it looks like the packages generated from pypdfium2-feedstock will not be made public (cf. AnacondaRecipes/pypdfium2-feedstock#1 (comment)), I will make a second attempt at building official conda packages for pypdfium2 in a conda branch, trying to accept or work around the Python version problem (it remains to be decided how). pypdfium2-feedstock currently requires manual interaction and native hosts. I want to design this differently so we can build automatically in a workflow and without native hosts. |
Ok, so I think I have the local packaging part ready. It's really inelegant, but it's all I could do given conda's limitations. Now the remaining parts we need are
Here's an archive of builds for python 3.11 which I generated locally: pypdfium2_conda_py311.zip Note that the packages will contain wrong (PS: @kchawla-pi, now you can take a look at the code if you like ;) ) |
Oh, and I just discovered conda's |
Hi @mara004, uploading doesn't seem to be very complicated: https://levelup.gitconnected.com/publishing-your-python-package-on-conda-and-conda-forge-309a405740cf (manual upload), and with CI: (an example from @frgfm's holocron lib) 😅 Maybe @frgfm can help a bit more :) |
Thanks, and sorry for spamming this thread. The performance difference is huge, though: building all wheels takes ~20s on my device, whereas the conda builds take over 15min. I've got a feeling I'm missing something here, but if so, it's not obvious how to do it properly. |
Throwback: @boldorider4 just opened my eyes to the idea that pdfium should be packaged separately in conda, so that pypdfium2 can simply depend on it and cleanly be noarch. I'm still thinking about this but believe it may finally be the clean solution I was looking for. Ideally, the conda packaging would be done in pdfium-binaries (it will still need some conda convert for the cross-compiled archs, but that's much easier). Then what we need in pypdfium2 is to point the library loader at the right path, and of course a noarch conda recipe. This should really have come to my mind earlier. In particular, I should have realized it after a recent discussion with @KOLANICH about pdfbox, but just failed to connect the two. Phew, I need a break before revisiting this 😅 |
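With pdfium packaged separately, a noarch pypdfium2 would only need to find the library inside the conda environment. This minimal sketch assumes conda's usual layout: shared libraries under <prefix>/lib on Linux/macOS and DLLs under <prefix>\Library\bin on Windows. It also assumes the file names libpdfium.so, libpdfium.dylib and pdfium.dll. The function names are hypothetical, and this is not pypdfium2's actual loader.

```python
import ctypes
import sys
from pathlib import Path
from typing import Optional


def pdfium_path_in_prefix(prefix: str, platform: str = sys.platform) -> Path:
    """Return the assumed location of a conda-packaged pdfium under `prefix`."""
    root = Path(prefix)
    if platform.startswith("win"):
        # conda installs Windows DLLs into <prefix>\Library\bin
        return root / "Library" / "bin" / "pdfium.dll"
    if platform == "darwin":
        return root / "lib" / "libpdfium.dylib"
    # Linux and other Unix-likes
    return root / "lib" / "libpdfium.so"


def load_pdfium(prefix: Optional[str] = None) -> ctypes.CDLL:
    """Load pdfium from `prefix`, defaulting to the running interpreter's env."""
    # When Python runs from a conda env, sys.prefix is that env's prefix.
    path = pdfium_path_in_prefix(prefix or sys.prefix)
    if not path.is_file():
        raise FileNotFoundError(f"pdfium not found at {path}")
    return ctypes.CDLL(str(path))
```

Keeping the path logic separate from the actual `ctypes.CDLL` call makes it easy to check per platform without needing the binary installed.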
@mara004 fyi there is also a draft for |
Thanks for the pointer, see my comment conda-forge/staged-recipes#23726 (comment). |
@mara004 I wanted to ask if there are any updates on your side? :) |
I've got it on my mind and have been working on some integration prerequisites to get this done nicely - packaging with an external library differs quite a bit from bundling. I can elaborate on the individual tasks if necessary. |
Oh yeah, no stress, I just wanted to ask so I can plan for it. :) |
Work in progress: https://github.com/pypdfium2-team/pypdfium2/pull/268/files |
However, we might have a bit of a problem with the custom channels. In that case, users would have to add the channels explicitly before installation, which is probably doable, but not nice. |
Just merged the conda packaging code: pypdfium2-team/pypdfium2@ee5a2ff. |
And the CI/docs also merged now: pypdfium2-team/pypdfium2#269 |
Thanks a lot @mara004 👍🏼 |
Thanks @mara004, it looks like we're gonna be able to get docTR on anaconda now!
Default channels
conda-forge
Custom channels
Just gotta put together a conda recipe. Building it might take long because of dependency resolution and checks, considering the number of deps, but it should work 👍 (we might have some surprises on some OS though) |
I imagine carrying around the custom channels for pypdfium2 (pypdfium2-team, bblanchon) might be a bit of an annoyance... Despite the I'm kind of wondering whether we went the wrong way and should have tried putting pypdfium2 and its dependencies in conda-forge instead, but feedstock publishing seemed less flexible and I wasn't sure how to automate it. However, if anyone wants to pursue that path, the feedstocks written by the Anaconda team (pdfium-binaries, ctypesgen [pypdfium2-team fork]) might be a good starting point. Though I'd recommend not using their pypdfium2-feedstock, but splitting it into separate pypdfium2_raw and pypdfium2_helpers packages as we do in the custom channel. It would be most convenient if conda-forge, as a community channel, could just "include" or mirror the pypdfium2-team and bblanchon channels, but I don't think they can do this. Anyway, unfortunately my time budget for conda is more than used up, so I won't be able to look into this any deeper 😅 |
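Concretely, carrying the custom channels means every install has to name them. The small sketch below assembles such a command. It assumes users install the pypdfium2_raw and pypdfium2_helpers packages from the pypdfium2-team and bblanchon channels mentioned above. The exact package/channel combination users need is an assumption, and `conda_install_cmd` is a hypothetical helper. `-c` is conda's standard flag for adding a channel to a single command.

```python
import shlex
from typing import List, Sequence

# Channel names as mentioned in this thread; channels passed earlier via -c
# get higher priority in conda.
CUSTOM_CHANNELS = ("pypdfium2-team", "bblanchon")


def conda_install_cmd(packages: Sequence[str], channels: Sequence[str] = CUSTOM_CHANNELS) -> List[str]:
    """Build a `conda install` command that names each extra channel explicitly."""
    cmd = ["conda", "install"]
    for channel in channels:
        cmd += ["-c", channel]
    return cmd + list(packages)


print(shlex.join(conda_install_cmd(["pypdfium2_raw", "pypdfium2_helpers"])))
```

The same channel list could instead be added once with conda config --add channels, which is exactly the extra setup step users would have to remember.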
FWIW, someone has put pypdfium2 in conda-forge now, but badly. So, please continue using our packages from conda-forge links: |
Thanks for the update 👍 |
Unfortunately, one of the project dependencies does not have any conda release or any way to make one. I opened an issue on their repo pymupdf/PyMuPDF#938 to track this, but so far I haven't found any way to release the project on anaconda with this dependency.