Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Force recompilation of all dependencies #195

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

kmod
Copy link
Contributor

@kmod kmod commented May 5, 2022

There is, in general, not a requirement that source packages compile to the same thing as available binary packages. It's rare, but this is one thing that can end up causing differences in benchmark results between two Python distributions, if one has binary packages and one does not. Forcing all dependencies to compile from source eliminates this potential difference.

The only case I know of is mypy, which ships a more-optimized binary package than what their source package compiles to.

@corona10
Copy link
Member

corona10 commented May 5, 2022

@kmod
IIUC, the most user doesn't use --no-binary option while they installing packages,
Doesn't it show a benchmark that deviates from the general use case?

@kmod
Copy link
Contributor Author

kmod commented May 5, 2022

True, that's not how users install packages, so the package installation process would be a bit different between the benchmarks and users. But in the common case the actual running of the benchmarks shouldn't be affected, and in the rare case that the benchmarks are affected by whether or not binary packages are installed I'd argue that that's when we would want this behavior.

@corona10 corona10 closed this May 7, 2022
@corona10 corona10 reopened this May 7, 2022
@ericsnowcurrently
Copy link
Member

Great observation!

tl;dr +1, but add an option to disable --no-binary.

The kinds of benchmarks where dependencies matter are typically workload-oriented (macro benchmarks). We want to be able to communicate a performance delta to users relative to how they should expect their Python workload to perform. So with that in mind I'd lean toward avoiding --no-binary if possible. Performance difference between binary and source distributions should be communicated by the relevant projects (e.g. mypy).

That said, I get what you are saying about apples-to-apples comparisons for Python implementations (or distros). We also run into this with running the benchmarks against CPython main: few dependencies are available on PyPI. So I agree we don't want benchmark dependencies to dictate any difference in performance between result sets.

Weighing the two sides, I'm in favor of the latter, which you are proposing. So +1. There is still a use for using pre-built extensions, though, so it would be good for pyperformance to allow us to opt out of --no-binary.

@ericsnowcurrently
Copy link
Member

Also, presumably all the dependencies build on all our tested versions, based on the CI results.

There is, in general, not a requirement that source packages compile to the same thing as available binary packages. It's rare, but this is one thing that can end up causing differences in benchmark results between two Python distributions, if one has binary packages and one does not. Forcing all dependencies to compile from source eliminates this potential difference.

The only case I know of is mypy, which ships a more-optimized binary package than what their source package compiles to.
@kmod
Copy link
Contributor Author

kmod commented May 10, 2022

No problem, I added a USE_BINARY_PACKAGES env var that will keep the old behavior of not passing --no-binary. Personally I have a hard time envisioning a scenario when a user would want to include performance differences between binary packages... but anyway I think an argument for the behavior of this PR being opt-out (rather than opt-in) is that people who want binary packages probably know that they do, and people who want more consistency are probably not aware of this as something to worry about.

Anyway, let me know if there's anything else I can do to make this more acceptable. I'm not sure what to do about the test failures -- in my view these are legitimate issues (that these interpreters are unable to build these dependencies) that are being uncovered by this PR. I think we could add wheel as an automatic dependency to get pypy to work, but I'm not sure what the issue is with 3.11 and greenlet.

Copy link
Member

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants