
Separate long-term migration for error_overlinking: true? #2015

Open
h-vetinari opened this issue Oct 6, 2023 · 9 comments

@h-vetinari
Member

In the context of the boost migration, we ran into some issues and then added a defensive run-export from libboost-python-devel to libboost (rather than just libboost-python).

This is not the cleanest way to do it, and Isuru suggested setting error_overlinking: true in the migrator. I don't want to couple this with the boost migration though, because I spent a lot of time looking at the CI of basically all the boost migration PRs to check whether packages actually needed the boost libraries or just the headers (or vice versa).

And it's pretty clear that many feedstocks do not care much about their link check, as long as the CI is green (i.e. probably >90% of the PRs I inspected had some linkage issues flagged).

Still, while very painful, I think it'd be a worthwhile clean-up to do in conda-forge as a whole. So I thought I'd open this issue so that we can do this in a way where we don't single out one specific migration to bear the brunt of that pain, and can move from feedstock to feedstock at a more leisurely pace.

However, IMO this would need us to figure out a couple of things first, at least for the cases where packages show up in the link check through no fault of the recipe, or despite actually being needed:

  • compiler-runtimes on windows: ucrt / vc14_runtime are added defensively, even though they're not always used
  • same for libcxx for C++ on osx
  • numpy has a linkage mechanism that apparently(?) isn't understood by the link check
  • python itself often shows up
  • etc.

We might be able to improve the first two by using {{ stdlib("c") }} (and perhaps even {{ stdlib("cxx") }}, once it lands...) as explicit dependencies in recipes, rather than always adding the runtime via the compiler. In general though, I think we should have answers (or exceptions) for these cases, so that we don't force all feedstocks to pointlessly add ignore_run_exports_from: everywhere.
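As a sketch of what the explicit-dependency approach might look like (assuming the {{ stdlib("c") }} jinja function, and a hypothetical {{ stdlib("cxx") }} analogue that doesn't exist yet):

```yaml
requirements:
  build:
    - {{ compiler("c") }}
    # explicit stdlib/runtime dependency, instead of the compiler
    # defensively injecting ucrt / vc14_runtime / libcxx run-exports:
    - {{ stdlib("c") }}
    # - {{ stdlib("cxx") }}   # hypothetical, once a C++ stdlib function lands
```

With the runtime declared explicitly, the link checker could in principle treat undeclared runtime linkage as a real error rather than whitelisting it everywhere.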

Thoughts?

@mfansler
Member

(I wrote an extended response to this a week ago and it got deleted in a browsing mishap. 😮‍💨 Trying again...)

I really like this idea, and it has been something I've been cleaning up for the R feedstocks since I joined the team. Here are a few aspects from the R feedstocks ecosystem:

  • The proposed repo-wide migration will raise errors on all compiled R Windows builds. We already see this with staged-recipes submissions of compiled R packages, which fail to find R.dll, Rblas.dll and Rlapack.dll. Submitters rarely grok how to deal with this, so it eats reviewer time. Isuru has suggested a workaround which would fix the situation for incoming recipes (Add R.dll to missing_dso_whitelist on windows, conda/conda-build#4786). It would be best if a migration proactively dealt with this.
  • We should plan to patch repodata. We have few over-specification issues; rather, we are most prone to undeclared linking. This is because R itself brings in many libraries, so feedstocks get away with linking without declaring. These often crop up after library migrations, when end users hit load errors. Setting linking errors repo-wide will surface all of these (great!!), but a real clean-up should also patch the metadata of previous builds. Personally, I have a backlog of feedstock issues for this, and haven't really worked out how to identify all the constraints. Is there specific tooling to help with this?
  • Communications upstream would be nice. The place in R packages where we see users over-specifying links is when upstream authors copy-paste OpenMP, BLAS, and LAPACK flags into their CPPFLAGS. These are often unnecessary, but diligent recipe authors will see it and declare the dependencies. I think it would be a useful contribution for the maintainers to open upstream issues when they see this, to let package authors know they are including unnecessary flags.
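Until something like conda/conda-build#4786 lands, individual recipes can whitelist the R DLLs themselves. A sketch of what that might look like in a recipe's meta.yaml (the glob patterns are illustrative; missing_dso_whitelist is an existing conda-build build key):

```yaml
build:
  missing_dso_whitelist:   # [win]
    - "*/R.dll"            # [win]
    - "*/Rblas.dll"        # [win]
    - "*/Rlapack.dll"      # [win]
```

A migration could inject a snippet like this into compiled R feedstocks so that error_overlinking: true doesn't break every Windows build at once.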

@mfansler
Member

@conda-forge/r please chime in if I missed anything or you have a different take.

@h-vetinari
Member Author

Note: this is being set for new feedstocks since conda-forge/staged-recipes#21940. But we still have an overwhelming majority of existing feedstocks that don't build with this.
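For reference, existing feedstocks can already opt in individually via conda-forge.yml (picked up by conda-smithy on rerender); assuming the conda_build mapping that conda-forge.yml supports, this is roughly:

```yaml
conda_build:
  error_overlinking: true
```

A long-term migration would essentially roll this setting out feedstock by feedstock instead of waiting for maintainers to opt in.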

@jakirkham
Member

It's worth noting that python and numpy generate a bit of noise atm ( conda/conda-build#3325 )

@jakirkham
Member

Also noticed the conda-build docs are a bit spotty

Wrote up some suggestions on areas to fill out in issue: conda/conda-build#5103

@h-vetinari
Member Author

It's worth noting that python and numpy generate a bit of noise atm ( conda/conda-build#3325 )

Both of these would only trigger on error_overdepending: true, whereas error_overlinking tackles the case where a produced binary relies on a given shared library being present (which might well be the case, almost always as a transitive dependency), but without a declared host-dependence on it (and hence we're not tracking the potential ABI impact of that dependency).
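To illustrate the distinction with a hypothetical package: an overlinking error for a binary that links libfoo.so is resolved by declaring the library in host, so that its run-export gets tracked:

```yaml
requirements:
  host:
    - python
    - libfoo   # hypothetical: the built binary links libfoo.so, so declare it
  run:
    - python
    # libfoo's run_exports then add the correctly pinned runtime
    # dependency automatically; no manual run: entry needed
```

Overdepending is the inverse failure mode: a host dependency whose library the binary never actually links.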

@hmaarrfk
Contributor

There are also a few (new) warnings about multiple packages depending on each other. Would those trigger failures too?

I agree with making this a separate long-term migration; seems like a good idea.

@hmaarrfk
Contributor

I will cross reference @pkgw's comment in #1880 (comment)

Ultimately, as long as we are compiling C/C++ code, the development files for libraries will have to depend on the development files for their dependents. (And I think that you can argue that this is really the essence of what it means to be a "library" regardless of language/OS/whatever, but that's beyond the scope here ...) The real issue here, in my view, is that there are still many cases where we don't (or don't properly) separate libraries and development files into different subpackages. You can't get rid of the dependency graph, but you can at least do a better job of partitioning it into runtime/dev components. Unfortunately, Conda didn't include this distinction when it was first developed, so all of this separation has to be applied retroactively. (Which is a bit frustrating as Linux distros have been getting this right for 30 years, but there is also a reason that Conda has achieved the level of success that it has ...)

So it might be hard to get this perfectly right for all packages.

@h-vetinari
Member Author

So it might be hard to get this perfectly right for all packages.

That's a mostly orthogonal concern here, I think. If there's no runtime/dev split, then the run-export is just the whole thing, which means downstream packages will still be able to build against a library and its dependencies.

Having a proper split into -devel packages would be advantageous for many reasons, but isn't necessary (nor a hindrance) for ensuring we correctly capture the ABI exposure of any given artefact (in terms of the provenance of the symbols being linked to), which is what error_overlinking would do (modulo static libraries, which aren't covered by LIEF).
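To make the point about un-split packages concrete: a library that ships runtime and development files together can simply export itself, so downstream builds that declare it in host still get the right runtime pin. A hypothetical sketch:

```yaml
package:
  name: libfoo          # hypothetical combined lib + headers package
  version: 1.2.3

build:
  run_exports:
    # downstream packages building against libfoo automatically get
    # a compatible runtime pin, split or no split
    - {{ pin_subpackage("libfoo", max_pin="x.x") }}
```

The run-export just covers "the whole thing" instead of only a runtime subpackage; error_overlinking works either way.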
