Skip to content
This repository has been archived by the owner on Aug 2, 2020. It is now read-only.

sdist-ghc is a bit too indiscriminant in copying sources #384

Open
bgamari opened this issue Jul 28, 2017 · 6 comments
Open

sdist-ghc is a bit too indiscriminant in copying sources #384

bgamari opened this issue Jul 28, 2017 · 6 comments

Comments

@bgamari
Copy link
Collaborator

bgamari commented Jul 28, 2017

Hmm, one thing I noticed is that if you have build artifacts in your tree (e.g. from the make build system), hadrian will happily copy these into the source distribution. e.g. a test in a dirty tree that I had laying around gave me this,

/------------------------------------------------------------------------------------------------------------------\
| Copy file                                                                                                        |
|      input: libraries/containers/dist-install/build/Data/Map/Lazy/Merge.dyn_o                                    |
|  => output: sdistprep/ghc/ghc-8.3.20170721-src/libraries/containers/dist-install/build/Data/Map/Lazy/Merge.dyn_o |
\------------------------------------------------------------------------------------------------------------------/

Ideally we would only take precisely the files that are needed (and never binaries).

@snowleopard
Copy link
Owner

Aha, left-over object files could perhaps be the reason for you seeing something being built by sdist-ghc. Copying might have introduced tracking, which in turn started to bring the files up to date.

@bgamari
Copy link
Collaborator Author

bgamari commented Jul 29, 2017

Sounds like it.

@snowleopard
Copy link
Owner

I've been trying to figure out what the Make build system does here, and it appears that it actually cleans the tree first and then simply copies all files. So, Make will also happily copy any files left over after the cleaning.

So, I see two approaches:

  • We do what the Make does: we run the clean build rule first and then run sdist, as it is now. We can also hardcode the exclusion of dist and dist-install subdirectories produced by Make, just in case.
  • We do The Right Thing by copying exactly the source files for each package, as specified in the .cabal files. This is harder to implement and I'm sure there will also be plenty of files that should be copied but which are not actually listed in .cabal files, so this might require fixes in the GHC.

I've implemented the first approach in the above commit.

@bgamari Could you give it a try? Do you think this is a sufficient solution for now?

@snowleopard
Copy link
Owner

@alpmestan @angerman You commented out cleanSourceTree here:

https://github.com/snowleopard/hadrian/blob/master/src/Rules/SourceDist.hs#L15

Are you sure we no longer need to clean the source tree before creating a source distribution? If yes, I'll simply remove this function (and close this issue).

@alpmestan
Copy link
Collaborator

I'm not 100% sure. In particular, we probably want to exclude the build root, in case it's somewhere in ghc's source tree (and I quite often do that, I often run hadrian from the root of ghc's source tree, doing something like hadrian [...] --build-root=_something, ending up with a bunch of _-prefixed directories at the top of GHC's tree, with different (by the flavour or some other criterion) working builds sitting next to each other there. We obviously don't want to embed those, but while we have access to the build root used for the current hadrian command, we don't have access to past ones, so the blacklisting going on here becomes a bit of a problem.

The "obvious" solution is to instead whitelist top-level files/dirs, hoping that ghc's toplevel structure does not change too often... @snowleopard Maybe I'm overlooking something?

@snowleopard
Copy link
Owner

snowleopard commented Apr 3, 2018

@alpmestan I'd say that until we are 100% sure, we should keep invoking something like cleanSourceTree. It doesn't cost much but can save maintainers some time. Hence I brought it back.

Perhaps the current implementation of cleanSourceTree can be improved? Quite likely! I don't have a strong opinion on how it should be implemented, but we can keep this issue open to discuss further. Perhaps @bgamari, who opened this issue, could provide some input.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants