Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of external dependencies #2859

Merged
merged 9 commits into from
Aug 15, 2024
Merged

Improve handling of external dependencies #2859

merged 9 commits into from
Aug 15, 2024

Conversation

daljit46
Copy link
Member

@daljit46 daljit46 commented Mar 12, 2024

Aims to address #2584. Eigen and Json for Modern C++ are now fetched using CMake's FetchContent. Nifti headers are moved to a new thirdparty/nifti directory.

Seems to work fine locally, but there is a small downside: the configure log is quite a bit more verbose than before because of Eigen (at least the first it is downloaded). Not sure if there is a way to improve on that.

@daljit46 daljit46 added the build label Mar 12, 2024
@daljit46 daljit46 self-assigned this Mar 12, 2024
@daljit46 daljit46 marked this pull request as draft March 12, 2024 13:30
@daljit46
Copy link
Member Author

On the configuration issue, unfortunately Eigen 3.4.0 doesn't make use of CMake's best practices so it's wrongly assuming that it's being built as a top level project. This has been fixed in a subsequent commit.
Perphaps we could add a small patch based on that commit to make this work, but not sure.

Furthermore, CI tests seem to be failing probably because we are using CMake 3.16, but I'm hoping that we can tweak things to build succesfully (hopefully without hitting a dealbreaker).

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46
Copy link
Member Author

daljit46 commented Mar 13, 2024

Ok, so I've patched eigen removing unncessary configuration messages. This seems to work as expected.
I have one extra thought that I'd like to get some feedback on. By default FetchContent downloads dependencies in the build directory. It's quite common for developers to start a clean build. There are a few ways to achieve this with CMake:

cmake -B build --fresh # It reconfigures the project deleting any CMakeCache.txt
cmake --build build --target clean # cleans any build artifacts
cmake --build build --clean-first # Same as previous line but triggers a build

From our discussions, it seems that a rm -rf build seems to be preferred amongst us to the options above(?). However, if we delete the build direcory then FetchContent will be forced to refetch the dependencies from the internet, elongating the configuration time.
One way to get around this is to direct FetchContent to download the dependencies to a separate directory (e.g. _dependencies) and add this to .gitignore. This would allow devs to delete the build directory without having to download the dependencies again.

@MRtrix3/mrtrix3-devs any thoughts?

Copy link

clang-tidy review says "All clean, LGTM! 👍"

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46
Copy link
Member Author

daljit46 commented Apr 3, 2024

Another relevant issue is that if we decide to pull third party source code from a server, our Github tarballs no longer will contain the full source code required to compile once downloaded, but only after the CMake configuration phase. This can be an issue for certain usecases like packaging workflows by Linux distros.

To avoid this, @jdtournier suggested that we should instead look into git subtrees, which would effectively integrate thirdparty dependencies into our source code (the third party code will effectively be part of our repo instead of being just a link like in the case of git submodules).
This gets around the issue just mentioned above, however there are some things that I think could be problematic if we use this approach. Most importantly, because git subtree incorporates third party code into our repo (including their histories), the use of git subtrees will inevitably result in a significant increase of our repo's size. For example, the repo for eigen weighs almost ~100MB when compressed (uncompressed size is about ~130MB). nlohmann::json's repo also exceeds 100MB.
This seems a little excessive to me since all we really need is just a header file.
Of course, FetchContent will require the download of third party repos too, but it would only do so at configure time thus effectively having no effect on the size of the repo.

@MRtrix3/mrtrix3-devs I'm not really sure how to proceed here, but let me know if anyone has any thoughts.

@Lestropie
Copy link
Member

The other thought that came up during last meeting (wrote in minutes, didn't get to here yet) was to give cmake the ability to configure external dependencies for the build process in one of two ways:

  1. If a relevant environment variable is specified at configure time, providing a filesystem path to a location where that external dependency can be found, then it will pull the source code from that external location into the build directory.
    This would allow offline compilation, provided that the mechanism used to obtain source code prior to attempting offline compilation performs the requisite acquisition of the source code for those external dependencies. This could be eg.:

    1. Downloading and embedding the specified versions of those external dependencies within our own GitHub tarballs;
    2. Someone like a distro maintainer expressing in their MRtrix3 package the fact that additional dependencies need to be pulled and the MRtrix3 cmake needs to be instructed where to find them.
  2. If that environment variable is not specified, then something like FetchContent will instead be used.

@daljit46
Copy link
Member Author

I have now rebased this and made some changes to address the issues above:

  • Developers can now specify an optional local folder containing that points to the sources of eigen, json for modern c++ and half. This can be useful in offline environments without network access.
  • Dependency handling code is moved into a separate CMake module.
  • Half is directly fetched from sourceforge.net
  • FetchContent now downloads source repos in PROJECT_SOURCE_DIR/_deps to avoid downloading repos again for separate build directories.

@MRtrix3/mrtrix3-devs Feedback welcome.

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46
Copy link
Member Author

Made some more changes. Now we only download release artefacts for Eigen instead of cloning the Gitlab repo (which is much faster). Additionally, I got rid of the custom patch to silence Eigen's configuration output and instead create a CMake INTERFACE library manually. Finally, the base location of FetchContent has been globally set to PROJECT_SOURCE_DIR/_fetchcontent.

@daljit46 daljit46 marked this pull request as ready for review July 12, 2024 10:46
Copy link

clang-tidy review says "All clean, LGTM! 👍"

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46
Copy link
Member Author

daljit46 commented Jul 25, 2024

As we discussed in the meeting yesterday, I have retracted the change to avoid pull in dependencies in a custom source folder inside the source directory. So dependencies will be downloaded in the build directory. This shouldn't be an issue because download times should be quite short.
@jdtournier @Lestropie unless you have any objections, I will enable auto-merge on this PR.

Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46 daljit46 enabled auto-merge July 25, 2024 12:21
- Json for Modern C++ and Eigen3 are fetched using CMake's FetchContent

- Nifti headers are moved to thirdparty/nifti
To remove unnecessary verbose steps we patch Eigen's CMakeLists.txt
- Developers can now specify an optional local folder containing that
points to the sources of eigen, json for modern c++ and half. This can
be useful in offline environments without network access.

- Dependency handling code is moved into a separate CMake module.

- Half is directly fetched from sourceforge.net

- FetchContent now downloads source repos in PROJECT_SOURCE_DIR/_deps to
avoid downloading repos again for separate build directories.
FETCHCONTENT_BASE_DIR is now set globally for the project to `_fetchcontent` directory
inside the top level directory.x
Copy link

clang-tidy review says "All clean, LGTM! 👍"

Eigen's configuration script is really verbose and we don't need its
output when configuring MRtrix3 as it's just a header only library.
This also allows to get rid of the custom patch to fix this problem.
Copy link

clang-tidy review says "All clean, LGTM! 👍"

@daljit46
Copy link
Member Author

@MRtrix3/mrtrix3-devs unless anyone has objections, I will now merge this.

@daljit46 daljit46 merged commit d533879 into dev Aug 15, 2024
6 checks passed
@daljit46 daljit46 deleted the thirdparty branch August 15, 2024 14:22
@Lestropie
Copy link
Member

  • It would have been preferable for df08ea1 to be split into two commits, with removal of the third-party files authored by @MRtrixBot, for the sake of contribution statistics tracking. It's arguably not as consequential now that the commit count is over 10,000 and GitHub no longer shows line changes per contributor, but it would still be preferable to keep those stats as faithful as we can. Sorry I didn't check it more closely earlier.
    @jdtournier worth the effort?

  • The removal of Eigen as a host system dependency should be reflected in the documentation for compilation from source.

@Lestropie
Copy link
Member

Another factor that comes to mind here is whether nifti1.h and nifti2.h should be obtained from some other source. If there isn't a robust online source for those files, we could create a new repository under our organisation that just stores those two files so that they can be pulled. This would (I think) prevent the current situation where the thirdparty/ directory contains a combination of content from the main repo and pulled content. It's not a clear beneficial alternative so might not be worth the effort, but thought I'd state nevertheless.

@daljit46
Copy link
Member Author

daljit46 commented Aug 16, 2024

It would have been preferable for df08ea1 to be split into two commits, with removal of the third-party files authored by @MRtrixBot, for the sake of contribution statistics tracking.

I'm not sure what the permissions are on dev, but if we're allowed to rebase then this should be fairly straightforward to do I think. The only file that was deleted was json.h. Better do it ASAP though, before other PRs merge into dev.

we could create a new repository under our organisation that just stores those two files so that they can be pulled.

Makes sense to me.

@daljit46
Copy link
Member Author

daljit46 commented Aug 19, 2024

@Lestropie @jdtournier I have created a new branch fix_pr2859 that splits the commit for removing core/file/json.h and attributes authorship to MRtrixBot. I don't know how we want to proceed, but we could do something like git checkout dev && git reset origin/fix_pr2859 --hard && git push --force. This would rewrite the history up from the last commit before this branch's commits.

@Lestropie
Copy link
Member

Thanks for investing the energy. I did a visual comparison between #2859 and a hypothetical merge of fix_pr2859 to dev. Unfortunately f4b37e0 has the same problem in that it removes half.hpp. If you're able to refactor that one as well, then I can re-check, and then temporarily release the restrictions on dev and do a force push.

@daljit46
Copy link
Member Author

Oh yes I missed that. I rebased again and now put the deletion of core/file/json.h and core/half.hpp into a single commit at the top.

@Lestropie
Copy link
Member

Thanks. Force-pushed and branch protections re-established. I also cherry-picked ee8def6 from #2974 as d591658 before pushing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants