Updates bcftools to latest version #154

Open
wants to merge 5 commits into
base: main

@MattWellie MattWellie commented Jul 1, 2024

  • Update BCFtools to use 1.20
  • https://github.com/samtools/bcftools/releases
  • Would like to adopt the --write-index functionality (introduced in 1.18), so we don't have to follow bcftools commands with a subsequent tabix command

The release log looks additive: I would not expect any of our current workflows to be broken by this change. We use a limited number of features from 1.16 (August 2022), none of which have been removed; a couple have additional CLI flags available, but there are no backwards-incompatible changes.

Also ditches the miniconda build: it doesn't seem necessary to delegate this installation to another tool that itself needs to be installed.

A specific job is failing because it uses commands which are not in our containerised version of bcftools: https://batch.hail.populationgenomics.org.au/batches/457521/jobs/29

Edit: added a few more names to this - I don't anticipate any breaking changes, but this tool is very generic, so an update could affect all our workflows

Here's a run using this build (built on this Dockerfile, version 1.20, in images-dev): https://batch.hail.populationgenomics.org.au/batches/457715

@@ -1,14 +1,26 @@
FROM debian:buster-slim

It's bookworm o'clock.

git clone https://github.com/samtools/bcftools.git --branch ${VERSION} --depth 1 && \
cd bcftools && \
make && \
mv bcftools /usr/local/bin && \

Use make install rather than manually copying a small subset — in particular, you're missing bcftools's plugins.

@jmarshall
Contributor

If you're going to build bcftools from source, which I'd generally be in favour of, I'd strongly recommend building from a release tarball rather than a git checkout, and running configure.
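
A minimal sketch of that suggestion, swapping the git clone step for a release tarball and a configure run. The tarball URL follows the naming pattern used on the bcftools releases page; the VERSION default, the scratch file name, and the assumption that curl, bzip2, a C compiler, make and the compression development libraries are already installed are illustrative and not taken from this PR.

```dockerfile
# Assumption: an earlier layer installed curl, bzip2, gcc, make and the
# zlib/bz2/lzma development packages that the bundled htslib needs.
ARG VERSION=1.20

# Fetch the release tarball (which bundles htslib), then configure, build and install.
RUN curl -fsSL -o bcftools.tar.bz2 \
        https://github.com/samtools/bcftools/releases/download/${VERSION}/bcftools-${VERSION}.tar.bz2 && \
    tar -xjf bcftools.tar.bz2 && \
    cd bcftools-${VERSION} && \
    ./configure && \
    make && \
    make install
```

With the default prefix, make install should place bcftools in /usr/local/bin and its plugins under /usr/local/libexec/bcftools, which also covers the plugin concern raised in the review.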

Bioconda's build enables a few features (the polysomy command, which needs GSL; filtering via Perl syntax) that you'd need to install more prerequisites to duplicate, but we probably don't use those.

It might be worth doing this via a multi-stage docker build so that the compilers and development packages aren't in the eventual bcftools image.

@hopedisastro
Contributor

The --write-index option is a major plus! I use core functions in bcftools in several parts of my pipeline, but I think upgrading is fine (and can always use the older version by specifying the flag if needed).

@MattWellie
Contributor Author

It might be worth doing this via a multi-stage docker build so that the compilers and development packages aren't in the eventual bcftools image.

Pushed a 3-stage version:

  • base: install the packages required for both building and running
  • compile stage: add what is required for compiling only, then copy and build the release
  • final stage: also from base, copy in the compiled software

* Only install the runtime libraries (not dev packages) into the base stage
* Hence explicitly install the dev packages into the compiler stage
* Configure explicitly with --enable-libcurl etc so configuration fails
  if the prerequisites are missing
* Strip the resulting executables to save a bit of space
* Use DESTDIR instead of --prefix so the executable doesn't have the
  build prefix burnt into it (in particular, so it looks for plugins
  in the eventual right place)
* Copy both executables (bin) and plugins (libexec) to the final stage
* Set DEBIAN_FRONTEND to stop apt-get from trying to be interactive
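
The points in this commit message could be sketched roughly as the Dockerfile below. This is not the Dockerfile from the PR: the stage names, the /staging directory, the bookworm base and the exact Debian package lists are assumptions chosen for illustration, and only --enable-libcurl is shown where the commit message says "etc".

```dockerfile
# Stage 1 (hypothetical name "base"): runtime libraries only, no -dev packages.
FROM debian:bookworm-slim AS base
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates libbz2-1.0 libcurl4 liblzma5 zlib1g && \
    rm -rf /var/lib/apt/lists/*

# Stage 2 (hypothetical name "compiler"): toolchain and -dev packages live only here.
FROM base AS compiler
ARG VERSION=1.20
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        bzip2 curl gcc libc6-dev make \
        libbz2-dev libcurl4-openssl-dev liblzma-dev libssl-dev zlib1g-dev && \
    curl -fsSL -o bcftools.tar.bz2 \
        https://github.com/samtools/bcftools/releases/download/${VERSION}/bcftools-${VERSION}.tar.bz2 && \
    tar -xjf bcftools.tar.bz2 && \
    cd bcftools-${VERSION} && \
    ./configure --enable-libcurl && \
    make && \
    make install DESTDIR=/staging && \
    strip /staging/usr/local/bin/bcftools /staging/usr/local/libexec/bcftools/*.so

# Stage 3: start again from base and copy in only the installed files.
FROM base
COPY --from=compiler /staging/usr/local/bin/ /usr/local/bin/
COPY --from=compiler /staging/usr/local/libexec/ /usr/local/libexec/
```

Because the prefix stays at the default /usr/local and only DESTDIR changes, the files are staged under /staging/usr/local but the binary still looks for its plugins in /usr/local/libexec/bcftools, which is where the final stage puts them. Only the bcftools binary and the plugin shared objects are stripped, since the other files installed into bin are scripts.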

@jmarshall
Contributor

I've taken the liberty of pushing to your branch to add bcftools's plugin commands to the image and remove the unneeded development packages from the final image.

Now that it's using a multi-stage build, the compiler stage doesn't need to be so optimised, so it could use several RUN commands instead of … && … as we don't care how many layers it has. That would allow more caching during the development cycle, which would make the cycle a bit faster… 🤔
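
As a sketch of that idea, the compiler stage might be split into one RUN per step, so that editing a later step reuses the cached layers of the earlier ones. It assumes a base stage defined earlier in the same Dockerfile; the stage name, the /build and /staging directories and the package list are hypothetical.

```dockerfile
# Fragment: assumes "base" is a stage defined earlier in the same Dockerfile.
FROM base AS compiler
ARG VERSION=1.20

# Each step is its own layer, so a change to a later step leaves earlier layers cached.
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        bzip2 curl gcc libc6-dev make \
        libbz2-dev libcurl4-openssl-dev liblzma-dev libssl-dev zlib1g-dev

WORKDIR /build
RUN curl -fsSL -o bcftools.tar.bz2 \
        https://github.com/samtools/bcftools/releases/download/${VERSION}/bcftools-${VERSION}.tar.bz2
RUN tar -xjf bcftools.tar.bz2

# A cd inside one RUN does not carry over to the next, so switch directory with WORKDIR.
WORKDIR /build/bcftools-${VERSION}
RUN ./configure --enable-libcurl
RUN make
RUN make install DESTDIR=/staging
```

With this layout, tweaking the configure flags reruns only configure, make and make install; the package installation and the download stay cached.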
