Move dependencies out of Dockerfile #32

Open
wants to merge 2 commits into
base: devel

jwokaty
Contributor

@jwokaty jwokaty commented Jun 14, 2021

This PR creates Ubuntu-files to track docker-specific dependencies and skip dependencies that we don't want installed from BBS.

Regarding libmariadb-dev-compat, I've put it in apt_required.txt but commented it out because it depends on libmariadb-dev and conflicts with libmysqlclient-dev, which gets installed on the build system. We can choose to skip libmysqlclient-dev by putting it in apt_skip.txt and uncommenting libmariadb-dev-compat in apt_required.txt so that it gets installed.

bin/install.sh installs BBS dependencies, comparing the BBS Ubuntu-files to this repo's Ubuntu-files.

When I tested, I was able to install the following packages:

 [1] "a4"             "a4Base"         "bioCancer"      "BioMM"         
 [5] "BLMA"           "bnbc"           "canceR"         "ChemmineOB"    
 [9] "cicero"         "CoGAPS"         "ctgGEM"         "CytoTree"      
[13] "edge"           "GeneTonic"      "gpuMagic"       "igvR"          
[17] "methylscaper"   "monocle"        "phemd"          "podkat"        
[21] "projectR"       "RCyjs"          "spatialHeatmap" "tenXplore"     
[25] "tradeSeq"       "Travel"         "uSORT"          "webbioc"    

On 6/29, docker images reported the size as 4.57GB.

I'd appreciate any feedback to improve this PR. You can see my PR for BBS at Bioconductor/BBS#84.

@nturaga nturaga self-requested a review June 30, 2021 18:59
@nturaga nturaga self-assigned this Jun 30, 2021
@nturaga
Collaborator

nturaga commented Jul 1, 2021

Thanks for the PR @jwokaty.

The overall size of the Docker image with this PR is currently much larger, by about 750MB:

bioconductor/bioconductor_docker                              jw-update              337ddea798d9   45 hours ago   4.65GB
bioconductor/bioconductor_docker                              devel                  1458fe590fe7   46 hours ago   3.93GB

The key questions for this image are:

  1. Does this PR make the bioconductor/bioconductor_docker:devel image the "same" as the BBS linux machine? (was that the goal here? if so, we could make a new image bioconductor_docker:linux_builder --> We still need to test if it's the same as the BBS machine though.)

  2. What are the "extra" 750MB worth of system dependencies?

  3. One thing that makes this PR a little complicated for me to read is that the packages being installed aren't "explicit" anymore. They are lost in the apt_*.txt files and then within the awk commands.
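
One way to approach question 2 might be to rank the installed packages by size inside each image and compare the two rankings. The bash sketch below assumes the images are Debian/Ubuntu based, so that `dpkg-query` is available; `largest_pkgs` is a hypothetical helper name, not something in this PR. It reads `size<TAB>package` lines, with the size in KiB as reported by dpkg's `Installed-Size` field.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bin/install.sh): print the N largest
# packages from "KiB<TAB>name" lines on stdin, converted to MiB.
largest_pkgs() {
    local n="${1:-20}"
    sort -t $'\t' -k1,1nr | head -n "$n" |
        awk -F '\t' '{ printf "%8.1f MiB  %s\n", $1 / 1024, $2 }'
}

# Usage inside a running container (assumption: Debian/Ubuntu base image):
#   dpkg-query -W -f='${Installed-Size}\t${Package}\n' | largest_pkgs 20
```

Running this in both the `jw-update` and `devel` images and comparing the output should show which packages account for most of the difference.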

I'm happy to help on any of these, and welcome thoughts from @jwokaty, @hpages, @vjcitn and @mtmorgan .

@nturaga nturaga added the enhancement New feature or request label Jul 1, 2021
bin/install.sh Outdated
# Constructing array of apt packages, removing unnecessary packages.
cat $bbs_files/apt_*.txt | awk '/^[^#]/ {print $1}' | sort >> /tmp/bbs_apt_pkgs
cat $bd_files/apt_*.txt | awk '/^[^#]/ {print $1}' | sort >> /tmp/skip_apt_pkgs
apt_pkgs=$(comm -23 /tmp/bbs_apt_pkgs /tmp/skip_apt_pkgs)
Collaborator

This should be saved to a file instead of only a variable, so that we can explicitly track what is being installed. This applies to both apt_pkgs and apt_required_pkgs.

Contributor Author

Should the list of packages that's now in apt_pkgs be a file in the repository?
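
As a rough sketch of the suggestion in this thread, the computed list could be written to a file that can be inspected after the build or committed. The helper names `list_pkgs` and `write_install_list` and the output path are assumptions for illustration, not code from this PR; `comm -23` keeps the lines that appear only in its first sorted input.

```shell
#!/usr/bin/env bash
# Print package names (first field) from list files, skipping comment and
# blank lines, sorted and de-duplicated so the output is valid input for comm.
list_pkgs() {
    awk '/^[^#[:space:]]/ { print $1 }' "$@" | LC_ALL=C sort -u
}

# write_install_list OUT BBS_DIR SKIP_DIR
# Writes the packages found in BBS_DIR/apt_*.txt but not in SKIP_DIR/apt_*.txt.
write_install_list() {
    local out="$1" bbs_dir="$2" skip_dir="$3"
    LC_ALL=C comm -23 \
        <(list_pkgs "$bbs_dir"/apt_*.txt) \
        <(list_pkgs "$skip_dir"/apt_*.txt) > "$out"
}

# Usage (hypothetical output path):
#   write_install_list /tmp/install_apt_pkgs "$bbs_files" "$bd_files"
```

Using `sort -u` with a fixed locale on both inputs avoids the "input is not in sorted order" warning that `comm` can emit when the files were sorted under different collation rules.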

bin/install.sh Outdated

# Remove files
if test -n "$(find /tmp -maxdepth 1 -name '*_pkgs' -print -quit)"; then
rm /tmp/*_pkgs
Collaborator

Remove all files other than the lists of what is being installed.

bin/install.sh Outdated
# Constructing array of pip packages, removing unnecessary packages.
cat $bbs_files/pip_*.txt | awk '/^[^#]/ {print $1}' | sort >> /tmp/bbs_pip_pkgs
cat $bd_files/pip_*.txt | awk '/^[^#]/ {print $1}' | sort >> /tmp/skip_pip_pkgs
pip_pkgs=$(comm -23 /tmp/bbs_pip_pkgs /tmp/skip_pip_pkgs)
Collaborator

Add a couple more comments so that what is going on here is explicit.

bin/install.sh Outdated

# Get repository with Ubuntu-files
if [ ! -d "/tmp/BBS" ]; then
git clone $repo /tmp/BBS

Shallow clone --depth 1

Contributor Author

Thanks for this tip!
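
For reference, a shallow clone of a single branch might look like the sketch below. `clone_shallow` is a hypothetical wrapper name; `--depth 1` and `--branch` are standard `git clone` options, and `--depth` implies `--single-branch`.

```shell
#!/usr/bin/env bash
# clone_shallow REPO BRANCH DEST
# Fetch only the tip commit of BRANCH; does nothing if DEST already exists.
clone_shallow() {
    local repo="$1" branch="$2" dest="$3"
    if [ ! -d "$dest" ]; then
        git clone --depth 1 --branch "$branch" "$repo" "$dest"
    fi
}

# Usage with the values from bin/install.sh:
#   clone_shallow "https://github.com/Bioconductor/bbs" master /tmp/BBS
```

Since only the Ubuntu-files directory is read, the history is not needed and skipping it keeps the temporary checkout small.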

bin/install.sh Outdated
# Install pip packages
pip3 install $pip_pkgs

# Remove files

clean up repos along the lines of

	&& apt-get clean \
	&& rm -rf /var/lib/apt/lists/*

do it in this layer, otherwise they are only 'masked' rather than removed.

Contributor Author

Could you explain this a little more or point me toward more information?


Layers are independent. If one layer creates a large file, maybe indirectly (RUN <create large-file>), and the next layer tries to clean it up (RUN rm <large-file>), the large file still exists in the earlier layer; the file system just no longer shows it. Maybe the comment isn't directly relevant, but hopefully the sentiment is: clean up before the layer closes.
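
To illustrate "clean up before the layer closes": if the install script performs the update, the install, and the cleanup itself, a single RUN that calls it produces one layer with the apt caches already gone. This is a minimal bash sketch; `install_and_clean`, `run`, and the `DRY_RUN` switch are hypothetical names added for illustration.

```shell
#!/usr/bin/env bash
# run CMD...: execute CMD, or only print it when DRY_RUN=1 (for inspection).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# install_and_clean PKG...
# Update, install, and remove the apt caches in one script invocation, so
# the caches never end up in a finished image layer.
install_and_clean() {
    run apt-get update &&
    run apt-get install -y --no-install-recommends "$@" &&
    run apt-get clean &&
    run rm -rf /var/lib/apt/lists/*
}

# Preview the commands without touching the system:
#   DRY_RUN=1 install_and_clean curl libzmq3-dev
```

Splitting the same commands across several RUN instructions would leave the package lists in the earlier layer even though the final file system no longer shows them.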

libeigen3-dev
libfribidi-dev
libfuse-dev
libgmp-dev # BD uses libgmp3-dev

BD == bioconductor_docker? Would be good to spell this out at least on first use

Collaborator

yes, bd is bioconductor_docker.

bin/install.sh Outdated
repo="https://github.com/Bioconductor/bbs"
branch="master"
bbs_files="/tmp/BBS/Ubuntu-files/$version"
bd_files="/tmp/$version"

bd_files is cryptic to me here, too.

@hpages

hpages commented Jul 1, 2021

I took a look at Dockerfile, went thru the list of deb packages that are explicitly listed in the file, and annotated them. This should help us decide what to do with each of them. The goal is that each deb package should go in one of the following lists:

  1. apt_required_build.txt
  2. apt_required_compile_R.txt
  3. apt_optional_compile_R.txt
  4. apt_extra_fonts.txt
  5. apt_cran.txt
  6. apt_bioc.txt
  7. apt_nice_to_have.txt
  8. apt_docker_only.txt

All these lists (except the last one) are in https://github.com/Bioconductor/BBS/tree/master/Ubuntu-files/20.04/. The last one (apt_docker_only.txt) would need to be created. It would list stuff that is maybe nice to have on the Docker image for developers but is not strictly required to install/run Bioconductor. Your input will be valuable @nturaga to decide whether or not you want to keep these things on the Docker image.

Here's the annotated list extracted from Dockerfile:

	## Basic deps
	gdb \                          add to apt_nice_to_have.txt
	libxml2-dev \                  already in apt_cran.txt
	python3-pip \                  already in apt_required_build.txt
	libz-dev \                     who needs that? maybe create a new list
                                       (e.g. apt_docker_only.txt) and add to it
	liblzma-dev \                  already in apt_required_compile_R.txt
	libbz2-dev \                   already in apt_required_compile_R.txt
	libpng-dev \                   already in apt_optional_compile_R.txt
	libgit2-dev \                  already in apt_cran.txt
	## sys deps from bioc_full
	pkg-config \                   add to apt_nice_to_have.txt
	fortran77-compiler \           we use gfortran (in apt_required_compile_R.txt) on the build
                                       machines
	byacc \                        who needs that? maybe add to apt_docker_only.txt
	automake \                     already in apt_bioc.txt
	curl \                         we use libcurl4-openssl-dev on the build
                                       machines (needed for CRAN packages RCurl
                                       and curl)
	## This section installs libraries
	libpcre2-dev \                 already in apt_required_compile_R.txt
	libnetcdf-dev \                already in apt_bioc.txt
	libhdf5-serial-dev \           who needs that? maybe add to apt_docker_only.txt
	libfftw3-dev \                 already in apt_cran.txt
	libopenbabel-dev \             already in apt_bioc.txt
	libopenmpi-dev \               we use mpi-default-dev (apt_cran.txt) on the build machines
	libxt-dev \                    already in apt_required_compile_R.txt
	libudunits2-dev \              already in apt_cran.txt
	libgeos-dev \                  already in apt_cran.txt
	libproj-dev \                  already in apt_cran.txt
	libcairo2-dev \                already in apt_optional_compile_R.txt
	libtiff5-dev \                 we use libtiff-dev (in apt_optional_compile_R.txt) on the
                                       build machines
	libreadline-dev \              already in apt_required_compile_R.txt
	libgsl0-dev \                  we use libgsl-dev (in apt_bioc.txt) on the build machines
	libgslcblas0 \                 who needs that? maybe add to apt_docker_only.txt
	libgtk2.0-dev \                already in apt_cran.txt
	libgl1-mesa-dev \              gets automatically installed by libglu1-mesa-dev so maybe no
                                       need for an explicit install
	libglu1-mesa-dev \             already in apt_cran.txt
	libgmp3-dev \                  we use libgmp-dev (in apt_cran.txt) on the build machines
	libhdf5-dev \                  who needs that? maybe add to apt_docker_only.txt
	libncurses-dev \               gets automatically installed by libreadline-dev but maybe
                                       add it to apt_required_compile_R.txt anyway just in case
	libbz2-dev \                   already in apt_required_compile_R.txt
	libxpm-dev \                   who needs that? maybe add to apt_docker_only.txt
	liblapack-dev \                we don't use the system LAPACK library on the build machines
	libv8-dev \                    already in apt_cran.txt
	libgtkmm-2.4-dev \             already in apt_bioc.txt
	libmpfr-dev \                  already in apt_cran.txt
	libmodule-build-perl \         who needs that? maybe add to apt_docker_only.txt
	libapparmor-dev \              who needs that? maybe add to apt_docker_only.txt
	libprotoc-dev \                who needs that? maybe add to apt_docker_only.txt
	librdf0-dev \                  who needs that? maybe add to apt_docker_only.txt
	libmagick++-dev \              already in apt_cran.txt
	libsasl2-dev \                 already in apt_cran.txt
	libpoppler-cpp-dev \           already in apt_cran.txt
	libprotobuf-dev \              already in apt_cran.txt
	libpq-dev \                    already in apt_cran.txt
	libperl-dev \                  already in apt_cran.txt
	## software - perl extentions and modules
	libarchive-extract-perl \      who needs that? maybe add to apt_docker_only.txt
	libfile-copy-recursive-perl \  who needs that? maybe add to apt_docker_only.txt
	libcgi-pm-perl \               who needs that? maybe add to apt_docker_only.txt
	libdbi-perl \                  who needs that? maybe add to apt_docker_only.txt
	libdbd-mysql-perl \            who needs that? maybe add to apt_docker_only.txt
	libxml-simple-perl \           who needs that? maybe add to apt_docker_only.txt
	libmysqlclient-dev \           already in apt_cran.txt
	default-libmysqlclient-dev \   not needed (redundant with libmysqlclient-dev)
	libgdal-dev \                  already in apt_cran.txt
	## new libs
        libglpk-dev \                  already in apt_cran.txt
        libeigen3-dev \                already in apt_bioc.txt
	## Databases and other software
	sqlite \                       not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	openmpi-bin \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	mpi-default-bin \              only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	openmpi-common \               only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	openmpi-doc \                  only mpi-default-dev (apt_cran.txt) is strictly needed to
                                       install/run Bioconductor so maybe add to apt_docker_only.txt
	tcl8.6-dev \                   we use tcl-dev (apt_optional_compile_R.txt) on the build machines
	tk-dev \                       already in apt_optional_compile_R.txt
	default-jdk \                  already in apt_optional_compile_R.txt
	imagemagick \                  not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	tabix \                        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt
	ggobi \                        who needs that? maybe add to apt_docker_only.txt
	graphviz \                     already in apt_bioc.txt
	protobuf-compiler \            already in apt_cran.txt
	jags \                         already in apt_cran.txt
	## Additional resources
	xfonts-100dpi \                already in apt_extra_fonts.txt
	xfonts-75dpi \                 already in apt_extra_fonts.txt
	biber \                        AFAIK this is only needed to build some vignettes so we have
                                       it listed in apt_vignettes_reference_manuals.txt and that's
                                       a list that we do not want to install on the Docker image
        libsbml5-dev \                 already in apt_bioc.txt
        libzmq3-dev \                  who needs that? maybe add to apt_docker_only.txt

## FIXME
## These two libraries don't install in the above section--WHY?
RUN apt-get update \
	&& apt-get -y --no-install-recommends install \
	libmariadb-dev-compat \        not needed to install/run Bioconductor so maybe add to
                                       apt_docker_only.txt if you really want this on the Docker
                                       image
	libjpeg-dev \                  already in apt_optional_compile_R.txt
	libjpeg-turbo8-dev \           installing libjpeg-dev should be enough
	libjpeg8-dev \                 installing libjpeg-dev should be enough
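
An annotation like the one above can be partly automated by asking which list file, if any, already names a given package. A small bash sketch, assuming the BBS repository is checked out locally; `which_list` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# which_list PKG DIR
# Print every DIR/apt_*.txt file in which the first field of some line is
# exactly PKG. Commented-out entries (lines starting with '#') never match.
which_list() {
    local pkg="$1" dir="$2"
    awk -v p="$pkg" '$1 == p { print FILENAME }' "$dir"/apt_*.txt | sort -u
}

# Usage (assumption: BBS cloned to /tmp/BBS):
#   which_list libgit2-dev /tmp/BBS/Ubuntu-files/20.04
```

An empty result corresponds to the "who needs that?" cases. Near-equivalents such as libgmp3-dev versus libgmp-dev still need a manual look, because only exact names match.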

Note that I've left the following section from Dockerfile out of the discussion for now:

## Python installations
RUN apt-get update \
	&& apt-get install -y software-properties-common \
	&& add-apt-repository universe \
	&& apt-get update \
	&& apt-get -y --no-install-recommends install python2 python-dev \
	&& curl https://bootstrap.pypa.io/pip/2.7/get-pip.py --output get-pip.py \
	&& python2 get-pip.py \
	&& pip2 install wheel \
	## Install sklearn and pandas on python
	&& pip2 install sklearn \
	pandas \
	pyyaml \
	&& apt-get clean \
	&& rm -rf /var/lib/apt/lists/* \
	&& rm -rf get-pip.py

because I'm not sure what to do with it or why it is needed. We do need some Python modules on the build machines but they should all be installed for Python 3, not Python 2 (we dropped support for Python 2 last year).

The goal is that in the future we'll only need to add new deb packages to the apt_cran.txt and/or apt_bioc.txt lists as new (or existing) Bioconductor packages introduce new system requirements. This will affect what gets installed on both the build machines and the Docker image.

Hope this helps,
H.

@jwokaty
Contributor Author

jwokaty commented Jul 8, 2021

So in docker, we should be installing apt dependencies from the following files:

  • apt_bioc.txt
  • apt_cran.txt
  • apt_optional_compile_R.txt
  • apt_required_compile_R.txt
  • apt_extra_fonts.txt
  • apt_required_build.txt

We're not installing apt_nice_to_have.txt and apt_vignettes_reference_manuals.txt; is that correct?

I also want to clarify that when one of these files has packages not listed in the current Dockerfile on the master branch, we still install everything in the file. For example, the Dockerfile on the master branch lists only 2 font packages; however, apt_extra_fonts.txt has 8 packages in total. So we will still be installing more packages than the current Docker image on the master branch, but at least what we're installing is explicit.

Additionally, when the docker and build systems have a similar package, we should choose the build system package that's in one of the files listed, correct?
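
A sketch of installing from an explicit set of list files rather than from every `apt_*.txt` file. The file names follow the BBS directory listed earlier in this thread; `docker_lists` and `collect_pkgs` are hypothetical names, and whether apt_extra_fonts.txt belongs in the set is still an open question here.

```shell
#!/usr/bin/env bash
# Lists to install from. apt_nice_to_have.txt and
# apt_vignettes_reference_manuals.txt are deliberately left out.
docker_lists=(
    apt_bioc.txt
    apt_cran.txt
    apt_optional_compile_R.txt
    apt_required_compile_R.txt
    apt_extra_fonts.txt
    apt_required_build.txt
)

# collect_pkgs DIR FILE...
# Print the unique package names from the named files under DIR.
collect_pkgs() {
    local dir="$1"; shift
    local f
    for f in "$@"; do
        # Skip comment and blank lines; the package name is the first field.
        awk '/^[^#[:space:]]/ { print $1 }' "$dir/$f"
    done | sort -u
}

# Usage (assumption: BBS cloned to /tmp/BBS as in bin/install.sh):
#   collect_pkgs /tmp/BBS/Ubuntu-files/20.04 "${docker_lists[@]}"
```

Naming the files in one array keeps the selection visible in the script, so adding or dropping a list is a one-line change.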

@nturaga
Collaborator

nturaga commented Jul 8, 2021 via email

@jwokaty
Contributor Author

jwokaty commented Jul 8, 2021

@nturaga Thanks for trying to answer some of my questions as well as pointing me to the rocker scripts.

I'm not sure if there's a better way to investigate these dependencies, but I decided to use code.bioconductor.org to investigate the "who needs that" packages. I will look there too for these files, but they're probably dependencies from non-Bioconductor packages.

If these are for building vignettes and we don't build vignettes with docker, why are we installing them? The fonts are just one group of files where I know there are more in the BBS than in docker. I suspect there will be others.

@jwokaty
Contributor Author

jwokaty commented Jul 13, 2021

I marked all the packages that are in both docker and the BBS:

apt_bioc.txt:graphviz                    # for Rgraphviz             # Bioconductor Docker
apt_bioc.txt:libgtkmm-2.4-dev            # for HilbertVisGUI         # Bioconductor Docker
apt_bioc.txt:libgsl-dev                  # for GSL                   # Bioconductor Docker
apt_bioc.txt:libsbml5-dev                # for rsbml                 # Bioconductor Docker
apt_bioc.txt:automake                    # for RProtoBufLib          # Bioconductor Docker
apt_bioc.txt:libnetcdf-dev               # for mzR, RNetCDF          # Bioconductor Docker
apt_bioc.txt:libopenbabel-dev            # for ChemmineOB            # Bioconductor Docker
apt_bioc.txt:libeigen3-dev               # for ChemmineOB            # Bioconductor Docker
apt_cran.txt:libglu1-mesa-dev        # for rgl                       # Bioconductor Docker
apt_cran.txt:libgmp-dev              # for gmp                       # Bioconductor Docker
apt_cran.txt:libsasl2-dev            # for mongolite                 # Bioconductor Docker
apt_cran.txt:libxml2-dev             # for XML                       # Bioconductor Docker
apt_cran.txt:libcurl4-openssl-dev    # for RCurl, curl               # Bioconductor Docker
apt_cran.txt:mpi-default-dev         # for Rmpi                      # Bioconductor Docker
apt_cran.txt:libudunits2-dev         # for units                     # Bioconductor Docker
apt_cran.txt:libv8-dev               # for V8                        # Bioconductor Docker
apt_cran.txt:libmpfr-dev             # for Rmpfr                     # Bioconductor Docker
apt_cran.txt:libfftw3-dev            # for fftw, fftwtools           # Bioconductor Docker
apt_cran.txt:libmysqlclient-dev      # for RMySQL                    # Bioconductor Docker
apt_cran.txt:libpq-dev               # for RPostgreSQL, RPostgres    # Bioconductor Docker
apt_cran.txt:libmagick++-dev         # for magick                    # Bioconductor Docker
apt_cran.txt:libgeos-dev             # for rgeos                     # Bioconductor Docker
apt_cran.txt:libproj-dev             # for proj4                     # Bioconductor Docker
apt_cran.txt:libgdal-dev             # for sf                        # Bioconductor Docker
apt_cran.txt:libpoppler-cpp-dev      # for pdftools                  # Bioconductor Docker
apt_cran.txt:libgtk2.0-dev           # for RGtk2                     # Bioconductor Docker
apt_cran.txt:libgit2-dev             # for gert                      # Bioconductor Docker
apt_cran.txt:jags                    # for rjags                     # Bioconductor Docker
apt_cran.txt:libprotobuf-dev         # for protolite                 # Bioconductor Docker 
apt_cran.txt:protobuf-compiler       # for protolite                 # Bioconductor Docker
apt_cran.txt:libglpk-dev             # for glpkAPI and to compile igraph with GLPK support   # Bioconductor Docker
apt_extra_fonts.txt:xfonts-100dpi                                    # Bioconductor Docker
apt_extra_fonts.txt:xfonts-75dpi                                     # Bioconductor Docker
apt_nice_to_have.txt:gdb                                             # Bioconductor Docker        (suggested add from above)
apt_nice_to_have.txt:pkg-config                                      # Bioconductor Docker       (suggested add from above)
apt_optional_compile_R.txt:libpng-dev                                # Bioconductor Docker
apt_optional_compile_R.txt:libjpeg-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libtiff-dev                               # Bioconductor Docker
apt_optional_compile_R.txt:libcairo2-dev                             # Bioconductor Docker
apt_optional_compile_R.txt:tcl-dev                                   # Bioconductor Docker
apt_optional_compile_R.txt:tk-dev                                    # Bioconductor Docker
apt_optional_compile_R.txt:default-jdk                               # Bioconductor Docker
apt_required_build.txt:python3-pip                                   # Bioconductor Docker
apt_required_compile_R.txt:gfortran                                  # Bioconductor Docker
apt_required_compile_R.txt:libreadline-dev                           # Bioconductor Docker
apt_required_compile_R.txt:libxt-dev                                 # Bioconductor Docker
apt_required_compile_R.txt:libbz2-dev                                # Bioconductor Docker
apt_required_compile_R.txt:liblzma-dev                               # Bioconductor Docker
apt_required_compile_R.txt:libpcre2-dev                              # Bioconductor Docker
apt_required_compile_R.txt:libcurl4-openssl-dev                      # Bioconductor Docker
apt_required_compile_R.txt:libncurses-dev                            # Bioconductor Docker         (suggested add from above)

Here's what's only in the BBS:

apt_bioc.txt:firefox                     # for packages using utils::browseURL()
apt_bioc.txt:libgraphviz-dev             # for Rgraphviz
apt_bioc.txt:clustalo                    # for LowMACA
apt_bioc.txt:ocl-icd-opencl-dev          # for gpuMagic
apt_bioc.txt:libavfilter-dev             # for av/spatialHeatmap
apt_bioc.txt:libfribidi-dev              # for EnhancedVolcano
apt_bioc.txt:infernal                    # for inferrnal
apt_bioc.txt:fuse                        # for Travel
apt_bioc.txt:libfuse-dev                 # for Travel
apt_bioc.txt:kallisto                    # for rkal
apt_bioc.txt:mono-runtime                # for rawr
apt_bioc.txt:libmono-system-data4.0-cil  # for rawr
apt_cran.txt:librsvg2-dev            # for rsvg
apt_cran.txt:libssl-dev              # for openssl, mongolite
apt_extra_fonts.txt:# APT packages for extra fonts
apt_extra_fonts.txt:gsfonts-x11
apt_extra_fonts.txt:xfonts-base
apt_extra_fonts.txt:xfonts-scalable
apt_extra_fonts.txt:t1-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree
apt_extra_fonts.txt:ttf-xfree86-nonfree-syriac
apt_nice_to_have.txt:tree
apt_nice_to_have.txt:manpages-dev    # man pages for C standard library
apt_nice_to_have.txt:mlocate         # Provides the locate command
apt_optional_compile_R.txt:gobjc
apt_optional_compile_R.txt:libicu-dev
apt_required_build.txt:python3-minimal
apt_required_build.txt:git
apt_required_compile_R.txt:build-essential
apt_required_compile_R.txt:libx11-dev
apt_required_compile_R.txt:zlib1g-dev
apt_vignettes_reference_manuals.txt:texlive
apt_vignettes_reference_manuals.txt:texlive-font-utils          # for epstopdf
apt_vignettes_reference_manuals.txt:texlive-pstricks            # provides pstricks.sty
apt_vignettes_reference_manuals.txt:texlive-latex-extra         # provides fullpage.sty
apt_vignettes_reference_manuals.txt:texlive-fonts-extra         # provides inconsolata.sty
apt_vignettes_reference_manuals.txt:texlive-bibtex-extra        # provides unsrturl.bst
apt_vignettes_reference_manuals.txt:texlive-science             # provides algorithm.sty
apt_vignettes_reference_manuals.txt:texlive-luatex              # provides luatex85.sty
apt_vignettes_reference_manuals.txt:texlive-lang-european       # provides language definition files e.g. swedish.ldf
apt_vignettes_reference_manuals.txt:texi2html
apt_vignettes_reference_manuals.txt:texinfo
apt_vignettes_reference_manuals.txt:pandoc                      # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:pandoc-citeproc             # needed for CRAN package knitr
apt_vignettes_reference_manuals.txt:biber
apt_vignettes_reference_manuals.txt:#ttf-mscorefonts-installer

So while it's fine that we don't include the vignette packages, the other files that we do want to include still contain additional packages. To complicate matters further, some dependencies are already installed from rocker, and we don't need to install them again: reinstalling doesn't replace the original package, it just adds another layer (note: we're usually installing a -dev version):

gfortran
libbz2-*
libcurl4
libicu*
libpcre2*
libjpeg-turbo*
libreadline
libtiff*
liblzma*
zlib1g

It seems that if we want to install apt_bioc.txt, apt_cran.txt, apt_optional_compile_R.txt, apt_required_compile_R.txt, apt_extra_fonts.txt (do we still need this for the Docker image?), and apt_required_build.txt, we should expect the Docker image to be larger because of the extra packages. I think the current method, where I exclude packages, gives better control; what's installed just needs to become explicit. I also think we should keep the practice of annotating package dependencies.

I was not able to find any reference for the following packages listed in the Dockerfile when searching code.bioconductor.org, with the exception of the first package. These are the candidates for the apt_docker_only.txt suggested by @hpages . But I think we should remove them if not needed. We could do a test comparing what can be installed with the current docker image and an image without the packages below.

libz-dev                            # ceTF, proBatch
byacc
libhdf5-serial-dev
libgslcblas0
libhdf5-dev
libxpm-dev
libmodule-build-perl
libprotoc-dev
librdf0-dev
libarchive-extract-perl
libfile-copy-recursive-perl
libcgi-pm-perl
libdbi-perl
libdbd-mysql-perl
libxml-simple-perl
sqlite                          # Not needed
openmpi-bin
mpi-default-bin
openmpi-common
openmpi-doc
tabix
imagemagick
ggobi
libzmq3-dev

@jwokaty
Contributor Author

jwokaty commented Jul 28, 2021

# Current devel as of July 23?
 [1] "affyPara"       "canceR"         "CelliD"         "cellity"       
 [5] "CompGO"         "ctgGEM"         "CytoTree"       "fgga"          
 [9] "GateFinder"     "gCrisprTools"   "gpuMagic"       "immunotation"  
[13] "lisaClust"      "methyAnalysis"  "phemd"          "rawrr"         
[17] "SCATE"          "scClassifR"     "schex"          "scTensor"      
[21] "scTGIF"         "SeqSQC"         "spatialHeatmap" "spicyR"        
[25] "SwimR"          "Travel"         "vissE"          "waddR"

# Building without some packages                     
 [1] "CAMERA"         "canceR"         "cellity"        "cicero"        
 [5] "cliqueMS"       "cosmiq"         "ctgGEM"         "CytoTree"      
 [9] "flagme"         "GateFinder"     "gpuMagic"       "igvR"          
[13] "immunotation"   "IPO"            "LOBSTAHS"       "MAIT"          
[17] "meshes"         "meshr"          "Metab"          "metaMS"        
[21] "methylscaper"   "monocle"        "ncGTW"          "phemd"         
[25] "proFIA"         "rawrr"          "RCyjs"          "Risa"          
[29] "scTensor"       "SeqSQC"         "spatialHeatmap" "tenXplore"     
[33] "Travel"         "uSORT"          "xcms"          

Not sure why these are different as this is not what I expected!
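
To see exactly where the two lists diverge, the printed R vectors could be saved to two text files and compared by name. A bash sketch; `r_names` and `diff_failures` are hypothetical helpers, and the input is assumed to look like the R output above.

```shell
#!/usr/bin/env bash
# r_names FILE: extract the quoted names from a printed R character vector
# (lines such as: [1] "a4" "a4Base"), one per line, sorted and unique.
r_names() {
    grep -o '"[^"]*"' "$1" | tr -d '"' | LC_ALL=C sort -u
}

# diff_failures OLD NEW
# Names only in OLD are prefixed with "-", names only in NEW with "+".
diff_failures() {
    LC_ALL=C comm -23 <(r_names "$1") <(r_names "$2") | sed 's/^/- /'
    LC_ALL=C comm -13 <(r_names "$1") <(r_names "$2") | sed 's/^/+ /'
}

# Usage (hypothetical file names):
#   diff_failures devel_july23.txt without_some_pkgs.txt
```

Names that appear in both lists are omitted, which makes it easier to tie each change to a specific system dependency.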

The following packages were selected because either we weren't sure
what bioc packages required them or they appeared to be already satisfied
by something in rocker. After building with these commented out, I manually
reinstalled them one by one to see if they allowed additional bioc packages
to be installed. Only libzmq3-dev allowed RCy3 to be installed.

Appeared to be already installed

byacc
fortran77-compiler
imagemagick
libgmp3-dev
libtiff5-dev
sqlite
ggobi
libarchive-extract-perl
libcgi-pm-perl
libdbd-mysql-perl
libfile-copy-recursive-perl
libgsl0-dev
libhdf5-serial-dev
liblapack-dev
libmariadb-dev-compat
libmodule-build-perl
librdf0-dev
libxml-simple-perl
libxpm-dev
mpi-default-bin
openmpi-doc
tabix

Not needed / No additional Bioc packages were installed after the following apt packages were installed

default-libmysqlclient-dev
libgl1-mesa-dev
libdbi-perl
libprotoc-dev
libz-dev

- Add packages already installed via Rocker to skip file

- Move scripts to src

- Add comments to src/install.sh
@jwokaty
Contributor Author

jwokaty commented Sep 8, 2021

I've tried to address all previous comments. However, I'm not attempting to recreate what is in the master branch or the BBS, but rather to build a container that installs packages from the BBS and can override entries that we don't want to install (for example, when they've already been installed via Rocker).

The current size is 4.25GB. All but the following Bioconductor packages are installed:

 [1] "ArrayExpressHTS" "brainflowprobes" "BridgeDbR"       "canceR"
 [5] "CHRONOS"         "cn.mops"         "CNVfilteR"       "CNViz"
 [9] "CopyNumberPlots" "CytoTree"        "DaMiRseq"        "debCAM"
[13] "DeepPINCS"       "derfinder"       "derfinderPlot"   "esATAC"
[17] "gaggle"          "GARS"            "IsoGeneGUI"      "miRSM"
[21] "MSGFgui"         "MSGFplus"        "panelcn.mops"    "paxtoolsr"
[25] "phemd"           "psichomics"      "Rcpi"            "recount"
[29] "regionReport"    "ReQON"           "RGMQL"           "RMassBank"
[33] "rmelting"        "RmiR"            "RNAAgeCalc"      "sarks"
[37] "SELEX"           "SICtools"        "VAExprs"

Here's what we actually install in the Docker image. You can see these in the output when the container is built. You can also comment out the cleanup at the bottom of src/install.sh and view the install_apt_pkgs and install_pip_pkgs files to see these packages.

# APT
automake
clustalo
firefox
fuse
graphviz
infernal
jags
kallisto
libavfilter-dev
libcurl4-openssl-dev
libeigen3-dev
libfftw3-dev
libfribidi-dev
libfuse-dev
libgdal-dev
libgeos-dev
libgit2-dev
libglpk-dev
libglu1-mesa-dev
libgmp-dev
libgraphviz-dev
libgsl-dev
libgtk2.0-dev
libgtkmm-2.4-dev
libmagick++-dev
libmono-system-data4.0-cil
libmpfr-dev
libmysqlclient-dev
libnetcdf-dev
libopenbabel-dev
libpoppler-cpp-dev
libpq-dev
libproj-dev
libprotobuf-dev
librsvg2-dev
libsasl2-dev
libsbml5-dev
libssl-dev
libudunits2-dev
libv8-dev
libxml2-dev
mono-runtime
mpi-default-dev
ocl-icd-opencl-dev
protobuf-compiler
python3-pip
tcl-dev
tk-dev
curl
libzmq3-dev
python3-pip

# PIP
h5py
h5pyd
jupyter
matplotlib
mofapy
mofapy2
nbconvert
numpy
phate
scipy
tensorflow_probability
testresources
virtualenv

If this is still too big, I need to know where to cut. I could start removing dependencies required for only a few BioC packages.

If we want to be more explicit, I can write a script to generate the above packages and we can rerun the script every time we want to update the docker image. We can also commit the list of packages.

I still kept a list of packages to skip because the BBS files have quite a few packages that were already installed via Rocker.
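
If the package list does get committed, a small check could flag when a freshly generated list drifts from the committed one. This is a sketch under the assumption that both are plain files with one name per line; `check_pkg_list` and the file paths in the usage line are hypothetical.

```shell
#!/usr/bin/env bash
# check_pkg_list COMMITTED GENERATED
# Succeeds when both files hold the same set of names, ignoring order and
# duplicates; otherwise prints a unified diff and returns non-zero.
check_pkg_list() {
    diff -u <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2")
}

# Usage (hypothetical paths):
#   check_pkg_list Ubuntu-files/install_apt_pkgs /tmp/install_apt_pkgs \
#       || echo "package list changed; review and commit the new list"
```

Because duplicates are collapsed before comparing, a repeated entry such as python3-pip does not register as a difference.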

@jwokaty jwokaty requested a review from nturaga September 8, 2021 18:09
@nturaga
Collaborator

nturaga commented Sep 8, 2021

Thanks @jwokaty, I will review this today/tomorrow.
