This repository has been archived by the owner on Dec 14, 2020. It is now read-only.

import { CodeSurfer, CodeSurferColumns, Step } from "code-surfer";
import { Appear, Notes, Invert, Split } from "mdx-deck";
import { github, vsDark } from "@code-surfer/themes";
import { Logo } from "./components";
import "prismjs/components/prism-docker";
import "prismjs/components/prism-shell-session";

export const theme = vsDark;

Enhancing Reproducibility

With Containers


👋 I'm Bradford Roarr

  • Software developer with CIS here at Brown
  • I work with cloud technologies and containerization

:female-technologist: I'm a developer

  • Worked in the private sector, government, and now higher ed
  • Worked for small and large companies
  • Most recently worked for the company that made the sensors in the mail room

:female-scientist: Not a scientist

  • Here to provide a developer's perspective on reproducibility

Reproducibility is important!

  • For all of us, not just scientists
  • Yes, software reproducibility enables better science
  • Also reduces the set of bugs that can occur in production software

This talk is not about data

Data is its own special thing

  • I am not qualified to talk about data
  • I'm going to focus on software reproducibility

Ingredients of Reproducibility

  • Source code

  • Third-Party libraries

  • Runtime environment

  • Need each ingredient to get good reproducibility

Source code

  • Captured in version control

  • Releases tagged

  • Immutable tags
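Git itself lets a tag be deleted and recreated on a different commit, so "immutable" is a policy rather than a guarantee. One way to enforce it is to record the commit hash a release tag points to and check it again before reproducing results. A minimal sketch, assuming git is on the PATH; resolve_tag and tag_unchanged are hypothetical helper names, not part of any tool.

```python
import subprocess


def resolve_tag(repo: str, tag: str) -> str:
    """Return the full commit hash that `tag` points to in the repository at `repo`."""
    # The ^{commit} suffix peels annotated tags down to the commit they reference.
    result = subprocess.run(
        ["git", "-C", repo, "rev-parse", "--verify", f"{tag}^{{commit}}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def tag_unchanged(repo: str, tag: str, recorded_commit: str) -> bool:
    """True if `tag` still resolves to the commit recorded at release time."""
    return resolve_tag(repo, tag) == recorded_commit
```

Storing the hash returned at release time (for example, in the paper's supplementary material) gives a fixed reference even if the tag is later moved.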


Third-Party libraries

  • Captured in a package manifest file

  • Libraries pinned to a specific version

  • Pin libraries to one exact version, not a range
  • Select libraries that are maintained and have a corporate sponsor
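A small check can catch version ranges before they reach a build. This sketch assumes a pip-style requirements.txt (the file the Dockerfile later installs from) and treats only name==version, optionally with extras, as pinned. It is deliberately strict, so wildcards such as ==1.* and lines with environment markers are reported too. unpinned_requirements is a hypothetical helper.

```python
import re
from typing import List

# An exact pin looks like "name==1.2.3"; extras such as "name[extra]==1.2.3" are allowed.
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9._,-]+\])?==[^=<>~!,;\s*]+$")


def unpinned_requirements(text: str) -> List[str]:
    """Return requirement lines that are not pinned to one exact version."""
    problems = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r base.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            problems.append(line)
    return problems
```

For example, a file containing requests==2.22.0, numpy>=1.17 and flask would report numpy>=1.17 and flask.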

Now what?


Runtime environment

  • Capture system software requirements

  • Freeze runtime environment

  • Package runtime environment for consumption
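Part of the environment can be captured from inside it. This sketch lists every installed Python distribution as an exact pin using the standard importlib.metadata module (Python 3.8+). It is similar in spirit to pip freeze but not guaranteed to match its output. It only sees Python packages: system libraries such as libxml2 are not captured. The freeze name is a hypothetical helper.

```python
from importlib import metadata
from typing import List


def freeze() -> List[str]:
    """List installed distributions as exact `name==version` pins, sorted by name."""
    pins = set()  # a set drops duplicates found on more than one path
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze()))
```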


How?

Containers!


Containers

  • Capture system software requirements

  • Freeze runtime environment

  • Package runtime environment for consumption

  • How many have heard of containers?
  • How many use containers?
  • Great abstraction for capturing environment
  • Still not perfect

Container Runtimes

Docker

Singularity

  • Two main contenders at Brown
  • Singularity is used more for research
  • Docker is used for application development

Container Runtimes

Docker

  • Going to use Docker today
  • I'm more familiar with Docker
  • Both are OCI-compliant
  • OCI = Open Container Initiative

--- Dockerfile  2019-10-25 14:21:55.000000000 -0400
+++ Dockerfile  2019-10-25 14:41:29.000000000 -0400
@@ -1,4 +1,4 @@
-FROM python
+FROM python:3.8

 RUN apt-get update && apt-get install -y \
     libxml2-dev \
--- Dockerfile  2019-10-25 14:21:55.000000000 -0400
+++ Dockerfile  2019-10-25 14:41:29.000000000 -0400
@@ -1,4 +1,4 @@
-FROM python:3.8
+FROM python:3.8.0-buster

 RUN apt-get update && apt-get install -y \
     libxml2-dev \
--- Dockerfile  2019-10-25 14:21:55.000000000 -0400
+++ Dockerfile  2019-10-25 14:52:54.000000000 -0400
@@ -1,8 +1,8 @@
 FROM python:3.8.0-buster

 RUN apt-get update && apt-get install -y \
-    libxml2-dev \
-    libsqlite3-dev
+    libxml2-dev=2.9.4+dfsg1-7+b3 \
+    libsqlite3-dev=3.27.2-3

 COPY requirements.txt .
 RUN pip install -r requirements.txt
--- Dockerfile  2019-10-25 22:33:13.000000000 -0400
+++ Dockerfile.2        2019-10-28 07:56:14.000000000 -0400
@@ -15,5 +15,6 @@
 WORKDIR /app
 COPY . ./

+ENV RANDOM_SEED=1024
 ENTRYPOINT ["python"]
-CMD ["my-script.py"]
+CMD ["my-script.py", "special", "arguments", "here"]

  • Contrived Dockerfile
  • Builds our image, but doesn't run
  • A few common gotchas when making new images
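The last step bakes RANDOM_SEED=1024 into the image. This sketch shows how a script like my-script.py might use it, assuming the script draws random numbers from Python's random module. The make_rng helper and its fallback default are hypothetical. Libraries such as NumPy keep their own generators and would need to be seeded separately.

```python
import os
import random


def make_rng(default_seed: int = 1024) -> random.Random:
    """Build a random generator seeded from the RANDOM_SEED environment variable."""
    # Use the seed the Dockerfile sets with ENV. Fall back to a fixed default
    # so that runs outside the container are still repeatable.
    seed = int(os.environ.get("RANDOM_SEED", default_seed))
    return random.Random(seed)


if __name__ == "__main__":
    rng = make_rng()
    print([rng.randint(0, 99) for _ in range(3)])
```

Because the seed comes from the image's environment rather than from code edits, docker run -e RANDOM_SEED=... can change it for a single run without modifying the image.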

Dockerfile Gotchas

  • Always specify the full version for base images

  • Always specify versions of system software

  • Understand the OS your image uses

  • Capture arguments and env variables

  • Different operating systems have different packages
  • Alpine Linux uses musl libc instead of glibc
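The first two gotchas can be checked mechanically. This heuristic sketch flags FROM lines whose tag lacks a major.minor.patch version and apt-get install packages without a pkg=version pin. Digest references (image@sha256:...) pass, and scratch is exempt. It is not a full Dockerfile parser, and lint_dockerfile and its rules are assumptions for illustration.

```python
import re
from typing import List

# Heuristic: a tag counts as "full" if it carries a major.minor.patch version.
FULL_TAG = re.compile(r":\d+\.\d+\.\d+")


def lint_dockerfile(text: str) -> List[str]:
    """Flag base images without a full version and apt packages without a pin."""
    # Join backslash line continuations so each instruction sits on one line.
    joined = re.sub(r"\\\n", " ", text)
    problems = []
    for line in joined.splitlines():
        words = line.split()
        if not words:
            continue
        keyword = words[0].upper()
        if keyword == "FROM":
            # Skip flags such as --platform=... to find the image reference.
            args = [w for w in words[1:] if not w.startswith("--")]
            image = args[0] if args else ""
            pinned = "@sha256:" in image or FULL_TAG.search(image)
            if image and image != "scratch" and not pinned:
                problems.append(f"base image not fully versioned: {image}")
        elif keyword == "RUN":
            for i in range(len(words) - 1):
                if words[i] == "apt-get" and words[i + 1] == "install":
                    for pkg in words[i + 2:]:
                        if pkg in ("&&", "||", ";"):
                            break  # end of this install command
                        if pkg.startswith("-"):
                            continue  # options such as -y
                        if "=" not in pkg:
                            problems.append(f"apt package without version: {pkg}")
    return problems
```

The earlier version of the example, with FROM python:3.8 and unversioned libraries, produces three problems. The final version, with python:3.8.0-buster and pinned packages, produces none.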

Building containers

$ docker build .
$ docker build -t science .
$ docker build -t science:1.0.7 .
  • This bit is Docker specific
  • Singularity image names are based on file paths/names
  • Image is built and tagged with a well-specified runtime
  • Time to gather results and do some science

Distributing Containers

  • Distribute exact copy of runtime

  • Sign to prove ownership

  • Publish public key used to sign container

  • Papers written, science complete
  • Distribute container with source to reproduce experiments
  • Docker images are usually distributed through registries such as Docker Hub (docker save can also export a tar archive)
  • Singularity generates a single image file that can be shared directly
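Alongside a signature, publishing a SHA-256 checksum of the shared artifact lets a peer confirm that their copy is byte-for-byte identical. The artifact could be a Singularity image file or a tarball from docker save. A checksum only detects changes. Unlike signing, it does not prove who produced the file. This is a minimal sketch using Python's hashlib, and sha256_of and matches_published are hypothetical names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, published_hex: str) -> bool:
    """True if the file's digest equals a published checksum (case and whitespace ignored)."""
    return sha256_of(path) == published_hex.strip().lower()
```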

Docker Content Trust

  • Everything is handled by Docker

  • Designed for security, not just verification

  • Built on the Notary system

  • Much more complicated than Singularity

  • DCT initialized on registry and repository
  • Admin generates delegation keys for user
  • User pushes image with DCT enabled
  • Signing and verification automatic

Singularity

  • User generates keys

  • User signs image

  • Peer verifies image signature

  • Two commands
  • Sign
  • Verify

Containers aren't perfect

  • Rely on the host system's kernel

  • Signatures don't capture runtime changes

  • Hardware differences matter

  • Document kernel differences
  • Document hardware used to produce results
  • Don't modify code at runtime
  • Don't dynamically download code or libraries at runtime
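The kernel and hardware sit outside the image, so they have to be recorded separately. This minimal sketch uses Python's standard platform module. When run inside a Linux container, platform.release() reports the host's kernel because containers share it. The field names in the hypothetical runtime_report are illustrative, not a standard format.

```python
import json
import os
import platform


def runtime_report() -> dict:
    """Collect host details that a container image does not control."""
    return {
        "system": platform.system(),            # e.g. "Linux"
        "kernel_release": platform.release(),   # host kernel, shared by containers
        "machine": platform.machine(),          # CPU architecture, e.g. "x86_64"
        "processor": platform.processor(),      # may be empty on some platforms
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    print(json.dumps(runtime_report(), indent=2))
```

Saving this output next to the results gives reviewers the kernel and hardware context that the image signature does not cover.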

Questions?

[email protected]