Reprozip "zips up" all the files required by your experiment into one package so that other researchers can recreate it. So what is an experiment?
An experiment is any program you can run from the command line: a Python, R or MATLAB script, or a compiled binary. Reprozip traces the program as it runs and records which system calls it makes and which files and libraries it uses. This way, Reprozip knows exactly what your experiment or program did, and it packages all these system utilities, libraries and programs into one little Reprozip package.
You can then unpack the Reprozip package into a Docker container or a Vagrant virtual machine and simply rerun the program/experiment!
The difference between this and VisTrails is that VisTrails captures a reproducible workflow for creating a visualization, whereas Reprozip captures all the system components that your experiment needed.
To try it out, I first installed Reprozip with pip:
sudo pip install reprozip
Then I traced a simple command that pings Google's server:
reprozip trace ping google.com
Reprozip then wrote the following configuration file (.reprozip-trace/config.yml):
# Run info
version: "0.8"
runs:
# Run 0
- architecture: x86_64
argv: [ping, google.com]
binary: /usr/bin/ping
distribution: ['', '']
environ: {ANT_HOME: /usr/share/apache-ant, COLORFGBG: 15;default, COLORTERM: rxvt,
DBUS_SESSION_BUS_ADDRESS: 'unix:path=/run/user/1000/bus', DESKTOP_SESSION: i3,
DESKTOP_STARTUP_ID: i3/urxvt/1155-126-arjun-thinkpad_TIME101258575, DISPLAY: ':0',
EDITOR: vim, FREETYPE_PROPERTIES: 'truetype:interpreter-version=40', GDMSESSION: i3,
GDM_LANG: en_US.utf8, GOOGLE_DRIVE_SETTINGS: /home/arjun/.duplicity/credentials,
GOPATH: /usr/local/share/gopath/, GPGKEY: C92771D0, GPGKEYNOPASSPHRASE: 01A97666,
GTK_IM_MODULE: ibus, GTK_MODULES: canberra-gtk-module, HOME: /home/arjun/, LANG: en_US.utf8,
LOGNAME: arjun, LS_COLORS: 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;47:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:',
MAIL: /var/spool/mail/arjun, MOZ_PLUGIN_PATH: /usr/lib/mozilla/plugins, PATH: '/home/arjun//bin/misc_scripts:/home/arjun//bin:/home/arjun//bin/misc_scripts:/home/arjun//bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/home/arjun/.gem/ruby/2.3.0/bin',
PWD: /home/arjun/Core/Code/reproducible_science, PYTHONSTARTUP: /home/arjun//.pythonrc,
QT_IM_MODULE: ibus, R_LIBS_USER: /usr/local/lib/R/library/, SHELL: /bin/bash,
SHLVL: '2', SSH_AGENT_PID: '1239', SSH_AUTH_SOCK: /tmp/ssh-PdFBbTDs9yE6/agent.1238,
TERM: xterm-256color, USER: arjun, WINDOWID: '62914582', XAUTHORITY: /home/arjun/.Xauthority,
XDG_CURRENT_DESKTOP: i3, XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/arjun, XDG_RUNTIME_DIR: /run/user/1000,
XDG_SEAT: seat0, XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0, XDG_SESSION_DESKTOP: i3,
XDG_SESSION_ID: c2, XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0,
XDG_SESSION_TYPE: x11, XDG_VTNR: '7', XMODIFIERS: '@im=ibus', _: /usr/bin/reprozip}
exitcode: 2
gid: 1000
hostname: arjun-thinkpad
id: run0
system: [Linux, 4.11.9-1-ARCH]
uid: 1000
workingdir: /home/arjun/Core/Code/reproducible_science
# Input and output files
# Inputs are files that are only read by a run; reprounzip can replace these
# files on demand to run the experiment with custom data.
# Outputs are files that are generated by a run; reprounzip can extract these
# files from the experiment on demand, for the user to examine.
# The name field is the identifier the user will use to access these files.
inputs_outputs:
# Files to pack
# All the files below were used by the program; they will be included in the
# generated package
# These files come from packages; we can thus choose not to include them, as it
# will simply be possible to install that package on the destination system
# They are included anyway by default
packages:
# These files do not appear to come with an installed package -- you probably
# want them packed
other_files:
- "/etc/ld.so.cache" # 379.49 KB
- "/home/arjun/Core/Code/reproducible_science" # Directory
- "/lib" # Link to /usr/lib
- "/lib64" # Link to /usr/lib
- "/usr/bin/ping" # 59.73 KB
- "/usr/lib/ld-2.25.so" # 168.83 KB
- "/usr/lib/ld-linux-x86-64.so.2" # Link to /usr/lib/ld-2.25.so
- "/usr/lib/ld-linux.so.2" # Link to /usr/lib32/ld-linux.so.2
- "/usr/lib/libc-2.25.so" # 1.89 MB
- "/usr/lib/libc.so.6" # Link to /usr/lib/libc-2.25.so
- "/usr/lib/libcap.so.2" # Link to /usr/lib/libcap.so.2.25
- "/usr/lib/libcap.so.2.25" # 16.85 KB
- "/usr/lib/libcrypto.so.1.1" # 2.47 MB
- "/usr/lib/libdl-2.25.so" # 14.05 KB
- "/usr/lib/libdl.so.2" # Link to /usr/lib/libdl-2.25.so
- "/usr/lib/libidn.so.11" # Link to /usr/lib/libidn.so.11.6.16
- "/usr/lib/libidn.so.11.6.16" # 206.54 KB
- "/usr/lib/libpthread-2.25.so" # 143.13 KB
- "/usr/lib/libpthread.so.0" # Link to /usr/lib/libpthread-2.25.so
- "/usr/lib/libresolv-2.25.so" # 78.28 KB
- "/usr/lib/libresolv.so.2" # Link to /usr/lib/libresolv-2.25.so
- "/usr/lib/locale/locale-archive" # 1.59 MB
- "/usr/lib32/ld-2.25.so" # 160.73 KB
- "/usr/lib32/ld-linux.so.2" # Link to /usr/lib32/ld-2.25.so
- "/usr/share/locale/locale.alias" # 2.93 KB
# If you want to include additional files in the pack, you can list additional
# patterns of files that will be included
additional_patterns:
# Example:
# - /etc/apache2/** # Everything under apache2/
# - /var/log/apache2/*.log # Log files directly under apache2/
# - /var/lib/lxc/*/rootfs/home/**/*.py # All Python files of all users in
# # that container
Notice that the file contains a lot of detailed system information: the environment variables, the working directory, and every system library file that was used to run the ping program. Next, you pack everything into a single file:
reprozip pack myzip.rpz
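To get a rough feel for what will end up in the package, the other_files section of the configuration above can be tallied with a few lines of Python. This is only a sketch: summarize_other_files is a hypothetical helper, not part of Reprozip. It reads the text line by line instead of using a YAML library, and it assumes the size comments use 1024-based units, which the configuration itself does not state.

```python
import re

# One list entry, e.g.:  - "/usr/bin/ping" # 59.73 KB
ENTRY = re.compile(r'^\s*-\s*"(?P<path>[^"]+)"(?:\s*#\s*(?P<note>.*\S))?\s*$')
SIZE = re.compile(r'^(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>bytes|KB|MB|GB)$')
# Assumption: sizes in the comments are reported in 1024-based units.
UNIT_BYTES = {"bytes": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def summarize_other_files(config_text):
    """Count files, links and directories listed under other_files."""
    summary = {"files": 0, "links": 0, "directories": 0, "bytes": 0.0}
    in_section = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped == "other_files:":
            in_section = True
            continue
        # Skip everything before the section, blank lines and comment lines.
        if not in_section or not stripped or stripped.startswith("#"):
            continue
        entry = ENTRY.match(line)
        if entry is None:
            break  # reached the next key, e.g. additional_patterns:
        note = entry.group("note") or ""
        if note.startswith("Link to"):
            summary["links"] += 1
        elif note == "Directory":
            summary["directories"] += 1
        else:
            summary["files"] += 1
            size = SIZE.match(note)
            if size:
                factor = UNIT_BYTES[size.group("unit")]
                summary["bytes"] += float(size.group("number")) * factor
    return summary
```

Calling summarize_other_files on the contents of .reprozip-trace/config.yml returns a dictionary with the number of regular files, symbolic links and directories, plus an approximate total size in bytes.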
On another machine, you would install reprounzip (together with its Vagrant plugin), then set up a virtual machine and run the experiment in it using
reprounzip vagrant setup myzip.rpz new_directory
reprounzip vagrant run new_directory
reprounzip vagrant upload
reprounzip vagrant run new_directory
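The setup and run steps above can also be scripted. The sketch below only builds the two command lines and, by default, prints them instead of executing anything. The helper names reprounzip_commands and replay are hypothetical, and the sketch assumes reprounzip and the plugin for the chosen unpacker (Vagrant or Docker, the two targets mentioned earlier) are already installed.

```python
import subprocess


def reprounzip_commands(package, directory, unpacker="vagrant"):
    """Return the argv lists that set up and then run an unpacked experiment."""
    # Only the two unpackers discussed in this post are accepted here.
    if unpacker not in ("vagrant", "docker"):
        raise ValueError("unsupported unpacker: %r" % unpacker)
    return [
        ["reprounzip", unpacker, "setup", package, directory],
        ["reprounzip", unpacker, "run", directory],
    ]


def replay(package, directory, unpacker="vagrant", dry_run=True):
    """Print the commands (dry run) or execute them one after another."""
    commands = reprounzip_commands(package, directory, unpacker)
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
    return commands
```

For example, replay("myzip.rpz", "new_directory") prints the setup and run commands shown above, and passing dry_run=False would actually execute them.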
The creators claim that computational reproducibility is quite hard and painful to achieve. For example, if a reviewer wanted to recreate an author's computation, doing so from the author's manuscript alone is nearly impossible. The Reprozip authors write:
For reviewers, even with a compendium in their hands, it may be hard to reproduce the results. There may be no instructions about how to execute the code and explore it further; the experiment may not run on his operating system; there may be missing libraries; library versions may be different; and several issues may arise while trying to install all the required dependencies, a problem colloquially known as dependency hell.
Reprozip is currently being developed at NYU (my alma mater).