What is it?

Reprozip "zips up" all the files required by your experiment into one package so that other researchers can recreate it. So what is an experiment?

An experiment is any program you can run from the command line, such as a Python, R or MATLAB script or a compiled binary. Reprozip traces the program while it runs, recording the system calls it makes and the files and libraries it uses. This way, Reprozip knows exactly what your experiment did, and it packages all these system utilities, libraries and programs into one little Reprozip container (an .rpz package file).

You can then unpack the Reprozip container into a Docker container or a Vagrant virtual machine, and simply rerun the program/experiment there!

The difference between this and Vistrails is that Vistrails produces a reproducible workflow to create a visualization, but Reprozip replicates all the system components that were needed in your experiment!
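To make the tracing idea above concrete, here is a minimal sketch of how a list of "files used" can be derived from a log of system calls. It is an illustration only, not Reprozip's actual implementation: the sample log is hypothetical text written in the style of `strace` output, and the names `SAMPLE_TRACE`, `CALL_RE` and `files_used` are made up for this example.

```python
import re

# Hypothetical sample in the style of `strace` output; not real Reprozip data.
SAMPLE_TRACE = """\
execve("/usr/bin/ping", ["ping", "google.com"], 0x7ffd) = 0
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
openat(AT_FDCWD, "/etc/missing.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
"""

# Matches execve/open/openat calls; captures the first quoted path and the return value.
CALL_RE = re.compile(r'^(execve|open|openat)\((?:AT_FDCWD, )?"([^"]+)".*\) = (-?\d+)')


def files_used(trace_text):
    """Return paths that were successfully executed or opened, in first-use order."""
    seen = []
    for line in trace_text.splitlines():
        match = CALL_RE.match(line)
        if not match:
            continue
        _syscall, path, result = match.groups()
        if int(result) < 0:
            # The call failed, so the file was never actually used.
            continue
        if path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    for path in files_used(SAMPLE_TRACE):
        print(path)
```

The point of the sketch is that once every file access is observed, the set of files to package falls out mechanically, with no need for the author to list dependencies by hand.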

How it works

First, I installed reprozip:

sudo pip install reprozip

Then I used reprozip to trace a simple command that pings Google's server:

reprozip trace ping google.com

The trace produced the following configuration file (by default, .reprozip-trace/config.yml).

# Run info
version: "0.8"
runs:
# Run 0
- architecture: x86_64
  argv: [ping, google.com]
  binary: /usr/bin/ping
  distribution: ['', '']
  environ: {ANT_HOME: /usr/share/apache-ant, COLORFGBG: 15;default, COLORTERM: rxvt,
    DBUS_SESSION_BUS_ADDRESS: 'unix:path=/run/user/1000/bus', DESKTOP_SESSION: i3,
    DESKTOP_STARTUP_ID: i3/urxvt/1155-126-arjun-thinkpad_TIME101258575, DISPLAY: ':0',
    EDITOR: vim, FREETYPE_PROPERTIES: 'truetype:interpreter-version=40', GDMSESSION: i3,
    GDM_LANG: en_US.utf8, GOOGLE_DRIVE_SETTINGS: /home/arjun/.duplicity/credentials,
    GOPATH: /usr/local/share/gopath/, GPGKEY: C92771D0, GPGKEYNOPASSPHRASE: 01A97666,
    GTK_IM_MODULE: ibus, GTK_MODULES: canberra-gtk-module, HOME: /home/arjun/, LANG: en_US.utf8,
    LOGNAME: arjun, LS_COLORS: 'rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;47:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:',
    MAIL: /var/spool/mail/arjun, MOZ_PLUGIN_PATH: /usr/lib/mozilla/plugins, PATH: '/home/arjun//bin/misc_scripts:/home/arjun//bin:/home/arjun//bin/misc_scripts:/home/arjun//bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/home/arjun/.gem/ruby/2.3.0/bin',
    PWD: /home/arjun/Core/Code/reproducible_science, PYTHONSTARTUP: /home/arjun//.pythonrc,
    QT_IM_MODULE: ibus, R_LIBS_USER: /usr/local/lib/R/library/, SHELL: /bin/bash,
    SHLVL: '2', SSH_AGENT_PID: '1239', SSH_AUTH_SOCK: /tmp/ssh-PdFBbTDs9yE6/agent.1238,
    TERM: xterm-256color, USER: arjun, WINDOWID: '62914582', XAUTHORITY: /home/arjun/.Xauthority,
    XDG_CURRENT_DESKTOP: i3, XDG_GREETER_DATA_DIR: /var/lib/lightdm-data/arjun, XDG_RUNTIME_DIR: /run/user/1000,
    XDG_SEAT: seat0, XDG_SEAT_PATH: /org/freedesktop/DisplayManager/Seat0, XDG_SESSION_DESKTOP: i3,
    XDG_SESSION_ID: c2, XDG_SESSION_PATH: /org/freedesktop/DisplayManager/Session0,
    XDG_SESSION_TYPE: x11, XDG_VTNR: '7', XMODIFIERS: '@im=ibus', _: /usr/bin/reprozip}
  exitcode: 2
  gid: 1000
  hostname: arjun-thinkpad
  id: run0
  system: [Linux, 4.11.9-1-ARCH]
  uid: 1000
  workingdir: /home/arjun/Core/Code/reproducible_science

# Input and output files

# Inputs are files that are only read by a run; reprounzip can replace these
# files on demand to run the experiment with custom data.
# Outputs are files that are generated by a run; reprounzip can extract these
# files from the experiment on demand, for the user to examine.
# The name field is the identifier the user will use to access these files.
inputs_outputs:

# Files to pack
# All the files below were used by the program; they will be included in the
# generated package

# These files come from packages; we can thus choose not to include them, as it
# will simply be possible to install that package on the destination system
# They are included anyway by default
packages:

# These files do not appear to come with an installed package -- you probably
# want them packed
other_files:
  - "/etc/ld.so.cache" # 379.49 KB
  - "/home/arjun/Core/Code/reproducible_science" # Directory
  - "/lib" # Link to /usr/lib
  - "/lib64" # Link to /usr/lib
  - "/usr/bin/ping" # 59.73 KB
  - "/usr/lib/ld-2.25.so" # 168.83 KB
  - "/usr/lib/ld-linux-x86-64.so.2" # Link to /usr/lib/ld-2.25.so
  - "/usr/lib/ld-linux.so.2" # Link to /usr/lib32/ld-linux.so.2
  - "/usr/lib/libc-2.25.so" # 1.89 MB
  - "/usr/lib/libc.so.6" # Link to /usr/lib/libc-2.25.so
  - "/usr/lib/libcap.so.2" # Link to /usr/lib/libcap.so.2.25
  - "/usr/lib/libcap.so.2.25" # 16.85 KB
  - "/usr/lib/libcrypto.so.1.1" # 2.47 MB
  - "/usr/lib/libdl-2.25.so" # 14.05 KB
  - "/usr/lib/libdl.so.2" # Link to /usr/lib/libdl-2.25.so
  - "/usr/lib/libidn.so.11" # Link to /usr/lib/libidn.so.11.6.16
  - "/usr/lib/libidn.so.11.6.16" # 206.54 KB
  - "/usr/lib/libpthread-2.25.so" # 143.13 KB
  - "/usr/lib/libpthread.so.0" # Link to /usr/lib/libpthread-2.25.so
  - "/usr/lib/libresolv-2.25.so" # 78.28 KB
  - "/usr/lib/libresolv.so.2" # Link to /usr/lib/libresolv-2.25.so
  - "/usr/lib/locale/locale-archive" # 1.59 MB
  - "/usr/lib32/ld-2.25.so" # 160.73 KB
  - "/usr/lib32/ld-linux.so.2" # Link to /usr/lib32/ld-2.25.so
  - "/usr/share/locale/locale.alias" # 2.93 KB

# If you want to include additional files in the pack, you can list additional
# patterns of files that will be included
additional_patterns:
# Example:
#  - /etc/apache2/**  # Everything under apache2/
#  - /var/log/apache2/*.log  # Log files directly under apache2/
#  - /var/lib/lxc/*/rootfs/home/**/*.py  # All Python files of all users in
#    # that container

Notice how much detailed system information the file records: the environment variables, the kernel version, and every system library file that was used to run the ping program. Next, you pack everything into a single file:

reprozip pack myzip.rpz
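Before packing, it can be useful to review what the `other_files` list in the configuration file will pull in. The following is a minimal sketch that reads entries in the format shown in the configuration above and separates regular files from symlinks. It relies on the comments Reprozip wrote next to each path, and it assumes 1 KB = 1024 bytes, which may not match how Reprozip computes the sizes. The names `CONFIG_EXCERPT`, `ENTRY_RE` and `summarize` are hypothetical.

```python
import re

# A few entries copied from the other_files section of the configuration file.
CONFIG_EXCERPT = '''\
other_files:
  - "/etc/ld.so.cache" # 379.49 KB
  - "/lib" # Link to /usr/lib
  - "/usr/bin/ping" # 59.73 KB
  - "/usr/lib/libc-2.25.so" # 1.89 MB
  - "/usr/lib/libc.so.6" # Link to /usr/lib/libc-2.25.so
'''

# One list entry: a quoted path followed by a trailing comment.
ENTRY_RE = re.compile(r'^\s*-\s*"([^"]+)"\s*#\s*(.*)$')

# Assumption: binary units. Reprozip may round or scale differently.
UNITS = {"bytes": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def summarize(config_text):
    """Return (files, links): files maps path -> approximate size in bytes,
    links maps symlink path -> target path."""
    files, links = {}, {}
    for line in config_text.splitlines():
        match = ENTRY_RE.match(line)
        if not match:
            continue
        path, note = match.groups()
        if note.startswith("Link to "):
            links[path] = note[len("Link to "):]
            continue
        parts = note.split()
        if len(parts) == 2 and parts[1] in UNITS:
            files[path] = float(parts[0]) * UNITS[parts[1]]
        # Anything else (for example "Directory") is skipped.
    return files, links


if __name__ == "__main__":
    files, links = summarize(CONFIG_EXCERPT)
    print(f"{len(files)} regular files, {len(links)} links")
    print(f"approximate payload: {sum(files.values()) / 1024 ** 2:.2f} MB")
```

To run it against a real trace, read the text from the configuration file instead of using the excerpt; a proper YAML parser would be more robust, but it would discard the size comments this sketch depends on.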

On another machine, you would install reprounzip (together with its reprounzip-vagrant plugin), and then set up a virtual machine using

reprounzip vagrant setup myzip.rpz new_directory
reprounzip vagrant run new_directory
reprounzip vagrant upload new_directory <input-path>:<input-id>
reprounzip vagrant run new_directory
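The whole round trip can be summarized as two short command sequences, one for the author's machine and one for the reviewer's. The sketch below only builds and prints the commands (a dry run) so it can be read or executed without Reprozip installed. The function names `trace_and_pack` and `unpack_and_run` are hypothetical helpers, not part of Reprozip; the commands themselves are the ones used in these notes.

```python
import shlex


def trace_and_pack(experiment_argv, package="myzip.rpz"):
    """Commands for the author's machine: trace the experiment, then pack it."""
    return [
        ["reprozip", "trace"] + list(experiment_argv),
        ["reprozip", "pack", package],
    ]


def unpack_and_run(package="myzip.rpz", directory="new_directory", unpacker="vagrant"):
    """Commands for the other machine: unpack into a VM, then rerun."""
    return [
        ["reprounzip", unpacker, "setup", package, directory],
        ["reprounzip", unpacker, "run", directory],
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for argv in trace_and_pack(["ping", "google.com"]) + unpack_and_run():
        print(shlex.join(argv))
```

To actually execute the steps, each list could be passed to `subprocess.run(argv, check=True)`. Keeping the commands as argument lists rather than shell strings avoids quoting problems when the experiment has arguments containing spaces.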

The creators argue that computational reproducibility is hard and painful to achieve. For example, a reviewer who wants to recreate an author's computation will find it nearly impossible to do so from the manuscript alone. In the Reprozip authors' words:

For reviewers, even with a compendium in their hands, it may be hard to reproduce the results. There may be no instructions about how to execute the code and explore it further; the experiment may not run on his operating system; there may be missing libraries; library versions may be different; and several issues may arise while trying to install all the required dependencies, a problem colloquially known as dependency hell.

Reprozip is currently being developed at NYU (my alma mater).