
Commit

updated readme
nathanwang000 committed Feb 19, 2021
1 parent 0096ff2 commit 6f01b7c
Showing 4 changed files with 1,322 additions and 239 deletions.
92 changes: 25 additions & 67 deletions README.org
* Shapley Flow

This repository contains implementation for the AISTATS 2021 paper
"[[https://arxiv.org/pdf/2010.14592.pdf][Shapley Flow: A Graph-based Approach to Interpreting Model Predictions]]".

The repository is organized as follows. Files in the top-level directory
contain implementations of the algorithm and the baselines.

#+BEGIN_VERSE
flow.py: Implementation of the Shapley Flow algorithm
on_manifold.py: Implementation of the on-manifold SHAP baseline
linear_evaluation.py: Evaluation code for paper Section 4.3
#+END_VERSE

~notebooks/~ contains case studies and experiments for the paper.

#+BEGIN_VERSE
[[./notebook/tutorial.ipynb][notebook/tutorial.ipynb]]: Tutorial for Shapley Flow
[[./notebook/synthetic_sanity_checks.ipynb][notebook/synthetic_sanity_checks.ipynb]]: Sanity check examples for Section 4.3
notebooks/linear_nutrition.ipynb: Experiments on the nutrition dataset for the linear-model sanity check in Section 4.3
notebooks/linear_income.ipynb: Experiments on the adult census income dataset for the linear-model sanity check
notebooks/nutrition.ipynb: Case study of the nutrition dataset in Section 4.4
notebooks/income.ipynb: Case study of the adult census income dataset in the Appendix
notebooks/nutrition_CI.ipynb: Case study of the nutrition dataset with multiple baselines and 95% confidence intervals
#+END_VERSE

~archive/~ includes notes and experiments from previous iterations of the project.
69 changes: 69 additions & 0 deletions archive/old_readme.org
* Shapley Flow

Shapley value flow

Sample experiments in [[./flow_synthetic_experiments.ipynb][notebook]]

** versions

version 0: once a node's value is changed, make it visible to the output

version 1: only expose a node's new value if the edge to the target is
opened. This requires each node to keep track of its argument's value.
Difference from version 0 is in dfs part.
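
Purely as an illustration (these classes are made up for this note and are
not the ones defined in flow.py), the two update policies can be contrasted
roughly as follows:

#+BEGIN_SRC python
class Source:
    """A leaf node holding a raw input value."""
    def __init__(self, v):
        self.v = v

    def value(self):
        return self.v


class NodeV0:
    """Version 0: reads its parents' current values, so any upstream change
    is immediately visible to the output."""
    def __init__(self, func, parents):
        self.func, self.parents = func, parents

    def value(self):
        return self.func(*[p.value() for p in self.parents])


class NodeV1:
    """Version 1: caches the value seen over each incoming edge and only
    refreshes it when that edge is explicitly opened."""
    def __init__(self, func, parents):
        self.func, self.parents = func, parents
        self.cached = [p.value() for p in parents]  # last value seen per edge

    def open_edge(self, i):
        self.cached[i] = self.parents[i].value()  # expose the parent's new value

    def value(self):
        return self.func(*self.cached)


x = Source(1.0)
out0, out1 = NodeV0(lambda v: 2 * v, [x]), NodeV1(lambda v: 2 * v, [x])
x.v = 3.0                          # change the upstream value
print(out0.value(), out1.value())  # 6.0 2.0 -- version 1 still sees the old value
out1.open_edge(0)
print(out1.value())                # 6.0 -- the change is exposed once the edge is opened
#+END_SRC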

** installation instructions

Running the notebooks requires the master version of the ~shap~ package to be
placed in this directory.
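
The exact setup is not spelled out here; one way to obtain the master version
of shap is, for example (assuming the upstream repository location at the time):

#+BEGIN_SRC bash
git clone https://github.com/slundberg/shap.git  # upstream repository; location may have changed
pip install -e shap                              # or place/symlink the package directory here instead
#+END_SRC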

There have been issues installing pygraphviz. Before installing it, make sure
Graphviz itself is installed (a sketch is given after the next block), and
make sure python-dev is installed for the specific Python version you are
using:

#+BEGIN_SRC bash
apt-get install pythonX.X-dev # e.g. python3.8-dev; I'm using python3.8
#+END_SRC
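
If Graphviz is missing, on Debian/Ubuntu-style systems it can usually be
installed together with its development headers roughly as follows (the
package names are an assumption and may differ on other platforms):

#+BEGIN_SRC bash
sudo apt-get install graphviz libgraphviz-dev  # header package name varies across distributions
#+END_SRC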

Then one should be able to install pygraphviz with

#+BEGIN_SRC bash
pip install pygraphviz
# If that fails, the following has worked, as noted in
# https://github.com/pygraphviz/pygraphviz/issues/100:
# pip install --install-option="--include-path=/usr/local/include/" \
#             --install-option="--library-path=/usr/local/lib/" pygraphviz
#+END_SRC

** sanity check graphs

The [[./sanity_check_graphs.ipynb][notebook]] contains sanity check graphs for various flow approaches.

** python version

Requires Python 3.8.2.

Requirements are listed in the [[./Pipefile][Pipfile]].
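
If you use pipenv, the environment can typically be set up from the repository
root with:

#+BEGIN_SRC bash
pip install pipenv   # if pipenv is not already available
pipenv install       # create the environment from the Pipfile
pipenv shell         # activate it
#+END_SRC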

** development notes

1. schools server: mld3 seems very slow at training xgboost models, but the
explanation code seems to run fine. This is not a huge issue, but it is
worth investigating why.

The solution is to limit the number of threads numpy uses. The following
snippet does the trick (it must run before numpy is imported):

#+BEGIN_SRC python
import os # need to happen before loading numpy
os.environ["OMP_NUM_THREADS"] = "8" # export OMP_NUM_THREADS=8
os.environ["OPENBLAS_NUM_THREADS"] = "8" # export OPENBLAS_NUM_THREADS=8
os.environ["MKL_NUM_THREADS"] = "8" # export MKL_NUM_THREADS=8
os.environ["VECLIB_MAXIMUM_THREADS"] = "8" # export VECLIB_MAXIMUM_THREADS=8
os.environ["NUMEXPR_NUM_THREADS"] = "8" # export NUMEXPR_NUM_THREADS=8
#+END_SRC


2. the multiprocessing setup

Un-comment the following line in flow.py to enable the multiprocessing code:

#+BEGIN_SRC python
# multiprocessing_setup() # this is needed for multiprocessing
#+END_SRC
