Commit f63da68

first commit
0 parents  commit f63da68

117 files changed, +7213 -0 lines changed

.gitignore

+15
@@ -0,0 +1,15 @@
*.csv
#*.pkl
/dev/
*fitted_model*.pkl
*stan_model*.pkl
/example_5d/*/i_*
/example_3d/*/i_*
*summary.txt
!*df_actionability_score.csv
!*distance_summary.txt
!*df_X.csv
!*df_y.csv
!*df_var.csv
.ipynb_checkpoints/
__pycache__

LICENSE

+11
@@ -0,0 +1,11 @@
LICENSE AGREEMENT OF THE Actionable path planning SYSTEM

Copyright (c) 2021,
Kazuki Nakamura
All rights reserved.

Permission is hereby granted, free of charge, to the person who uses this software and associated documentation files (the "Software") for evaluation, non-profit academic research purposes, and the registered projects/users only ("the User"). The registered projects/users contains profit and non-profit users and projects who obtain the temporal permission from the author or copyright holder. The User has rights to use, copy, modify, merge, publish, and/or distribute copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the conditions described above.

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

README.md

+71
@@ -0,0 +1,71 @@
# Actionable path planning
The aim of this study is to plan an actionable path for changing the values predicted by a specified ML model.
License information and tutorials are given below.

# Tutorials
## Setting conda environment
Requirements:
CentOS Linux release 8.1 (Confirmed)
```bash
conda config --add channels conda-forge
conda create --name actionable_path_planning python=3.7.3
conda activate actionable_path_planning
conda install pandas=0.25.3=py37hb3f55d8_0
conda install numpy=1.16.0=py37h99e49ec_1
conda install scipy=1.3.2=py37h921218d_0
conda install xgboost=0.82=py37he1b5a44_1
conda install scikit-learn=0.21.2=py37hcdab131_1
conda install matplotlib=3.1.1=py37_0
conda install pystan=2.19.1.1=py37hb3f55d8_1
```

## For testing: Synthetic dataset generation
Run the following commands:
```bash
python synthetic_data_generation.py --dataset 5d --save_dir example_5d
```
- dataset: type of synthetic dataset. choices=[5d, 3d]
- save_dir: save directory name

## Fitting ML model
```bash
python preprocessing.py --load_dir example_5d --model_name XG --model_type regressor
python fitting_ml_model.py --load_dir example_5d
```
- load_dir: load directory name; the directory should contain `df_X.csv`, `df_y.csv`, and `df_var.csv` in its `data` subdirectory
  - df_var.csv: provides `item_name_other` and `item_type` information for each variable. `item_type` choices=[continuous, nominal]
- model_name: ML algorithm. choices=[XG: XGBoost, RF: RandomForest, SVM: SupportVectorMachine]
- model_type: choices=[regressor, classifier]

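As a rough illustration of the input layout described in the list above, the following sketch writes the three CSV files into a `data` subdirectory using only the standard library. Only the three file names and the `item_name_other` and `item_type` columns come from the README; the directory name `my_dataset`, the variable names, the `name` column, and the absence of an index column are assumptions made for illustration and may not match what `preprocessing.py` actually expects.

```python
import csv
import os
import tempfile


def write_example_inputs(load_dir):
    """Write df_X.csv, df_y.csv and df_var.csv under <load_dir>/data (hypothetical layout)."""
    data_dir = os.path.join(load_dir, "data")
    os.makedirs(data_dir, exist_ok=True)

    # Explanatory variables: one continuous and one nominal column (made-up values).
    rows_x = [{"x0": 0.1, "x1": "a"}, {"x0": 0.7, "x1": "b"}]
    # Target variable.
    rows_y = [{"y": 1.2}, {"y": 3.4}]
    # One row of metadata per explanatory variable; the "name" column is an assumption.
    rows_var = [
        {"name": "x0", "item_name_other": "first variable", "item_type": "continuous"},
        {"name": "x1", "item_name_other": "second variable", "item_type": "nominal"},
    ]

    for filename, rows in [("df_X.csv", rows_x), ("df_y.csv", rows_y), ("df_var.csv", rows_var)]:
        with open(os.path.join(data_dir, filename), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return data_dir


with tempfile.TemporaryDirectory() as tmp:
    data_dir = write_example_inputs(os.path.join(tmp, "my_dataset"))
    written_files = sorted(os.listdir(data_dir))
```

The synthetic dataset generated by `synthetic_data_generation.py` is the authoritative reference for the real column layout.
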
## Stochastic surrogate modeling
```bash
python surrogate_modeling.py --load_dir example_5d --sigma_y 1.27 --stan_template template_for_wbic.stan
python surrogate_modeling_result_postprocessing.py --load_dir example_5d
```
- load_dir: load directory name
- sigma_y: modeling hyperparameter. The recommended setting is rmse/2.
- stan_template: select `template_for_wbic_classifier.stan` for classification tasks

The number of mixture components of the hierarchical Bayesian model with the lowest WBIC is output.

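The recommended `sigma_y` of rmse/2 can be computed by hand from the fitted model's predictions. The sketch below shows the arithmetic only; the function name `recommended_sigma_y` is hypothetical, the numbers are made up, and the README does not say on which data split the RMSE should be measured, so that choice is left to the reader.

```python
import math


def recommended_sigma_y(y_true, y_pred):
    """Return rmse / 2, the value the README recommends for --sigma_y."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / 2.0


# Made-up observations and predictions: squared errors are 0, 0, 0, 16.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 8.0]
sigma_y = recommended_sigma_y(y_true, y_pred)  # rmse = 2.0, so sigma_y = 1.0
```

The resulting value would then be passed on the command line as `--sigma_y`.
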
## Path planning
```bash
python selection_for_path_planning.py --load_dir example_5d --mixture_components 2
python path_planning.py --load_dir example_5d --mixture_components 2 --num_movable 5
python calc_actionability_score.py --load_dir example_5d --mixture_components 2 --num_movable 5
```
- load_dir: load directory name
- path_planning_index: indices of the instances to plan paths for. By default, all instances excluding outliers.
- intervention_variables: explanatory variables selected as intervention variables. By default, all explanatory variables.
- mixture_components: the number of mixture components of the hierarchical Bayesian model used for path planning.
- destination_state: search end condition. choices=[count, criteria (not supported)]
- destination: if `destination_state` is count, the search iteration count; if criteria, the y value to be achieved.
- step: unit change in intervention variables. The default is 0.5 sigma of the training dataset.
- upper_is_better: whether a higher y is treated as better in the search.
- num_movable: number of intervention variables; also used to identify the directory to load.

The output contains the planned path for each instance and the actionability score.

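To make the `step` default more concrete, the sketch below computes 0.5 sigma per variable and applies an integer offset to an initial point. This is an illustration under assumptions: the README does not say whether the population or sample standard deviation is used (the population form is used here), and the mapping `x0 + r_x * step` is inferred from how offsets are handled in `calc_actionability_score.py`, not confirmed by the source. The names `default_step` and `position` are hypothetical.

```python
import statistics


def default_step(values, fraction=0.5):
    """Return fraction * (population) standard deviation of one training column."""
    return fraction * statistics.pstdev(values)


def position(initial_x, r_x, steps):
    """Assumed mapping from integer offsets to coordinates: x0 + r_x * step."""
    return [x0 + r * s for x0, r, s in zip(initial_x, r_x, steps)]


# Made-up training columns for two continuous variables.
training_columns = {"x0": [0.0, 2.0, 4.0, 6.0], "x1": [1.0, 1.0, 3.0, 3.0]}
steps = [default_step(column) for column in training_columns.values()]

# Move the second variable two steps up and leave the first one unchanged.
moved = position([2.0, 1.0], (0, 2), steps)
```
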
# License
A patent is pending.
This edition of Actionable path planning is for evaluation, learning, and non-profit academic research purposes only, and a license is needed for any other use. Please send license requests or questions to nakamura.kazuki.88m[at]st.kyoto-u.ac.jp

calc_actionability_score.py

+167
@@ -0,0 +1,167 @@
#!/usr/bin/env python
# coding: utf-8
import argparse
import datetime
import os
os.environ["OMP_NUM_THREADS"] = "8"

import pickle
import random

from math import inf as infinity
import numpy as np
import pandas as pd

import path_search_algorithm as p
import utils

def get_parser():
    parser = argparse.ArgumentParser(description='help')
    parser.add_argument('--load_dir', default='example',
                        help='load directory')
    parser.add_argument('--mixture_components', default=2, type=int,
                        help='number of mixture components of surrogate model used for path planning')
    parser.add_argument('-m', '--num_movable', default=5, type=int,
                        help='number of intervention variables')
    parser.add_argument('-y', '--y_type', default='pred', choices=['pred', 'glmm'],
                        help='y used in path planning')
    parser.add_argument('-ds', '--destination_state', default='count',
                        choices=['criteria', 'count'],
                        help='Destination state selection')
    parser.add_argument('-d', '--daystamp', default='',
                        help='If blank, latest one would be selected')
    parser.add_argument('-u', '--update_flag', default=True,
                        type=lambda x: (str(x).lower() == 'true'),
                        help='If True, p_class would be updated')
    parser.add_argument('-i', '--num_iter', default=10, type=int,
                        help='number of baseline paths used for the calculation of actionability score')
    parser.add_argument('--random_seed', default=0, type=int,
                        help='baseline path selection seed')
    parser.add_argument('--n_bins', default=20, type=int,
                        help='number of bins for plot')
    return parser.parse_args()

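The `--update_flag` option in `get_parser()` converts its string argument with a lambda instead of `type=bool`. The standalone sketch below, which is not part of the repository, shows how that conversion behaves with `argparse`; the helper name `str_to_bool` is hypothetical.

```python
import argparse


def str_to_bool(value):
    """Same rule as the lambda in get_parser(): only 'true' (any case) is True."""
    return str(value).lower() == 'true'


parser = argparse.ArgumentParser()
parser.add_argument('-u', '--update_flag', default=True, type=str_to_bool)

# A non-string default is used as-is when the flag is omitted.
default_value = parser.parse_args([]).update_flag
# An explicit value goes through str_to_bool.
explicit_false = parser.parse_args(['--update_flag', 'False']).update_flag
# Any string other than 'true' silently becomes False, including 'yes'.
other_value = parser.parse_args(['--update_flag', 'yes']).update_flag
```

One consequence of this rule is that a mistyped value disables the flag without raising an error, which is worth keeping in mind when scripting runs.
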
def select_load_dir(load_dir, k, num_movable, y_type, destination_state, daystamp):
    '''
    Args:
        load_dir (str): base load directory
        k (int): num of hb_class (number of mixture components)
        num_movable (int): number of movable values in tree search
        y_type (str): choices are [pred, glmm]
        destination_state (str): choices are [criteria, count]
        daystamp (str): daystamp of load_dir (and save_dir); if blank, the latest one is selected
    Returns:
        load_dir (str): load_params dir (and save_dir)
    '''
    model_dir = f'./{load_dir}/'
    if daystamp=='':
        # No daystamp given: take the last matching entry in sorted order
        all_files = os.listdir(model_dir)
        folders = sorted([f for f in all_files if f.startswith(f'k{k}m{num_movable}_y_{y_type}_ds_{destination_state}')])
        dir_name = folders[-1]
    else:
        dir_name = f'k{k}m{num_movable}_y_{y_type}_ds_{destination_state}_{daystamp}'
    load_dir = model_dir + dir_name + '/'
    return load_dir

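`select_load_dir` picks the last entry, in sorted order, among the names that share a prefix. The sketch below reproduces that selection in isolation with a temporary directory. It assumes the daystamp is a fixed-width string such as `YYYYMMDD` so that lexicographic order equals chronological order; the source does not state the daystamp format. The helper name `latest_run_dir` and the explicit error for the empty case are additions for illustration (the original would raise `IndexError` there).

```python
import os
import tempfile


def latest_run_dir(model_dir, prefix):
    """Return the lexicographically last subdirectory of model_dir starting with prefix."""
    candidates = sorted(
        name for name in os.listdir(model_dir)
        if name.startswith(prefix) and os.path.isdir(os.path.join(model_dir, name))
    )
    if not candidates:
        raise FileNotFoundError(f'no directory starting with {prefix!r} in {model_dir!r}')
    return candidates[-1]


with tempfile.TemporaryDirectory() as tmp:
    # Three runs with the same settings and different (assumed YYYYMMDD) daystamps.
    for stamp in ('20210105', '20210212', '20201231'):
        os.mkdir(os.path.join(tmp, f'k2m5_y_pred_ds_count_{stamp}'))
    # A run with a different k must not be selected.
    os.mkdir(os.path.join(tmp, 'k3m5_y_pred_ds_count_20211231'))
    chosen = latest_run_dir(tmp, 'k2m5_y_pred_ds_count')
```
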
def main():
    args = get_parser()
    random.seed(args.random_seed)
    load_dir = './' + args.load_dir + '/'
    data_dir = load_dir + 'data/'
    result_dir = load_dir + 'result/ex/'
    ts_dir = select_load_dir(args.load_dir, args.mixture_components, args.num_movable, args.y_type,
                             args.destination_state, args.daystamp)

    print('[INFO] Loading')
    xy_for_exp = pickle.load(open(data_dir + 'xy_for_stan.pkl', 'rb'))
    dict_emp_bayes = pickle.load(open(result_dir + f'dict_emp_bayes_{args.mixture_components}.pkl', 'rb'))
    params = pickle.load(open(ts_dir + 'params.pkl', 'rb'))
    model_type = pickle.load(open(data_dir + 'model_type.pkl', 'rb'))
    print('[INFO] Done')

    df_result = pd.DataFrame(columns=['min_distance'] + [f'rand_distance_{i}' for i in range(args.num_iter)],
                             index=params['list_pp_i_idx'])
    list_detour = []
    list_line = []
    list_noresult = []

    for pp_i in params['list_pp_i_idx']:
        pp_dir = ts_dir + f'i_{pp_i}/'
        my_check = not os.path.isdir(pp_dir)
        if my_check:
            print(f'[INFO] pp_i: {pp_i} has no results.')
            list_noresult.append(pp_i)
            continue

        destination_node = pickle.load(open(pp_dir+'destination_node.pkl', 'rb'))
        summary = open(pp_dir+'summary.txt').read()
        num_step = len([line for line in summary.splitlines() if 'step' in line]) -1
        df_result.loc[pp_i, 'min_distance'] = destination_node.tentative_distance

        # Compare the step count with the largest and the total absolute offset of the destination
        if num_step==abs(np.array(destination_node.r_x)).max():
            list_line.append(pp_i)
        elif num_step==abs(np.array(destination_node.r_x)).sum():
            pass
        else:
            list_detour.append(pp_i)

        # Decompose the destination offset into unit moves (+1 or -1 on one variable each)
        destination_abs = abs(np.array(destination_node.r_x))
        list_move = []
        for j in range(len(destination_abs)):
            for _ in range(destination_abs[j]):
                np_move = np.zeros(len(destination_abs), dtype=int)
                if destination_node.r_x[j] > 0:
                    np_move[j] = 1
                else:
                    np_move[j] = -1
                list_move.append(tuple(np_move))
        # Evaluate num_iter randomly ordered baseline paths to the same destination
        for k in range(args.num_iter):
            random.shuffle(list_move)
            initial_x_coords = p.InitialXCoords(xy_for_exp, pp_i)
            initial_r_x = tuple(np.zeros(len(xy_for_exp.x_reg.name), dtype=int))
            initial_node = p.Node(initial_x_coords, initial_r_x)
            initial_node.x_fixed.set_fixed(initial_node,
                                           fixed_cont=params['fixed_cont'],
                                           fixed_disc_reg=params['fixed_disc_reg'],
                                           fixed_disc=params['fixed_disc'],
                                           fixed_zero_poi=params['fixed_zero_poi'])
            p_class = initial_node.calc_p_class(args.update_flag, dict_emp_bayes)
            initial_node.set_y_and_class_lp(xy_for_exp.model, params['y_type'],
                                            params['k_class'], xy_for_exp,
                                            dict_emp_bayes, params['sigma_y'],
                                            p_class, model_type=model_type)
            initial_node.tentative_distance = 0
            current_node = initial_node
            for i in range(len(list_move)):
                n_r_x = tuple(np.array(current_node.r_x) + np.array(list_move[i]))
                n_node = p.Node(initial_node.x, n_r_x, params['step'])
                n_node.set_y_and_class_lp(xy_for_exp.model, params['y_type'], params['k_class'],
                                          xy_for_exp, dict_emp_bayes, params['sigma_y'], p_class, model_type=model_type)
                n_node.set_neg_logprob()
                new_tentative_distance = current_node.tentative_distance + current_node.distance_to(n_node)
                n_node.tentative_distance = new_tentative_distance
                current_node = n_node
            # brief check
            if destination_node.r_x!=current_node.r_x:
                print('[ERROR] arrived r_x is different from destination.')
            df_result.loc[pp_i, f'rand_distance_{k}'] = current_node.tentative_distance

    # calc actionability score
    dif_distance = utils.calc_dif_distance(df_result)
    # plot actionability score
    utils.plot_dif_distance(dif_distance, ts_dir, args.n_bins, 0)
    df_actionability_score = pd.DataFrame(dif_distance, columns=['Actionability score'], index=df_result.index)
    # output actionability score
    df_actionability_score.to_csv(ts_dir + 'df_actionability_score.csv')

    # output_summary (the with-block closes the file, so no explicit close is needed)
    with open(ts_dir+'distance_summary.txt', mode='w') as f:
        f.write('Path-planned instances: {}\n'.format(len(params['list_pp_i_idx'])))
        f.write('Detour instances: {}\n'.format(len(list_detour)))
        f.write('Straight instances: {}\n'.format(len(list_line)))
        f.write('Initial and destination consistent instances: {}\n'.format(len(list_noresult)))

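The baseline paths in `main()` are built by splitting the destination offset `r_x` into unit moves and shuffling their order. The standalone sketch below illustrates only that construction with plain tuples; it leaves out the surrogate-model distance evaluation done by `path_search_algorithm`, and the helper names `unit_moves` and `walk` are hypothetical.

```python
import random


def unit_moves(r_x):
    """Split an integer offset vector into +1/-1 moves on a single coordinate each."""
    moves = []
    for j, count in enumerate(r_x):
        direction = 1 if count > 0 else -1
        for _ in range(abs(count)):
            move = [0] * len(r_x)
            move[j] = direction
            moves.append(tuple(move))
    return moves


def walk(moves, start):
    """Apply the moves in order and return the final offset."""
    position = tuple(start)
    for move in moves:
        position = tuple(p + m for p, m in zip(position, move))
    return position


destination_r_x = (2, -1, 0)          # made-up destination offset
moves = unit_moves(destination_r_x)   # two +1 moves on x0, one -1 move on x1

rng = random.Random(0)                # seeded so the baseline ordering is reproducible
rng.shuffle(moves)
end = walk(moves, (0, 0, 0))          # any ordering ends at the same destination
```

Because every ordering of the same unit moves reaches the same offset, the baselines differ from the planned path only in the route taken, which is what the distance comparison relies on.
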
if __name__ == '__main__':
    main()
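The step-count comparison in `main()` sorts each planned path into one of three groups. The sketch below restates that rule as a small function. The interpretation in the comments (that one step may change several variables at once, so `max(|r_x|)` is the fewest possible steps) is an assumption drawn from the variable names `list_line` and `list_detour`, and the labels `'straight'`, `'axis-wise'`, and `'detour'` are hypothetical.

```python
def classify_path(num_step, r_x):
    """Classify a path by its step count relative to the destination offset r_x."""
    abs_r_x = [abs(v) for v in r_x]
    if num_step == max(abs_r_x):
        # As few steps as the largest single-variable offset (counted in list_line).
        return 'straight'
    if num_step == sum(abs_r_x):
        # One unit move per step (neither list in the original script).
        return 'axis-wise'
    # Anything else is counted as a detour (list_detour).
    return 'detour'


destination_r_x = (2, -1, 0)   # made-up offset: max is 2, sum is 3
labels = [classify_path(n, destination_r_x) for n in (2, 3, 5)]
```

As in the original `if`/`elif` chain, a path that changes a single variable satisfies both conditions and is counted as straight because that test comes first.
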

example_3d/args/args_02.pkl

215 Bytes
Binary file not shown.

example_3d/data/cv.pkl

+1
@@ -0,0 +1 @@
(binary pickle data, not displayable as text)
