Tracts minimal deployment db #146

Closed
wants to merge 114 commits into from

Commits

dbf0291
adding communities dataset
Niklewa Jan 10, 2024
20d3b5f
adding communities variables WIP
Niklewa Jan 11, 2024
1edf76a
adding age composition WIP
Niklewa Jan 11, 2024
2d34569
lint
Niklewa Jan 11, 2024
f27ab85
pulling from ru-percentile
Niklewa Jan 11, 2024
ee36c64
linting
Niklewa Jan 11, 2024
0d76ed2
adding processed vars communities
Niklewa Jan 11, 2024
d3edb95
refactoring paths in cleaning files
Niklewa Jan 11, 2024
9b05b6f
linting
Niklewa Jan 11, 2024
56e29b1
running cleaning pipeline
Niklewa Jan 12, 2024
4f29edc
adding data documentation
Niklewa Jan 12, 2024
82493b3
Merge branch 'ru-training-pipeline' of https://github.com/BasisResear…
Niklewa Jan 12, 2024
7345e92
Merge branch 'main' of https://github.com/BasisResearch/cities into n…
rfl-urbaniak Jan 12, 2024
33bd925
rtol in test_sorted... from .01 to .02
rfl-urbaniak Jan 12, 2024
ab072c9
Merge pull request #102 from BasisResearch/nl-add-climate-and-economi…
rfl-urbaniak Jan 12, 2024
f7e393b
skipping inference tests for now
rfl-urbaniak Jan 14, 2024
234a68b
suspending inference tests
rfl-urbaniak Jan 14, 2024
38bb662
Merge branch 'ru-vectorize-predictions' of https://github.com/BasisRe…
rfl-urbaniak Jan 14, 2024
c5db754
temporarily moving causal notebooks out of testing until models are re…
rfl-urbaniak Jan 14, 2024
75161bc
separating burdens
Niklewa Jan 14, 2024
8bcabe0
separating hazard variable
Niklewa Jan 14, 2024
c63b1a3
lint
Niklewa Jan 14, 2024
abfcf73
adding raw homeownership variables
Niklewa Jan 14, 2024
e416587
added processed variables and documentation
Niklewa Jan 14, 2024
b2a2a6c
formatting exclusions
Niklewa Jan 15, 2024
90d31d3
running clean_gdp
Niklewa Jan 15, 2024
2151f32
taking age raw from 2021
Niklewa Jan 15, 2024
48fdac2
running cleaning pipeline
Niklewa Jan 15, 2024
7be779c
adding raw income_distribution
Niklewa Jan 15, 2024
fc4d609
adding processed income_distribution
Niklewa Jan 15, 2024
11d30fa
adding documentation, lint
Niklewa Jan 15, 2024
f45fc1a
lint
rfl-urbaniak Jan 15, 2024
a1f7c9b
fixed a shape/index related bug
rfl-urbaniak Jan 15, 2024
b08b9f9
format/lint
rfl-urbaniak Jan 15, 2024
2c45fbe
Merge branch 'staging-county-data' of https://github.com/BasisResearc…
rfl-urbaniak Jan 15, 2024
ed368cc
Merge pull request #107 from BasisResearch/nl-add-homeownership
rfl-urbaniak Jan 15, 2024
81075ed
Merge branch 'staging-county-data' of https://github.com/BasisResearc…
rfl-urbaniak Jan 15, 2024
622876c
Merge pull request #106 from BasisResearch/nl-seperating-hazard-burdens
rfl-urbaniak Jan 15, 2024
1135dc8
pulling from staging-county-data
Niklewa Jan 15, 2024
44f218a
updating data sources
Niklewa Jan 15, 2024
70ca773
making all the FIPS codes common
Niklewa Jan 15, 2024
ea53181
explanation WIP
rfl-urbaniak Jan 15, 2024
094e1f9
Merge pull request #108 from BasisResearch/nl-fix-age-add-income-comp…
rfl-urbaniak Jan 15, 2024
dbf4492
population density data
Jan 15, 2024
e3b195f
adding raw homeownership missing counties
Niklewa Jan 15, 2024
2dfedef
adding processed homeownership data
Niklewa Jan 15, 2024
339f515
data sources
Niklewa Jan 15, 2024
edd8f0c
data sources edit
Niklewa Jan 15, 2024
aab1db4
Merge pull request #110 from BasisResearch/nl-fix-missing-homeownersh…
riadas Jan 15, 2024
475b240
Adding population_density to cleaning_pipeline and data_sources
Jan 16, 2024
40cabd3
make format, make lint
Jan 16, 2024
aa582c4
pulling from staging-county-data
Niklewa Jan 16, 2024
5632cf4
running pipeline, fixing tests WIP
Niklewa Jan 16, 2024
1b6448f
fixing pop density problem
Niklewa Jan 16, 2024
c960623
lint, data source updates
Niklewa Jan 16, 2024
39a112a
debugged reversion perturbations
rfl-urbaniak Jan 16, 2024
a704533
lint
rfl-urbaniak Jan 16, 2024
5340549
creating the notebook file WIP
Niklewa Jan 16, 2024
e093d8e
editing the file WIP
Niklewa Jan 16, 2024
61c5377
Merge pull request #109 from BasisResearch/elm_population_density
Niklewa Jan 16, 2024
58c13b7
filling conceptual_overview
Niklewa Jan 16, 2024
2ae2dbf
lint
rfl-urbaniak Jan 16, 2024
bb3ce60
Merge branch 'staging-county-data' of https://github.com/BasisResearc…
rfl-urbaniak Jan 16, 2024
f910ea8
WIP
rfl-urbaniak Jan 16, 2024
38168f5
modifying data sources
Niklewa Jan 17, 2024
1670f45
small modifications
rfl-urbaniak Jan 17, 2024
588806d
editing conceptual overview
Niklewa Jan 17, 2024
cdefed4
editing conceptual overview
Niklewa Jan 17, 2024
19e66fd
Merge pull request #112 from BasisResearch/nl-modify-data-sources
rfl-urbaniak Jan 18, 2024
b6e875b
some revisions
rfl-urbaniak Jan 18, 2024
78621e2
Merge pull request #111 from BasisResearch/nl-add-api-guide
rfl-urbaniak Jan 18, 2024
decce08
Merge branch 'staging-county-data' of https://github.com/BasisResearc…
rfl-urbaniak Jan 18, 2024
2d3fda8
counterfactual explanation WIP
rfl-urbaniak Jan 18, 2024
4a79708
counterfactual-explained first pass
rfl-urbaniak Jan 18, 2024
4274e37
Merge pull request #113 from BasisResearch/ru-describe-inference
rfl-urbaniak Jan 18, 2024
8abaf83
significance -> importance to desc weights; etc
Jan 18, 2024
9953024
minor clarifications to docs
Jan 18, 2024
aee3dc2
added covariate correlations and elimination to data prep
rfl-urbaniak Feb 2, 2024
de787fd
still skipping inference tests
rfl-urbaniak Feb 2, 2024
e7ae05f
models retrained
rfl-urbaniak Feb 5, 2024
197af41
added tau samples and years
rfl-urbaniak Feb 5, 2024
31cb222
tests passed
rfl-urbaniak Feb 5, 2024
0e02645
notebook tests passed
rfl-urbaniak Feb 5, 2024
0cfefe2
Merge pull request #119 from BasisResearch/ru-retrain-models
riadas Feb 17, 2024
5c84ddd
ignore Adam linting error
rfl-urbaniak Mar 12, 2024
e77e779
isort
rfl-urbaniak Mar 12, 2024
dcc9df9
updated tests and linting in light of upcoming changes from ru-sql
rfl-urbaniak Mar 13, 2024
2acf01a
update to tests and lints in light of upcoming ru-sql
rfl-urbaniak Mar 13, 2024
84d39b4
removed db tests from workflow
rfl-urbaniak Mar 13, 2024
7c28825
removed redundant embedded pyro repo
rfl-urbaniak Jul 10, 2024
f56efe7
scripts and mechanisms
rfl-urbaniak Aug 5, 2024
cf19274
added model components
rfl-urbaniak Aug 5, 2024
7542113
test get_n
rfl-urbaniak Aug 5, 2024
78730b1
tests for linear component
rfl-urbaniak Aug 5, 2024
e48c1d2
test logistic component
rfl-urbaniak Aug 5, 2024
1d4369f
test ratio component
rfl-urbaniak Aug 5, 2024
8d9e0a4
black upgraded
rfl-urbaniak Aug 5, 2024
3c40fa7
remove worktrees
rfl-urbaniak Aug 7, 2024
a954ac0
small revision to clean
rfl-urbaniak Aug 7, 2024
28292e1
clean up experimental notebooks structure
rfl-urbaniak Aug 7, 2024
c8fc150
moved cleaning scripts, test model WIP
rfl-urbaniak Aug 7, 2024
6360371
tracts model test added
rfl-urbaniak Aug 7, 2024
e91f560
added clean path script
rfl-urbaniak Aug 7, 2024
cea1845
renaming model file
rfl-urbaniak Aug 7, 2024
94d29b9
add evaluation, format lint
rfl-urbaniak Aug 7, 2024
541f104
add eval to model test
rfl-urbaniak Aug 7, 2024
975582a
add trained guide and params
rfl-urbaniak Aug 7, 2024
8cae81f
modeling notebook
rfl-urbaniak Aug 7, 2024
d7f2b72
clean notebook
rfl-urbaniak Aug 7, 2024
f7a270d
slim scripts in progress
rfl-urbaniak Aug 19, 2024
5134435
retrained with pyro 1.9.1
rfl-urbaniak Aug 20, 2024
4f1748c
training and prediction for deployment test ready
rfl-urbaniak Aug 20, 2024
6fb6354
format lint
rfl-urbaniak Aug 20, 2024
bdf65a1
switch train_model and predict over to database source
jfeser Aug 28, 2024
39 changes: 39 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,39 @@
name: Lint

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main, staging-* ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ['3.10']

    steps:
      - uses: actions/checkout@v2

      - name: pip cache
        uses: actions/cache@v1
        with:
          path: ~/.cache/pip
          key: lint-pip-${{ hashFiles('**/pyproject.toml') }}
          restore-keys: |
            lint-pip-

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install .[test]

      - name: Lint
        run: ./scripts/lint.sh
35 changes: 0 additions & 35 deletions .github/workflows/python-app.yml

This file was deleted.

59 changes: 59 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,59 @@
name: Test

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main, staging-* ]
  workflow_dispatch:

jobs:
  build:
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        python-version: ['3.10']
        os: [ubuntu-latest] # , macos-latest]

    steps:
      - uses: actions/checkout@v2
      - name: Ubuntu cache
        uses: actions/cache@v1
        if: startsWith(matrix.os, 'ubuntu')
        with:
          path: ~/.cache/pip
          key:
            ${{ matrix.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
          restore-keys: |
            ${{ matrix.os }}-${{ matrix.python-version }}-

      - name: macOS cache
        uses: actions/cache@v1
        if: startsWith(matrix.os, 'macOS')
        with:
          path: ~/Library/Caches/pip
          key:
            ${{ matrix.os }}-${{ matrix.python-version }}-${{ hashFiles('**/pyproject.toml') }}
          restore-keys: |
            ${{ matrix.os }}-${{ matrix.python-version }}-

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@v1
        with:
          python-version: ${{ matrix.python-version }}

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -e .[dev]

      # - name: Generate databases
      #   run: python cities/utils/csv_to_db_pipeline.py

      - name: Test
        run: python -m pytest tests/

      - name: Test Notebooks
        run: |
          ./scripts/test_notebooks.sh
2 changes: 2 additions & 0 deletions .gitignore
@@ -21,3 +21,5 @@ tests/.coverage

*.DS_Store
.vscode/launch.json
data/sql/counties_database.db
data/sql/msa_database.db
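
A quick way to sanity-check one of these newly git-ignored database artifacts after generating it locally is to open it directly. Below is a minimal sketch, assuming the .db files are ordinary SQLite databases; no table schema is assumed, so the sketch lists the tables first rather than querying a guessed name.

import sqlite3

import pandas as pd

# connect to the locally generated counties database (git-ignored)
conn = sqlite3.connect("data/sql/counties_database.db")

# list the tables actually present before assuming any schema
tables = pd.read_sql("SELECT name FROM sqlite_master WHERE type = 'table'", conn)
print(tables)

conn.close()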
16 changes: 16 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,16 @@
{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Current File",
      "type": "python",
      "request": "launch",
      "program": "${file}",
      "console": "integratedTerminal",
      "justMyCode": true
    }
  ]
}
6 changes: 6 additions & 0 deletions Makefile
@@ -1,6 +1,12 @@
format: FORCE
	./scripts/clean.sh


path ?= .

format_path: FORCE
	./scripts/clean_path.sh $(path)

lint: FORCE
	./scripts/lint.sh

1 change: 1 addition & 0 deletions cities/__init__.py
@@ -2,4 +2,5 @@
Project short description.
"""

__version__ = "0.0.1"
213 changes: 213 additions & 0 deletions cities/deployment/tracts_minneapolis/predict.py
@@ -0,0 +1,213 @@
import copy
import os
import time

import dill
import pandas as pd
import pyro
import torch
from chirho.counterfactual.handlers import MultiWorldCounterfactual

# import chirho
from chirho.interventional.handlers import do
from pyro.infer import Predictive

from cities.modeling.zoning_models.zoning_tracts_model import TractsModel

# can be disposed of once you access data in a different manner
from cities.utils.data_grabber import find_repo_root
from cities.utils.data_loader import load_sql

root = find_repo_root()


#####################
# data load and prep
#####################

kwargs = {
    "categorical": ["year", "census_tract"],
    "continuous": {
        "housing_units",
        "total_value",
        "median_value",
        "mean_limit_original",
        "median_distance",
        "income",
        "segregation_original",
        "white_original",
    },
    "outcome": "housing_units",
}

subset = load_sql(kwargs)
categorical_levels = {
    "year": torch.unique(subset["categorical"]["year"]),
    "census_tract": torch.unique(subset["categorical"]["census_tract"]),
}

subset_for_preds = copy.deepcopy(subset)
subset_for_preds["continuous"]["housing_units"] = None


########################
# load trained model (run `train_model.py` first)
########################

tracts_model = TractsModel(**subset, categorical_levels=categorical_levels)

pyro.clear_param_store()

guide_path = "tracts_model_guide.pkl"
param_path = "tracts_model_params.pth"

with open(guide_path, "rb") as file:
    guide = dill.load(file)

pyro.get_param_store().load(param_path)

predictive = Predictive(
    model=tracts_model,
    guide=guide,
    num_samples=100,
)


############################################################
# define interventions parametrized as in the intended query
############################################################


# these are at the parcel level
def values_intervention(
    radius_blue, limit_blue, radius_yellow, limit_yellow, reform_year=2015
):

    # don't want to load large data multiple times
    # note we'll need to generate these datasets anew once we switch to the new data pipeline

    if not hasattr(values_intervention, "global_census_ids"):
        values_intervention.global_census_ids = pd.read_csv(
            os.path.join(root, "data/minneapolis/processed/census_ids.csv")
        )

        values_intervention.global_data = pd.read_csv(
            os.path.join(
                root,
                "data/minneapolis/processed/census_tract_intervention_required.csv",
            )
        )

        data = values_intervention.global_data
        census_ids = values_intervention.global_census_ids
        values_intervention.global_data = data[
            (data["census_tract"].isin(census_ids["census_tract"]))
            & (data["year"].isin(census_ids["year"]))
        ]

    data = values_intervention.global_data.copy()

    intervention = copy.deepcopy(values_intervention.global_data["limit_con"])
    downtown = data["downtown_yn"]
    new_blue = (
        (~downtown)
        & (data["year"] >= reform_year)
        & (data["distance_to_transit"] <= radius_blue)
    )
    new_yellow = (
        (~downtown)
        & (data["year"] >= reform_year)
        & (data["distance_to_transit"] > radius_blue)
        & (data["distance_to_transit"] <= radius_yellow)
    )
    new_other = (
        (~downtown)
        & (data["year"] > reform_year)
        & (data["distance_to_transit"] > radius_yellow)
    )

    intervention[downtown] = 0.0
    intervention[new_blue] = limit_blue
    intervention[new_yellow] = limit_yellow
    intervention[new_other] = 1.0

    data["intervention"] = intervention

    return data


# generate three interventions at the parcel level

start = time.time()
simple_intervention = values_intervention(300, 0.5, 700, 0.7, reform_year=2015)
end = time.time()
print("Time to run values_intervention 1: ", end - start)
start2 = time.time()
simple_intervention2 = values_intervention(400, 0.5, 800, 0.6, reform_year=2013)
end2 = time.time()
print("Time to run values_intervention 2: ", end2 - start2)
start3 = time.time()
simple_intervention3 = values_intervention(200, 0.4, 1000, 0.65, reform_year=2013)
end3 = time.time()
print("Time to run values_intervention 3: ", end3 - start3)


# these are at the tracts level


def tracts_intervention(
    radius_blue, limit_blue, radius_yellow, limit_yellow, reform_year=2015
):

    parcel_interventions = values_intervention(
        radius_blue, limit_blue, radius_yellow, limit_yellow, reform_year=reform_year
    )

    aggregate = (
        parcel_interventions[["census_tract", "year", "intervention"]]
        .groupby(["census_tract", "year"])
        .mean()
        .reset_index()
    )

    # cache the set of valid (census_tract, year) pairs across calls;
    # the guard checks the attribute that is actually set below
    if not hasattr(tracts_intervention, "global_valid_pairs"):

        tracts_intervention.global_valid_pairs = set(
            zip(
                values_intervention.global_census_ids["census_tract"],
                values_intervention.global_census_ids["year"],
            )
        )

    subaggregate = aggregate[
        aggregate[["census_tract", "year"]]
        .apply(tuple, axis=1)
        .isin(tracts_intervention.global_valid_pairs)
    ].copy()

    return torch.tensor(list(subaggregate["intervention"]))


# generate two interventions at the tracts level

start = time.time()
t_intervention = tracts_intervention(300, 0.5, 700, 0.7, reform_year=2015)
end = time.time()
print("Time to run tracts_intervention 1: ", end - start)

start2 = time.time()
t_intervention2 = tracts_intervention(400, 0.5, 800, 0.6, reform_year=2013)
end2 = time.time()
print("Time to run tracts_intervention 2: ", end2 - start2)


##################################
# use interventions with the model
##################################

with MultiWorldCounterfactual() as mwc:
    with do(actions={"limit": torch.tensor(0.0)}):
        samples = predictive(**subset_for_preds)


assert samples["limit"].shape[:-1] == torch.Size([100, 2, 1, 1, 1])
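
The final assertion pins the expected sample shape: 100 predictive draws and 2 counterfactual worlds (factual and intervened) introduced by MultiWorldCounterfactual. To pull the two worlds apart one can index into the named world axis; the following is a minimal sketch, assuming chirho's IndexSet/gather API, where the world axis is named after the intervened site "limit":

from chirho.indexed.ops import IndexSet, gather

with mwc:
    # world 0 is factual, world 1 carries the do(limit=...) intervention
    factual_limit = gather(samples["limit"], IndexSet(limit={0}))
    intervened_limit = gather(samples["limit"], IndexSet(limit={1}))

The tract-level tensors computed above (t_intervention, t_intervention2) are presumably meant to be supplied the same way, e.g. do(actions={"limit": t_intervention}), in place of the constant torch.tensor(0.0) used in this smoke test.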