diff --git a/.env b/.env new file mode 100644 index 00000000..f143c46d --- /dev/null +++ b/.env @@ -0,0 +1,6 @@ +GOOGLE_CLOUD_PROJECT=cities-429602 +GOOGLE_CLOUD_BUCKET=minneapolis-basis +SCHEMA=minneapolis +HOST=34.123.100.76 +DATABASE=cities +USERNAME=postgres diff --git a/.gitignore b/.gitignore index 89fa2675..bbeb945f 100644 --- a/.gitignore +++ b/.gitignore @@ -23,3 +23,16 @@ tests/.coverage .vscode/launch.json data/sql/counties_database.db data/sql/msa_database.db +.Rproj.user +**/*.RData +**/*.Rhistory + +# data +data/minneapolis/processed/values_long.csv +data/minneapolis/processed/values_with_parking.csv +data/minneapolis/sourced/demographic/** +data/minneapolis/preds/** +data/minneapolis/sourced/parcel_to_census_tract_mappings/** +data/minneapolis/sourced/parcel_to_parking_info_mappings/** + +data/minneapolis/.pgpass diff --git a/README.md b/README.md index 2a5b125c..4dfa19a9 100644 --- a/README.md +++ b/README.md @@ -3,19 +3,52 @@
-## Evaluating Policy Transfer via Similarity Analysis and Causal Inference +# Evaluating Policy Transfer via Similarity Analysis and Causal Inference + + +## Getting started + + +Welcome to the repository for [polis](http://polis.basis.ai/), developed by [Basis Research Institute](https://www.basis.ai/) for [The Opportunity Project (TOP)](https://opportunity.census.gov/) 2023 in collaboration with the U.S. Department of Commerce. The primary goal of this project is to enhance access to data for local policymakers, facilitating more informed decision-making. + +This is the backend repository for more advanced users. For a more pleasant frontend experience and more information, please use the [app](http://polis.basis.ai/). + + +Installation +------------ + +**Basic Setup:** + +```sh + + git clone git@github.com:BasisResearch/cities.git + cd cities + git checkout main + pip install . ``` -python -m venv venv -source venv/bin/activate -pip install -r requirements.txt -pip install -e . -cd tests && python -m pytest + +The above will install the minimal version that's ported to [polis.basis.ai](http://polis.basis.ai). + +**Dev Setup:** + +To install the dev dependencies, which are needed to run and train models and to run all the tests, run the following command: + +```sh +pip install -e ".[dev]" ``` +For details on which packages each install option includes, see `setup.py`. -Welcome to the repository for [polis](http://polis.basis.ai/), developed by the [Basis Research Institute](https://www.basis.ai/) for [The Opportunity Project (TOP)](https://opportunity.census.gov/) 2023 in collaboration with the U.S. Department of Commerce. The primary goal of this project is to enhance access to data for local policymakers, facilitating more informed decision-making. -This is the backend repository for more advanced users. For a more pleasant frontend experience and more information, please use the [app](http://polis.basis.ai/). 
+**Contributing:** + +Before submitting a pull request, please autoformat your code and ensure that unit tests pass locally. + +```sh +make lint # linting +make format # runs black and isort, including on notebooks in the docs/ folder +make tests # linting, unit and notebook tests +``` ### The repository is structured as follows: @@ -36,11 +69,24 @@ This is the backend repository for more advanced users. For a more pleasant fron └── tests ``` +**WARNING:** during beta testing, the most recent version lives on the `staging-county-data` git branch, and so do the most recent versions of the notebooks. Please switch to this branch before inspecting the notebooks. If you're interested in downloading the data or exploring advanced features beyond the frontend, check out the `guides` folder in the `docs` directory. There, you'll find: - `data_sources.ipynb` for information on data sources, +- `similarity-conceptual.ipynb` for a conceptual account of how similarity comparison works, +- `counterfactual-explained.ipynb` for a rough explanation of how our causal model works, - `similarity_demo.ipynb` demonstrating the use of the `DataGrabber` class for easy data acces, and of our `FipsQuery` class, which is the key tool in the similarity-focused part of the project, - `causal_insights_demo.ipynb` for an overview of how the `CausalInsight` class can be used to explore the influence of a range of intervention variables thanks to causal inference tools we employed. [WIP] -Feel free to dive into these resources to gain deeper insights into the capabilities of the Polis project, or to reach out if you have any comments or suggestions. +## Interested? We'd love to hear from you. + +[polis](http://polis.basis.ai/) is a research tool under very active development, and we are eager to hear feedback from users in the policymaking and public administration spaces to accelerate its benefit. 
+ +If you have feature requests, recommendations for new data sources, tips for how to resolve missing data issues, bug reports (bugs certainly exist!), or anything else, please do not hesitate to contact us at polis@basis.ai. + +To stay up to date on our latest features, you can subscribe to our [mailing list](https://dashboard.mailerlite.com/forms/102625/110535550672308121/share). In the near term, we will send out a notice about our upcoming batch of improvements (including performance speedups, support for mobile, and more comprehensive tutorials), as well as an interest form for users who would like to work closely with us on case studies to make the tool most useful in their work. + +Lastly, we emphasize that this website is still in beta testing, and hence all predictions should be taken with a grain of salt. + +Acknowledgments: polis was built by Basis, a non-profit AI research organization dedicated to creating automated reasoning technology that helps solve society's most intractable problems. To learn more about us, visit https://basis.ai. 
diff --git a/cities/modeling/model_interactions.py b/cities/modeling/model_interactions.py index 8232410f..2446d6d5 100644 --- a/cities/modeling/model_interactions.py +++ b/cities/modeling/model_interactions.py @@ -3,10 +3,10 @@ from typing import Optional import dill +import pyro import pyro.distributions as dist import torch -import pyro from cities.modeling.modeling_utils import ( prep_wide_data_for_inference, train_interactions_model, diff --git a/cities/modeling/modeling_utils.py b/cities/modeling/modeling_utils.py index 966a0ba5..55aaccc6 100644 --- a/cities/modeling/modeling_utils.py +++ b/cities/modeling/modeling_utils.py @@ -2,13 +2,13 @@ import matplotlib.pyplot as plt import pandas as pd +import pyro import torch from pyro.infer import SVI, Trace_ELBO from pyro.infer.autoguide import AutoNormal from pyro.optim import Adam # type: ignore from scipy.stats import spearmanr -import pyro from cities.utils.data_grabber import ( DataGrabber, list_available_features, diff --git a/cities/queries/causal_insight.py b/cities/queries/causal_insight.py index 187855ea..7a7a7e98 100644 --- a/cities/queries/causal_insight.py +++ b/cities/queries/causal_insight.py @@ -5,10 +5,10 @@ import numpy as np import pandas as pd import plotly.graph_objects as go +import pyro import torch from sklearn.preprocessing import StandardScaler -import pyro from cities.modeling.model_interactions import model_cities_interaction from cities.modeling.modeling_utils import prep_wide_data_for_inference from cities.utils.cleaning_utils import ( @@ -576,7 +576,8 @@ def estimate_ATE(self): label=f"mean = {tau_samples.mean():.3f}", ) plt.title( - f"ATE for {self.intervention_dataset} and {self.outcome_dataset} with forward shift = {self.forward_shift}" + f"ATE for {self.intervention_dataset} and {self.outcome_dataset} " + f"with forward shift = {self.forward_shift}" ) plt.ylabel("counts") plt.xlabel("ATE") diff --git a/dbt/.gitignore b/dbt/.gitignore new file mode 100644 index 00000000..23e952a5 
--- /dev/null +++ b/dbt/.gitignore @@ -0,0 +1,3 @@ +target/ +dbt_packages/ +logs/ \ No newline at end of file diff --git a/dbt/README.md b/dbt/README.md new file mode 100644 index 00000000..7874ac84 --- /dev/null +++ b/dbt/README.md @@ -0,0 +1,15 @@ +Welcome to your new dbt project! + +### Using the starter project + +Try running the following commands: +- dbt run +- dbt test + + +### Resources: +- Learn more about dbt [in the docs](https://docs.getdbt.com/docs/introduction) +- Check out [Discourse](https://discourse.getdbt.com/) for commonly asked questions and answers +- Join the [chat](https://community.getdbt.com/) on Slack for live discussions and support +- Find [dbt events](https://events.getdbt.com) near you +- Check out [the blog](https://blog.getdbt.com/) for the latest news on dbt's development and best practices diff --git a/dbt/analyses/.gitkeep b/dbt/analyses/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/dbt/dbt_project.yml b/dbt/dbt_project.yml new file mode 100644 index 00000000..34355ccf --- /dev/null +++ b/dbt/dbt_project.yml @@ -0,0 +1,29 @@ + +# Name your project! Project names should contain only lowercase characters +# and underscores. A good package name should reflect your organization's +# name or the intended use of these models +name: 'cities' +version: '1.0.0' + +# This setting configures which "profile" dbt uses for this project. +profile: 'cities' + +# These configurations specify where dbt should look for different types of files. +# The `model-paths` config, for example, states that models in this project can be +# found in the "models/" directory. You probably won't need to change these! +model-paths: ["models"] +analysis-paths: ["analyses"] +test-paths: ["tests"] +seed-paths: ["seeds"] +macro-paths: ["macros"] +snapshot-paths: ["snapshots"] + +clean-targets: # directories to be removed by `dbt clean` + - "target" + - "dbt_packages" + + +vars: + srid: 26915 # use UTM zone 15N for all geometric data. 
note, this must have meters as the unit of measure + # years for which we have census tract/block group data + census_years: [2010, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023] diff --git a/dbt/macros/.gitkeep b/dbt/macros/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/dbt/macros/median.sql b/dbt/macros/median.sql new file mode 100644 index 00000000..131339f9 --- /dev/null +++ b/dbt/macros/median.sql @@ -0,0 +1,3 @@ +{% macro median(attr) %} +(percentile_cont(0.5) within group (order by {{ attr }})) +{% endmacro %} diff --git a/dbt/macros/safe_divide.sql b/dbt/macros/safe_divide.sql new file mode 100644 index 00000000..7d1d5723 --- /dev/null +++ b/dbt/macros/safe_divide.sql @@ -0,0 +1,3 @@ +{% macro safe_divide(num, dem) %} + (case when {{ dem }} = 0 then 0 else {{ num }} / {{ dem }} end) +{% endmacro %} diff --git a/dbt/macros/standardize.sql b/dbt/macros/standardize.sql new file mode 100644 index 00000000..742e971f --- /dev/null +++ b/dbt/macros/standardize.sql @@ -0,0 +1,13 @@ +{% macro standardize_cont(columns) %} + {% for c in columns %} + {{ c }} as {{ c }}_original, (({{ c }} - (avg({{ c }}) over ())) / (stddev_samp({{ c }}) over ()))::double precision as {{ c }} + {% if not loop.last %},{% endif %} + {% endfor %} +{% endmacro %} + +{% macro standardize_cat(columns) %} + {% for c in columns %} + {{ c }} as {{ c }}_original, (dense_rank() over (order by {{ c }})) - 1 as {{ c }} + {% if not loop.last %},{% endif %} + {% endfor %} +{% endmacro %} diff --git a/dbt/macros/tag_regions.sql b/dbt/macros/tag_regions.sql new file mode 100644 index 00000000..ae76c040 --- /dev/null +++ b/dbt/macros/tag_regions.sql @@ -0,0 +1,69 @@ +-- Tag regions with their containing/most intersecting/closest parent regions. 
+-- child_table: table with the child regions +-- parent_table: table with the parent regions +-- max_distance: maximum distance to consider a region as a parent (meters) +{% macro tag_regions(child_table, parent_table, max_distance=100) %} +( +-- the not materialized keyword allows us to use indexes on the child and parent +-- tables +with child as not materialized ( + select * from {{child_table}} +) +, parent as not materialized ( + select * from {{parent_table}} +) +, within as ( + select child.id as child_id + , parent.id as parent_id + , child.valid * parent.valid as valid + from + child + inner join parent + on ST_Within (child.geom, parent.geom) + and child.valid && parent.valid +) +, not_within as ( + select * from child + where not exists (select child_id from within where child_id = id) +) +, largest_overlap as ( + select distinct on (child.id) + child.id as child_id + , parent.id as parent_id + , child.valid * parent.valid as valid + from + not_within as child + inner join parent + on ST_Intersects (child.geom, parent.geom) + and child.valid && parent.valid + order by + child_id, + ST_Area (ST_Intersection (child.geom, parent.geom)) desc +) +, no_overlap as ( + select * from not_within + where not exists ( + select child_id from largest_overlap where child_id = id + ) +) +, closest as ( + select distinct on (child.id) + child.id as child_id + , parent.id as parent_id + , child.valid * parent.valid as valid + from + no_overlap as child + inner join parent + on child.valid && parent.valid + and ST_DWithin (child.geom, parent.geom, {{max_distance}}) + order by + child_id, + ST_Distance (child.geom, parent.geom) +) +select *, 'within' as type_ from within +union all +select *, 'most_overlap' as type_ from largest_overlap +union all +select *, 'closest' as type_ from closest +) +{% endmacro %} diff --git a/dbt/models/acs_block_group.sql b/dbt/models/acs_block_group.sql new file mode 100644 index 00000000..ea77a2b4 --- /dev/null +++ 
b/dbt/models/acs_block_group.sql @@ -0,0 +1,15 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['census_block_group', 'year_', 'name_'], 'unique': true}, + ] + ) +}} + +select + year::smallint as year_, + code as name_, + statefp || countyfp || tractce || blkgrpce as census_block_group, + case when "value" < 0 then null else "value" end as value_ +from {{ source('minneapolis', 'acs_bg_raw') }} diff --git a/dbt/models/acs_tract.sql b/dbt/models/acs_tract.sql new file mode 100644 index 00000000..3a4d1b74 --- /dev/null +++ b/dbt/models/acs_tract.sql @@ -0,0 +1,15 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['census_tract', 'year_', 'name_'], 'unique': true}, + ] + ) +}} + +select + year::smallint as year_, + code as name_, + statefp || countyfp || tractce as census_tract, + case when "value" < 0 then null else "value" end as value_ +from {{ source('minneapolis', 'acs_tract_raw') }} diff --git a/dbt/models/api/api__census_tracts.sql b/dbt/models/api/api__census_tracts.sql new file mode 100644 index 00000000..5208ae44 --- /dev/null +++ b/dbt/models/api/api__census_tracts.sql @@ -0,0 +1,16 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['year_']} + ] + ) +}} + +with census_tracts as (select * from {{ ref('census_tracts_in_city_boundary') }}) +select + census_tract + , year_ + , st_transform(geom, 4269) as geom +from + census_tracts diff --git a/dbt/models/api/api__demographics.sql b/dbt/models/api/api__demographics.sql new file mode 100644 index 00000000..ca9104bd --- /dev/null +++ b/dbt/models/api/api__demographics.sql @@ -0,0 +1,34 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['description']} + ] + ) +}} + +-- This is used by the web app. It has a row for each tract, demographic +-- variable pair and a column for each year. 
+with +demographics as (select * from {{ ref('demographics') }}), +census_tracts as (select * from {{ ref('census_tracts_in_city_boundary') }}), +demographics_filtered as ( + select demographics.* + from demographics + inner join census_tracts using (census_tract, year_) +), +final_ as ( + select + description, + census_tract as tract_id, + {{ dbt_utils.pivot('year_', + dbt_utils.get_column_values(ref('demographics'), + 'year_', + order_by='year_'), + then_value='value_', + else_value='null', + agg='max') }} + from demographics_filtered + group by 1, 2 +) +select * from final_ diff --git a/dbt/models/api/api__high_frequency_transit_lines.sql b/dbt/models/api/api__high_frequency_transit_lines.sql new file mode 100644 index 00000000..3e445e5b --- /dev/null +++ b/dbt/models/api/api__high_frequency_transit_lines.sql @@ -0,0 +1,17 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['valid']} + ] + ) +}} + +select + high_frequency_transit_line_id, + valid, + st_transform(geom, 4269) as geom, + st_transform(blue_zone_geom, 4269) as blue_zone_geom, + st_transform(yellow_zone_geom, 4269) as yellow_zone_geom +from + {{ ref('high_frequency_transit_lines') }} diff --git a/dbt/models/census_block_groups.sql b/dbt/models/census_block_groups.sql new file mode 100644 index 00000000..b33a6aea --- /dev/null +++ b/dbt/models/census_block_groups.sql @@ -0,0 +1,57 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['census_block_group_id'], 'unique': true}, + {'columns': ['geom'], 'type': 'gist'}, + {'columns': ['valid', 'geom'], 'type': 'gist'} + ] + ) +}} + +with +census_tracts as (select * from {{ ref("census_tracts") }}), +census_block_groups as ( + {% for year_ in var('census_years') %} + select + {% if year_ == 2010 %} + state as statefp + , county countyfp + , tract as tractce + , blkgrp as blkgrpce + , geo_id as geoidfq + , '[,2013-01-01)'::daterange as valid -- use 2010 data for all years before 2013 + {% else %} + statefp + , countyfp + , 
tractce + , blkgrpce + , {{ 'geoidfq' if year_ >= 2023 else 'affgeoid' }} as geoidfq + , '[{{ year_ }}-01-01,{{ year_ + 1 }}-01-01)'::daterange as valid + {% endif %} + , {{ year_ }} as year_ + , st_transform(geom, {{ var("srid") }}) as geom + from + {{ source('minneapolis', 'census_cb_' ~ year_ ~ '_27_bg_500k') }} + {% if not loop.last %}union all{% endif %} + {% endfor %} +), +census_block_groups_with_tracts as ( + select + census_block_groups.statefp + , census_block_groups.countyfp + , census_block_groups.tractce + , census_block_groups.blkgrpce + , census_block_groups.geoidfq + , census_tracts.census_tract_id + , (census_block_groups.valid * census_tracts.valid) as valid + , census_block_groups.geom + from census_block_groups + inner join census_tracts using (statefp, countyfp, tractce) + where + census_tracts.valid && census_block_groups.valid +) +select + {{ dbt_utils.generate_surrogate_key(['geoidfq', 'valid']) }} as census_block_group_id, + * +from census_block_groups_with_tracts diff --git a/dbt/models/census_tracts.sql b/dbt/models/census_tracts.sql new file mode 100644 index 00000000..50462489 --- /dev/null +++ b/dbt/models/census_tracts.sql @@ -0,0 +1,71 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['census_tract_id'], 'unique': true}, + {'columns': ['valid', 'geom'], 'type': 'gist'}, + {'columns': ['year_']} + ] + ) +}} + +with census_tracts_union as ( + {% for year_ in var('census_years') %} +select + {% if year_ == 2010 %} + state as statefp + , county as countyfp + , tract as tractce + , geo_id as geoidfq + {% else %} + statefp + , countyfp + , tractce + , {{ 'geoidfq' if year_ >= 2023 else 'affgeoid' }} as geoidfq + {% endif %} + , '[{{year_}}-01-01,{{ year_ + 1 }}-01-01)'::daterange as valid + , {{ year_ }} as year_ + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'census_cb_' ~ year_ ~ '_27_tract_500k') }} +{% if not loop.last %}union all{% endif %} +{% endfor %} +), +years_2011_2012 as ( 
+ select + statefp + , countyfp + , tractce + , geoidfq + , '[2011-01-01,2012-01-01)'::daterange as valid + , 2011 as year_ + , geom + from census_tracts_union + where year_ = 2010 + union all + select + statefp + , countyfp + , tractce + , geoidfq + , '[2012-01-01,2013-01-01)'::daterange as valid + , 2012 as year_ + , geom + from census_tracts_union + where year_ = 2010 +), +add_2011_2012 as ( + select * + from census_tracts_union + union all + select * + from years_2011_2012 +), +with_census_tract as ( + select *, statefp || countyfp || tractce as census_tract + from add_2011_2012 +) +select + {{ dbt_utils.generate_surrogate_key(['geoidfq', 'year_']) }} as census_tract_id, * +from + with_census_tract diff --git a/dbt/models/census_tracts_in_city_boundary.sql b/dbt/models/census_tracts_in_city_boundary.sql new file mode 100644 index 00000000..5a2955fc --- /dev/null +++ b/dbt/models/census_tracts_in_city_boundary.sql @@ -0,0 +1,17 @@ +with census_tracts as ( + select * from {{ ref('census_tracts') }} +) +, city_boundary as ( + select * from {{ ref('city_boundary') }} +) +select + census_tracts.census_tract_id + , census_tracts.valid + , census_tracts.census_tract + , census_tracts.year_ + , census_tracts.geom +from + census_tracts + , city_boundary +where st_intersects(census_tracts.geom, city_boundary.geom) + and st_area(st_intersection(census_tracts.geom, city_boundary.geom)) / st_area(census_tracts.geom) > 0.9 diff --git a/dbt/models/city_boundary.sql b/dbt/models/city_boundary.sql new file mode 100644 index 00000000..d9bfa060 --- /dev/null +++ b/dbt/models/city_boundary.sql @@ -0,0 +1,5 @@ +select + ogc_fid as city_boundary_id + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'city_boundary_minneapolis') }} diff --git a/dbt/models/commercial_permits.sql b/dbt/models/commercial_permits.sql new file mode 100644 index 00000000..755de463 --- /dev/null +++ b/dbt/models/commercial_permits.sql @@ -0,0 +1,29 @@ +{{ + config( + 
materialized='table', + indexes = [ + {'columns': ['commercial_permit_id'], 'unique': true}, + {'columns': ['geom'], 'type': 'gist'} + ] + ) +}} + +with +stg_commercial_permits as (select * from {{ ref('stg_commercial_permits') }}), +stg_commercial_permits_to_parcels as (select * from {{ ref('stg_commercial_permits_to_parcels') }}), +permits_to_first_parcel as ( + select commercial_permit_id, min(parcel_id) as parcel_id + from stg_commercial_permits_to_parcels group by 1 +), + +parcels as (select * from {{ ref('parcels') }}) +select + stg_commercial_permits.*, + permits_to_first_parcel.parcel_id, + parcels.census_block_group_id, + parcels.census_tract_id, + parcels.zcta_id +from + stg_commercial_permits + left join permits_to_first_parcel using (commercial_permit_id) + left join parcels using (parcel_id) diff --git a/dbt/models/demographics.sql b/dbt/models/demographics.sql new file mode 100644 index 00000000..3720dac5 --- /dev/null +++ b/dbt/models/demographics.sql @@ -0,0 +1,45 @@ +-- Demographic data +-- Contains data from the ACS and the computed segregation indexes. +with +acs_tract as (select * from {{ ref('acs_tract') }}), +acs_variables as (select * from {{ ref('acs_variables') }}), +acs_tract_with_description as ( + select + acs_tract.census_tract, + acs_tract.year_, + acs_tract.name_, + acs_variables.description, + acs_tract.value_ + from acs_tract + inner join acs_variables on acs_tract.name_ = acs_variables.variable +), +segregation_indexes as ( + select + census_tract, + year_, + null as name_, + 'segregation_index_' || distribution as description, + segregation_index as value_ + from {{ ref('segregation_indexes') }} +), +demographics as ( + select * from acs_tract_with_description + union all + select * from segregation_indexes +) +-- Fill in data for 2011, 2012 using closest available year. Replace 2020 data +-- with 2019 data to avoid pandemic effects. 
+, demographics_replace_years as ( + select * from demographics where year_ != 2020 + union all + select census_tract, 2020 as year_, name_, description, value_ + from demographics where year_ = 2019 + union all + select census_tract, 2011 as year_, name_, description, value_ + from demographics where year_ = 2013 + union all + select census_tract, 2012 as year_, name_, description, value_ + from demographics where year_ = 2013 +) +select * +from demographics_replace_years diff --git a/dbt/models/docs.md b/dbt/models/docs.md new file mode 100644 index 00000000..fd74fa38 --- /dev/null +++ b/dbt/models/docs.md @@ -0,0 +1,184 @@ +{% docs commercial_permits %} + +Contains commercial building permit applications. + +Notes: + - Permits are filtered to only include those in Minneapolis. + - `square_feet` is treated as missing if it is 0. + - When mapping permits to parcels, if more than one parcel contains the permit + location, a parcel will be chosen arbitrarily. This can happen because the + same parcel spatial extent can appear multiple times with different PINs, to + represent e.g. units in a condominium. + +{% enddocs %} + +{% docs residential_permits %} + +Contains residential building permit applications. + +Notes: + - Permits are filtered to only include those in Minneapolis. + - `square_feet` is treated as missing if it is 0. + - `permit_value` is treated as missing if it is 0. + - If more than one parcel contains the permit location, a parcel is selected + arbitrarily. See `commercial_permits`. + +{% enddocs %} + +{% docs parking %} + +Notes: + - If more than one parcel contains the permit location, a parcel is selected + arbitrarily. See `commercial_permits`. + +{% enddocs %} + +{% docs zctas %} + +Contains the geometry and metadata for all zip code tabulation areas (ZCTAs) in +the United States. + +These are not the same as zip codes. Zip codes are created by the postal service, and they change regularly. 
ZCTAs are created by the census bureau alongside the census. Not every zip code has a corresponding ZCTA (unpopulated zip codes are not represented, for example), and some ZCTAs cover multiple zip codes. + +Use the mapping table `zip_codes_to_zctas` to translate from zip codes to ZCTAs. + +{% enddocs %} + +{% docs parcels %} + +Contains the geometry and metadata for all parcels in the city of Minneapolis. + +Notes: +- Parcels data is released yearly. Parcels are considered valid for the year they were released. +- Parcels are filtered to only include those in Minneapolis. +- `emv_total`, `emv_bldg`, `emv_land`, `year_built`, and `sale_value` are treated as missing if they are 0. +- `sale_date` is treated as missing if it is equal to `1899-12-30`. +- `pin` is the county-assigned parcel identification number. The county prefix '053-' is removed. +- Duplicate rows are removed. Note that this is based on the entire row, not just the `pin`. There may still be duplicate `pin, year_` pairs. + +{% enddocs %} + +{% docs census_tracts %} + +Contains geometry and metadata for census tracts. Currently only includes census +tracts for Minnesota. + +{% enddocs %} + +{% docs census_block_groups %} + +Contains geometry and metadata for census block groups. Currently only includes +census block groups for Minnesota. + +{% enddocs %} + +{% docs acs_block_group %} + +Contains American Community Survey (ACS) demographic data at a census block +group granularity. + +The `name_` column contains the name of the demographic variable (e.g. +`B03002_003E`). See `acs_variables` for a mapping of these codes to +human-readable names. + +{% enddocs %} + +{% docs acs_tract %} + +Contains American Community Survey (ACS) demographic data at a census tract +granularity. + +The `name_` column contains the name of the demographic variable (e.g. +`B03002_003E`). See `acs_variables` for a mapping of these codes to +human-readable names. 
+ +{% enddocs %} + +{% docs fair_market_rents %} + +Contains fair market rent data for different numbers of bedrooms by zip code. + +{% enddocs %} + +{% docs high_frequency_transit_lines %} + +Contains the geometry and metadata for high frequency transit lines in the city of Minneapolis. + +Notes: +- `blue_zone_geom` is a 350 foot buffer around both lines and stops. +- `yellow_zone_geom` is a quarter mile buffer around lines and a half mile buffer around stops. + +{% enddocs %} + +{% docs segregation_indexes %} + +Segregation index for each tract for each year, computed for each reference +distribution. + +The segregation index is the KL-divergence between the distribution of +population in a tract and a reference distribution. For example, a tract that +has many more white people than the average for the city will have a high +segregation index for the 'average_city' distribution. + +Available distributions: +- `uniform`: Uniform distribution. +- `annual_city`: Citywide distribution for the current year. +- `average_city`: Citywide distribution averaged over all available years. + +{% enddocs %} + +{% docs usps_migration %} + +Contains USPS migration data sourced from change of address forms. Migrations +are broken down by month and year, zip_code, flow direction, and flow type. Flow +directions are either `from` (out of) the zip code or `to` (in to) the zip code. + +Flow types are one of `business`, `family`, `individual`, `perm` (permanent), +`temp` (temporary), or `total`. + +We associate zip codes to ZCTAs and provide aggregate flows for ZCTAs. Note that +some zip codes do not find a match in our zip to ZCTA mapping table, so there is +some missingness in this data. + +{% enddocs %} + +{% docs demographics %} + +Contains demographic data at census tract granularity. +Combines ACS data and segregation indexes in one table. + +Notes: +- Fills in missing demographic data from 2011 and 2012 with data from 2013. 
+- Replaces pandemic-affected data from 2020 with data from 2019. + +{% enddocs %} + +{% docs neighborhoods %} + +Neighborhood boundaries in the city of Minneapolis. + +{% enddocs %} + +{% docs wards %} + +Ward boundaries in the city of Minneapolis. + +{% enddocs %} + +{% docs university %} + +Boundary of the University of Minnesota. + +{% enddocs %} + +{% docs downtown %} + +Boundary of the downtown of Minneapolis. + +{% enddocs %} + +{% docs city_boundary %} + +Boundary of the city of Minneapolis. + +{% enddocs %} diff --git a/dbt/models/downtown.sql b/dbt/models/downtown.sql new file mode 100644 index 00000000..dc3e09cd --- /dev/null +++ b/dbt/models/downtown.sql @@ -0,0 +1,5 @@ +select + ogc_fid as downtown_id + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'downtown') }} diff --git a/dbt/models/fair_market_rents.sql b/dbt/models/fair_market_rents.sql new file mode 100644 index 00000000..620c0457 --- /dev/null +++ b/dbt/models/fair_market_rents.sql @@ -0,0 +1,18 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['zcta_id', 'year_', 'num_bedrooms']} + ] + ) +}} + +with +fair_market_rents as (select * from {{ ref('stg_fair_market_rents_add_zcta') }}) +select + zcta_id, + year_::smallint, + num_bedrooms::smallint, + avg(rent) as rent +from fair_market_rents +group by 1,2,3 diff --git a/dbt/models/high_frequency_transit_lines.sql b/dbt/models/high_frequency_transit_lines.sql new file mode 100644 index 00000000..c27885ca --- /dev/null +++ b/dbt/models/high_frequency_transit_lines.sql @@ -0,0 +1,30 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['high_frequency_transit_line_id'], 'unique': true}, + {'columns': ['valid', 'geom'], 'type': 'gist'}, + ] + ) +}} + +with lines as (select * from {{ ref('stg_high_frequency_transit_lines_union') }}) +, stops as (select * from {{ ref('high_frequency_transit_stops') }}) +, lines_and_stops as ( + select + lines.valid * stops.valid as valid + , lines.geom 
as line_geom + , stops.geom as stop_geom + from + lines + inner join stops on lines.valid && stops.valid +) +select + {{ dbt_utils.generate_surrogate_key(['valid']) }} as high_frequency_transit_line_id + , valid + , line_geom as geom + -- note units are in meters + , st_buffer(line_geom, 106.7) as blue_zone_geom -- 350 feet + , st_union(st_buffer(line_geom, 402.3), st_buffer(stop_geom, 804.7)) as yellow_zone_geom -- quarter mile around lines and half mile around stops +from + lines_and_stops diff --git a/dbt/models/high_frequency_transit_stops.sql b/dbt/models/high_frequency_transit_stops.sql new file mode 100644 index 00000000..38f40aa0 --- /dev/null +++ b/dbt/models/high_frequency_transit_stops.sql @@ -0,0 +1,10 @@ +with stops_2015 as ( + select + st_union(st_transform(geom, {{ var("srid") }})) as geom + from {{ source('minneapolis', 'high_frequency_transit_2015_freq_rail_stops') }} +) +select + 0 as high_frequency_transit_stop_id + , '[,]'::daterange as valid + , geom +from stops_2015 diff --git a/dbt/models/neighborhoods.sql b/dbt/models/neighborhoods.sql new file mode 100644 index 00000000..bd3da714 --- /dev/null +++ b/dbt/models/neighborhoods.sql @@ -0,0 +1,6 @@ +select + bdnum as neighborhood_id + , bdname as name_ + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'neighborhoods_minneapolis') }} diff --git a/dbt/models/parcels.sql b/dbt/models/parcels.sql new file mode 100644 index 00000000..3cc0f915 --- /dev/null +++ b/dbt/models/parcels.sql @@ -0,0 +1,25 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['parcel_id'], 'unique': true}, + {'columns': ['valid', 'geom'], 'type': 'gist'} + ] + ) +}} + +with +parcels as (select * from {{ ref('stg_parcels') }}), +to_zctas as (select * from {{ref('stg_parcels_to_zctas')}}), +to_census_bgs as (select * from {{ref('stg_parcels_to_census_block_groups')}}), +census_bgs as (select * from {{ref('census_block_groups')}}) +select + parcels.* + , to_zctas.zcta_id + , 
to_census_bgs.census_block_group_id + , census_bgs.census_tract_id +from + parcels + left join to_zctas using (parcel_id) + left join to_census_bgs using (parcel_id) + left join census_bgs using (census_block_group_id) diff --git a/dbt/models/parking.sql b/dbt/models/parking.sql new file mode 100644 index 00000000..717db5a2 --- /dev/null +++ b/dbt/models/parking.sql @@ -0,0 +1,28 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['parking_id'], 'unique': true}, + {'columns': ['geom'], 'type': 'gist'} + ] + ) +}} + +with + stg_parking as (select * from {{ ref('stg_parking') }}), + stg_parking_to_parcels as (select * from {{ ref('stg_parking_to_parcels') }}), + stg_parking_to_first_parcel as ( + select parking_id, min(parcel_id) as parcel_id + from stg_parking_to_parcels group by 1 + ), + parcels as (select * from {{ ref('parcels') }}) +select + stg_parking.*, + stg_parking_to_first_parcel.parcel_id, + parcels.census_block_group_id, + parcels.census_tract_id, + parcels.zcta_id +from + stg_parking + left join stg_parking_to_first_parcel using (parking_id) + left join parcels using (parcel_id) diff --git a/dbt/models/residential_permits.sql b/dbt/models/residential_permits.sql new file mode 100644 index 00000000..6613e374 --- /dev/null +++ b/dbt/models/residential_permits.sql @@ -0,0 +1,28 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['residential_permit_id'], 'unique': true}, + {'columns': ['geom'], 'type': 'gist'} + ] + ) +}} + +with +stg_residential_permits as (select * from {{ ref('stg_residential_permits') }}), +stg_residential_permits_to_parcels as (select * from {{ ref('stg_residential_permits_to_parcels') }}), +permits_to_first_parcel as ( + select residential_permit_id, min(parcel_id) as parcel_id + from stg_residential_permits_to_parcels group by 1 +), +parcels as (select * from {{ ref('parcels') }}) +select + stg_residential_permits.*, + permits_to_first_parcel.parcel_id, + parcels.census_block_group_id, + 
parcels.census_tract_id, + parcels.zcta_id +from + stg_residential_permits + left join permits_to_first_parcel using (residential_permit_id) + left join parcels using (parcel_id) diff --git a/dbt/models/schema.yml b/dbt/models/schema.yml new file mode 100644 index 00000000..e3948f2d --- /dev/null +++ b/dbt/models/schema.yml @@ -0,0 +1,240 @@ +sources: + - name: minneapolis + database: cities + schema: minneapolis + tables: + - name: acs_bg_raw + - name: acs_tract_raw + - name: residential_permits_residentialpermits + - name: commercial_permits_nonresidentialconstruction + - name: high_frequency_transit_2015_freq_350_ft_buffer + - name: high_frequency_transit_2015_freq_lines + - name: high_frequency_transit_2015_freq_quarter_and_half_mile_buffer + - name: high_frequency_transit_2015_freq_rail_stops + - name: high_frequency_transit_2016_freq_350_ft_buffer + - name: high_frequency_transit_2016_freq_lines + - name: high_frequency_transit_2016_freq_quarter_and_half_mile_buffer + - name: fair_market_rents_2012 + - name: fair_market_rents_2013 + - name: fair_market_rents_2014 + - name: fair_market_rents_2015 + - name: fair_market_rents_2016 + - name: fair_market_rents_2017 + - name: fair_market_rents_2018 + - name: fair_market_rents_2019 + - name: fair_market_rents_2020 + - name: fair_market_rents_2021 + - name: fair_market_rents_2022 + - name: fair_market_rents_2023 + - name: fair_market_rents_2024 + - name: downtown + - name: university + - name: usps_y2018 + - name: usps_y2019 + - name: usps_y2020 + - name: usps_y2021 + - name: usps_y2022 + - name: usps_y2023 + - name: zip_codes_tl_2020_us_zcta510 + - name: zip_codes_tl_2020_us_zcta520 + - name: zip_codes_zcta_xref + - name: census_cb_2010_27_bg_500k + - name: census_cb_2010_27_tract_500k + - name: census_cb_2013_27_bg_500k + - name: census_cb_2013_27_tract_500k + - name: census_cb_2014_27_bg_500k + - name: census_cb_2014_27_tract_500k + - name: census_cb_2015_27_bg_500k + - name: census_cb_2015_27_tract_500k + - name: 
census_cb_2016_27_bg_500k + - name: census_cb_2016_27_tract_500k + - name: census_cb_2017_27_bg_500k + - name: census_cb_2017_27_tract_500k + - name: census_cb_2018_27_bg_500k + - name: census_cb_2018_27_tract_500k + - name: census_cb_2019_27_bg_500k + - name: census_cb_2019_27_tract_500k + - name: census_cb_2020_27_bg_500k + - name: census_cb_2020_27_tract_500k + - name: census_cb_2021_27_bg_500k + - name: census_cb_2021_27_tract_500k + - name: census_cb_2022_27_bg_500k + - name: census_cb_2022_27_tract_500k + - name: census_cb_2023_27_bg_500k + - name: census_cb_2023_27_tract_500k + - name: city_boundary_minneapolis + - name: neighborhoods_minneapolis + - name: wards_minneapolis + - name: parcels_shp_plan_regonal_2002_parcels2002hennepin + - name: parcels_shp_plan_regonal_2003_parcels2003hennepin + - name: parcels_shp_plan_regonal_2004_parcels2004hennepin + - name: parcels_shp_plan_regonal_2005_parcels2005hennepin + - name: parcels_shp_plan_regonal_2006_parcels2006hennepin + - name: parcels_shp_plan_regonal_2007_parcels2007hennepin + - name: parcels_shp_plan_regonal_2008_parcels2008hennepin + - name: parcels_shp_plan_regonal_2009_parcels2009hennepin + - name: parcels_shp_plan_regonal_2010_parcels2010hennepin + - name: parcels_shp_plan_regonal_2011_parcels2011hennepin + - name: parcels_shp_plan_regonal_2012_parcels2012hennepin + - name: parcels_shp_plan_regonal_2013_parcels2013hennepin + - name: parcels_shp_plan_regonal_2014_parcels2014hennepin + - name: parcels_shp_plan_regonal_2015_parcels2015hennepin + - name: parcels_shp_plan_regonal_2016_parcels2016hennepin + - name: parcels_shp_plan_regonal_2017_parcels2017hennepin + - name: parcels_shp_plan_regonal_2018_parcels2018hennepin + - name: parcels_shp_plan_regonal_2019_parcels2019hennepin + - name: parcels_shp_plan_regonal_2020_parcels2020hennepin + - name: parcels_shp_plan_regonal_2021_parcels2021hennepin + - name: parcels_shp_plan_regonal_2022_parcels2022hennepin + - name: 
parcels_shp_plan_regonal_2023_parcels2023hennepin + - name: parking_parcels + +models: + - name: census_tracts + description: '{{ doc("census_tracts") }}' + columns: + - name: census_tract_id + data_tests: + - unique + - not_null + + - name: census_block_groups + description: '{{ doc("census_block_groups") }}' + columns: + - name: census_block_group_id + data_tests: + - unique + - not_null + - name: census_tract_id + data_tests: + - relationships: + to: ref('census_tracts') + field: census_tract_id + + - name: acs_block_group + description: '{{ doc("acs_block_group") }}' + + - name: acs_tract + description: '{{ doc("acs_tract") }}' + + - name: fair_market_rents + description: '{{ doc("fair_market_rents") }}' + + - name: high_frequency_transit_lines + description: '{{ doc("high_frequency_transit_lines") }}' + + - name: demographics + description: '{{ doc("demographics") }}' + + - name: university + description: '{{ doc("university") }}' + + - name: downtown + description: '{{ doc("downtown") }}' + + - name: city_boundary + description: '{{ doc("city_boundary") }}' + + - name: parking + description: '{{ doc("parking") }}' + + - name: segregation_indexes + description: '{{ doc("segregation_indexes") }}' + data_tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - census_tract + - year_ + - distribution + columns: + - name: census_tract + data_tests: + - relationships: + to: ref('census_tracts') + field: census_tract + + - name: parcels + description: '{{ doc("parcels") }}' + columns: + - name: parcel_id + data_tests: + - unique + - not_null + - name: zcta_id + data_tests: + - not_null + - relationships: + to: ref('zctas') + field: zcta_id + - name: census_block_group_id + data_tests: + - relationships: + to: ref('census_block_groups') + field: census_block_group_id + + - name: zctas + description: '{{ doc("zctas") }}' + columns: + - name: zcta_id + data_tests: + - not_null + - unique + + - name: usps_migration + description: '{{ 
doc("usps_migration") }}' + data_tests: + - dbt_utils.unique_combination_of_columns: + combination_of_columns: + - date_ + - zcta_id + - flow_direction + - flow_type + columns: + - name: zcta_id + data_tests: + - relationships: + to: ref('zctas') + field: zcta_id + + - name: commercial_permits + description: '{{ doc("commercial_permits") }}' + columns: + - name: commercial_permit_id + data_tests: + - not_null + - unique + + - name: residential_permits + description: '{{ doc("residential_permits") }}' + columns: + - name: residential_permit_id + data_tests: + - not_null + - unique + + - name: neighborhoods + description: '{{ doc("neighborhoods") }}' + columns: + - name: neighborhood_id + data_tests: + - not_null + - unique + + - name: wards + description: '{{ doc("wards") }}' + columns: + - name: ward_id + data_tests: + - not_null + - unique + +seeds: + - name: population_categories + columns: + - name: category + data_tests: + - unique + - not_null + - relationships: + to: ref('acs_variables') + field: description diff --git a/dbt/models/segregation_indexes.sql b/dbt/models/segregation_indexes.sql new file mode 100644 index 00000000..cdadbc67 --- /dev/null +++ b/dbt/models/segregation_indexes.sql @@ -0,0 +1,108 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['census_tract', 'year_', 'distribution'], 'unique': true}, + ] + ) +}} + +with + categories as (select * from {{ ref("population_categories") }}) + , acs_tract_all as (select * from {{ ref("acs_tract") }}) + , acs_variables as ( + select + variable as name_, + description + from {{ ref("acs_variables") }} + ) + , census_tracts_in_city_boundary as (select * from {{ ref('census_tracts_in_city_boundary') }}) + , acs_tract as ( + select * from acs_tract_all inner join census_tracts_in_city_boundary using (census_tract, year_) + ) + , pop_tyc as + ( -- Population by tract, year, and category + select acs_tract.census_tract, acs_tract.year_, categories.category, acs_tract.value_ + from acs_tract 
+ inner join acs_variables using (name_) + inner join categories on categories.category = acs_variables.description + ), + pop_ty as + ( + select census_tract, year_, sum(value_) as value_ + from pop_tyc + group by 1, 2 + ), + pop_yc as + ( -- Population by year and category + select year_, category, sum(value_) as value_ + from pop_tyc + group by 1, 2 + ), + pop_y as + ( -- Population by year + select year_, sum(value_) as value_ + from pop_tyc + group by 1 + ), + dist_yc as + ( -- Distribution of population by year and category + select + pop_yc.year_, + pop_yc.category, + ({{ safe_divide('pop_yc.value_', 'pop_y.value_') }})::double precision as value_ + from pop_yc inner join pop_y using (year_) + ), + dist_tyc as + ( -- Distribution of population by tract, year, and category + select + pop_tyc.census_tract, + pop_tyc.year_, + pop_tyc.category, + ({{ safe_divide('pop_tyc.value_', 'pop_ty.value_') }})::double precision as value_ + from pop_tyc inner join pop_ty using (year_, census_tract) + ), + uniform_dist as + ( -- Uniform distribution across categories + with n_cat as (select count(*) as n_cat from categories) + select category, (1.0 / n_cat)::double precision as value_ + from categories, n_cat + ), + average_dist as + ( -- Average of the annual citywide distributions + select category, avg(value_)::double precision as value_ + from dist_yc + group by 1 + ) +select + census_tract, + year_, + dist as distribution, + sum(case when p = 0 or q = 0 then 0 else p * ln(p / q) end) as segregation_index +from + ( + select + dist_tyc.census_tract, + dist_tyc.year_, + dist_tyc.value_ as p, + uniform_dist.value_ as q, + 'uniform' as dist + from dist_tyc inner join uniform_dist using (category) + union all + select + dist_tyc.census_tract, + dist_tyc.year_, + dist_tyc.value_ as p, + dist_yc.value_ as q, + 'annual_city' as dist + from dist_tyc inner join dist_yc using (year_, category) + union all + select + dist_tyc.census_tract, + dist_tyc.year_, + dist_tyc.value_ as p, 
+ average_dist.value_ as q, + 'average_city' as dist + from dist_tyc inner join average_dist using (category) + ) +group by 1, 2, 3 diff --git a/dbt/models/staging/schema.yml b/dbt/models/staging/schema.yml new file mode 100644 index 00000000..dccd58b5 --- /dev/null +++ b/dbt/models/staging/schema.yml @@ -0,0 +1,14 @@ +models: + - name: stg_zctas_2010 + columns: + - name: zcta + data_tests: + - not_null + - unique + + - name: stg_zctas_2020 + columns: + - name: zcta + data_tests: + - not_null + - unique diff --git a/dbt/models/staging/stg_commercial_permits.sql b/dbt/models/staging/stg_commercial_permits.sql new file mode 100644 index 00000000..af5aec34 --- /dev/null +++ b/dbt/models/staging/stg_commercial_permits.sql @@ -0,0 +1,18 @@ +select + sde_id as commercial_permit_id + , year::smallint as year_ + , nonres_gro::text as group_ + , nonres_sub::text as subgroup + , nonres_typ::text as type_category + , bldg_name::text as building_name + , bldg_desc::text as building_description + , permit_typ::text as permit_type + , permit_val::int as permit_value + , nullif(sqf, 0)::int as square_feet + , address::text + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'commercial_permits_nonresidentialconstruction') }} + where + co_code = '053' + and lower(ctu_name) = 'minneapolis' diff --git a/dbt/models/staging/stg_commercial_permits_to_parcels.sql b/dbt/models/staging/stg_commercial_permits_to_parcels.sql new file mode 100644 index 00000000..bbc44326 --- /dev/null +++ b/dbt/models/staging/stg_commercial_permits_to_parcels.sql @@ -0,0 +1,21 @@ +with +commercial_permits as ( + select + commercial_permit_id as id + , daterange(to_date(year_::text, 'YYYY'), to_date(year_::text, 'YYYY'), '[]') as valid + , geom + from {{ ref('stg_commercial_permits') }} +) +, parcels as ( + select + parcel_id as id + , valid + , geom + from {{ ref("parcels") }} +) +select + child_id as commercial_permit_id + , parent_id as parcel_id + , valid + , type_ +from {{ 
tag_regions("commercial_permits", "parcels") }} diff --git a/dbt/models/staging/stg_fair_market_rents_add_zcta.sql b/dbt/models/staging/stg_fair_market_rents_add_zcta.sql new file mode 100644 index 00000000..de2fdcba --- /dev/null +++ b/dbt/models/staging/stg_fair_market_rents_add_zcta.sql @@ -0,0 +1,18 @@ +with +stg_fair_market_rents_unpivot as ( + select * from {{ ref('stg_fair_market_rents_dedup') }} +), +zip_codes_to_zctas as (select * from {{ ref('zip_codes_to_zctas') }}), +zctas as (select * from {{ ref('zctas') }}) +select + stg_fair_market_rents_unpivot.zip_code, + stg_fair_market_rents_unpivot.year_::smallint, + stg_fair_market_rents_unpivot.num_bedrooms::smallint, + stg_fair_market_rents_unpivot.rent::smallint, + zctas.zcta_id +from + stg_fair_market_rents_unpivot + left join zip_codes_to_zctas using (zip_code) + left join zctas + on zip_codes_to_zctas.zcta = zctas.zcta + and (stg_fair_market_rents_unpivot.year_ || '-01-01')::date <@ zctas.valid diff --git a/dbt/models/staging/stg_fair_market_rents_dedup.sql b/dbt/models/staging/stg_fair_market_rents_dedup.sql new file mode 100644 index 00000000..fec86c06 --- /dev/null +++ b/dbt/models/staging/stg_fair_market_rents_dedup.sql @@ -0,0 +1 @@ +select distinct * from {{ ref('stg_fair_market_rents_unpivot') }} diff --git a/dbt/models/staging/stg_fair_market_rents_union.sql b/dbt/models/staging/stg_fair_market_rents_union.sql new file mode 100644 index 00000000..5bf52020 --- /dev/null +++ b/dbt/models/staging/stg_fair_market_rents_union.sql @@ -0,0 +1,15 @@ +{% set years = range(2012, 2025) %} + +{% for year_ in years %} +select + zip_code + , replace(rent_br0, '.00', '') as rent_br0 + , replace(rent_br1, '.00', '') as rent_br1 + , replace(rent_br2, '.00', '') as rent_br2 + , replace(rent_br3, '.00', '') as rent_br3 + , replace(rent_br4, '.00', '') as rent_br4 + , year as year_ +from + {{ source('minneapolis', 'fair_market_rents_' ~ year_) }} +{% if not loop.last %} union all {% endif %} +{% endfor %} diff --git 
a/dbt/models/staging/stg_fair_market_rents_unpivot.sql b/dbt/models/staging/stg_fair_market_rents_unpivot.sql new file mode 100644 index 00000000..92e64612 --- /dev/null +++ b/dbt/models/staging/stg_fair_market_rents_unpivot.sql @@ -0,0 +1,16 @@ +with +stg_fair_market_rents_dedup as (select * from {{ ref('stg_fair_market_rents_union') }}) +select + stg_fair_market_rents_dedup.zip_code, + stg_fair_market_rents_dedup.year_, + x.num_bedrooms, + x.rent +from + stg_fair_market_rents_dedup + cross join lateral ( + values (0, rent_br0), + (1, rent_br1), + (2, rent_br2), + (3, rent_br3), + (4, rent_br4) + ) as x(num_bedrooms, rent) diff --git a/dbt/models/staging/stg_high_frequency_transit_lines_union.sql b/dbt/models/staging/stg_high_frequency_transit_lines_union.sql new file mode 100644 index 00000000..4de6bbdb --- /dev/null +++ b/dbt/models/staging/stg_high_frequency_transit_lines_union.sql @@ -0,0 +1,24 @@ +with +lines_2015 as ( + select + st_union(st_transform(geom, {{ var("srid") }})) as geom + from + {{ source('minneapolis', 'high_frequency_transit_2015_freq_lines') }} + where st_geometrytype(geom) = 'ST_MultiLineString' +), +lines_2016 as ( + select + st_union(st_transform(geom, {{ var("srid") }})) as geom + from + {{ source('minneapolis', 'high_frequency_transit_2016_freq_lines') }} + where st_geometrytype(geom) = 'ST_MultiLineString' +) +select + '(,2016-01-01)'::daterange as valid, + geom +from lines_2015 +union all +select + '[2016-01-01,)'::daterange as valid, + geom +from lines_2016 diff --git a/dbt/models/staging/stg_parcels.sql b/dbt/models/staging/stg_parcels.sql new file mode 100644 index 00000000..83b9c77a --- /dev/null +++ b/dbt/models/staging/stg_parcels.sql @@ -0,0 +1,55 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['parcel_id'], 'unique': true}, + {'columns': ['valid', 'geom'], 'type': 'gist'} + ] + ) +}} + +{% set years = range(2002, 2024) %} +{% set city = 'MINNEAPOLIS' %} +{% set county_id = '053' %} + +with +-- This is a 
union of all the parcels from the years 2002 to 2023 +parcels_union as ( + {% for year_ in years %} + select + ogc_fid, + replace(pin, '{{ county_id }}-', '') as pin, + + -- parcels are a year-end snapshot, named after the year they cover + '[{{ year_ }}-01-01,{{ year_ + 1 }}-01-01)'::daterange as valid, + nullif(emv_land, 0)::int as emv_land, + nullif(emv_bldg, 0)::int as emv_bldg, + nullif(emv_total, 0)::int as emv_total, + nullif(year_built, 0)::smallint as year_built, + nullif(sale_date, '1899-12-30'::date) as sale_date, + nullif(sale_value, 0)::int as sale_value, + st_transform(geom, {{ var("srid") }}) as geom + from {{ source('minneapolis', 'parcels_shp_plan_regonal_' ~ year_ ~ '_parcels' ~ year_ ~ 'hennepin') }} + where upper({{ "city" if year_ < 2018 else "ctu_name" }}) = '{{ city }}' + {% if not loop.last %}union all{% endif %} + {% endfor %} +), + +-- Some of the parcel datasets contain exact duplicates that we remove. Note +-- that duplicate pin/year pairs may remain. +parcels_distinct as ( + select distinct on (pin, valid, emv_land, emv_bldg, emv_total, year_built, sale_date, sale_value, geom) * + from parcels_union +) +select + {{ dbt_utils.generate_surrogate_key(['ogc_fid', 'valid']) }} as parcel_id, + pin, + valid, + emv_land, + emv_bldg, + emv_total, + year_built, + sale_date, + sale_value, + geom +from parcels_distinct diff --git a/dbt/models/staging/stg_parcels_to_census_block_groups.sql b/dbt/models/staging/stg_parcels_to_census_block_groups.sql new file mode 100644 index 00000000..d65f230f --- /dev/null +++ b/dbt/models/staging/stg_parcels_to_census_block_groups.sql @@ -0,0 +1,21 @@ +with +parcels as ( + select + parcel_id as id + , valid + , geom + from {{ ref('stg_parcels') }} +), +census_block_groups as ( + select + census_block_group_id as id + , valid + , geom + from {{ ref('census_block_groups') }} +) +select + child_id as parcel_id + , parent_id as census_block_group_id + , valid + , type_ +from {{ tag_regions("parcels", 
"census_block_groups") }} diff --git a/dbt/models/staging/stg_parcels_to_zctas.sql b/dbt/models/staging/stg_parcels_to_zctas.sql new file mode 100644 index 00000000..680e304e --- /dev/null +++ b/dbt/models/staging/stg_parcels_to_zctas.sql @@ -0,0 +1,21 @@ +with +parcels as ( + select + parcel_id as id + , valid + , geom + from {{ ref("stg_parcels") }} +), +zctas as ( + select + zcta_id as id + , valid + , geom + from {{ ref("zctas") }} +) +select + child_id as parcel_id + , parent_id as zcta_id + , valid + , type_ +from {{ tag_regions("parcels", "zctas") }} diff --git a/dbt/models/staging/stg_parking.sql b/dbt/models/staging/stg_parking.sql new file mode 100644 index 00000000..61667cb0 --- /dev/null +++ b/dbt/models/staging/stg_parking.sql @@ -0,0 +1,15 @@ +with +parking_raw as (select * from {{ source('minneapolis', 'parking_parcels') }}) +select + ogc_fid as parking_id + , to_date("year" || '-' || "date", 'YYYY-DD-Mon') as date_ + , "project na"::text as project_name + , address::text + , neighborho::text as neighborhood + , ward::smallint + , "downtown y" = 'Y' as is_downtown + , "housing un"::smallint as num_housing_units + , "car parkin"::smallint as num_car_parking_spaces + , replace("bike parki", ',', '')::smallint as num_bike_parking_spaces + , st_transform(geom, {{ var("srid") }}) as geom +from parking_raw diff --git a/dbt/models/staging/stg_parking_to_parcels.sql b/dbt/models/staging/stg_parking_to_parcels.sql new file mode 100644 index 00000000..6e708e17 --- /dev/null +++ b/dbt/models/staging/stg_parking_to_parcels.sql @@ -0,0 +1,21 @@ +with + parking as ( + select + parking_id as id + , daterange(date_, date_, '[]') as valid + , geom + from {{ ref('stg_parking') }} + ) + , parcels as ( + select + parcel_id as id + , valid + , geom + from {{ ref('parcels') }} + ) +select + child_id as parking_id + , parent_id as parcel_id + , valid + , type_ +from {{ tag_regions("parking", "parcels") }} diff --git a/dbt/models/staging/stg_residential_permits.sql 
b/dbt/models/staging/stg_residential_permits.sql new file mode 100644 index 00000000..c6788cc4 --- /dev/null +++ b/dbt/models/staging/stg_residential_permits.sql @@ -0,0 +1,25 @@ +select + sde_id::int as residential_permit_id + , year::smallint as year_ + , tenure::text + , housing_ty::text as housing_type + , res_permit::text as permit_type + , address::text + , name::text as name_ + , buildings::smallint as num_buildings + , units::smallint as num_units + , age_restri::smallint as num_age_restricted_units + , memory_car::smallint as num_memory_care_units + , assisted::smallint as num_assisted_living_units + , com_off_re = 'Y' as is_commercial_and_residential + , nullif(sqf, 0)::int as square_feet + , public_fun = 'Y' as is_public_funded + , nullif(permit_val, 0)::int as permit_value + , community_::text as community_designation + , notes::text + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'residential_permits_residentialpermits') }} +where + co_code = '053' + and lower(ctu_name) = 'minneapolis' diff --git a/dbt/models/staging/stg_residential_permits_to_parcels.sql b/dbt/models/staging/stg_residential_permits_to_parcels.sql new file mode 100644 index 00000000..d3b5ae37 --- /dev/null +++ b/dbt/models/staging/stg_residential_permits_to_parcels.sql @@ -0,0 +1,21 @@ +with +residential_permits as ( + select + residential_permit_id as id + , daterange(to_date(year_::text, 'YYYY'), to_date(year_::text, 'YYYY'), '[]') as valid + , geom + from {{ ref('stg_residential_permits') }} +) +, parcels as ( + select + parcel_id as id + , valid + , geom + from {{ ref("parcels") }} +) +select + child_id as residential_permit_id + , parent_id as parcel_id + , valid + , type_ +from {{ tag_regions("residential_permits", "parcels") }} diff --git a/dbt/models/staging/stg_usps_migration_add_zcta.sql b/dbt/models/staging/stg_usps_migration_add_zcta.sql new file mode 100644 index 00000000..2b45f38e --- /dev/null +++ 
b/dbt/models/staging/stg_usps_migration_add_zcta.sql @@ -0,0 +1,19 @@ +{{ + config( + materialized='table' + ) +}} + +with +usps_migration as (select * from {{ ref('stg_usps_migration_unpivot') }}), +zctas as (select * from {{ ref('zctas') }}), +zip_codes_to_zctas as (select * from {{ ref('zip_codes_to_zctas') }}) +select + usps_migration.*, + zctas.zcta_id +from + usps_migration + left join zip_codes_to_zctas using (zip_code) + left join zctas + on zip_codes_to_zctas.zcta = zctas.zcta + and usps_migration.date_ <@ zctas.valid diff --git a/dbt/models/staging/stg_usps_migration_union.sql b/dbt/models/staging/stg_usps_migration_union.sql new file mode 100644 index 00000000..4ab16fb4 --- /dev/null +++ b/dbt/models/staging/stg_usps_migration_union.sql @@ -0,0 +1,23 @@ +{% set years = range(2018, 2024) %} + +{% for year_ in years %} + select + to_date("YYYYMM", 'YYYYMM') as date_, + replace("ZIPCODE", '=', '') as zip_code, + "CITY" as city, + "STATE" as state_, + "TOTAL_FROM_ZIP" as total_from_zip, + "TOTAL_BUSINESS" as total_from_zip_business, + "TOTAL_FAMILY" as total_from_zip_family, + "TOTAL_INDIVIDUAL" as total_from_zip_individual, + "TOTAL_PERM" as total_from_zip_perm, + "TOTAL_TEMP" as total_from_zip_temp, + "TOTAL_TO_ZIP" as total_to_zip, + "TOTAL_BUSINESS_dup" as total_to_zip_business, + "TOTAL_FAMILY_dup" as total_to_zip_family, + "TOTAL_INDIVIDUAL_dup" as total_to_zip_individual, + "TOTAL_PERM_dup" as total_to_zip_perm, + "TOTAL_TEMP_dup" as total_to_zip_temp + from {{ source('minneapolis', 'usps_y' ~ year_) }} +{% if not loop.last %} union all {% endif %} +{% endfor %} diff --git a/dbt/models/staging/stg_usps_migration_unpivot.sql b/dbt/models/staging/stg_usps_migration_unpivot.sql new file mode 100644 index 00000000..5f358c4b --- /dev/null +++ b/dbt/models/staging/stg_usps_migration_unpivot.sql @@ -0,0 +1,32 @@ +{{ + config( + materialized='table' + ) +}} + +{% set usps_migration_flow_types = ['business', 'family', 'individual', 'perm', 'temp'] %} +{% set 
usps_migration_flow_directions = ['from', 'to'] %} + +with +usps_migration as (select * from {{ ref('stg_usps_migration_union') }}) +{% for flow_direction in usps_migration_flow_directions %} + select + date_ + , zip_code + , '{{ flow_direction }}' as flow_direction + , 'total' as flow_type + , total_{{ flow_direction }}_zip::int as flow_value + from usps_migration + union all + {% for flow_type in usps_migration_flow_types %} + select + date_ + , zip_code + , '{{ flow_direction }}' as flow_direction + , '{{ flow_type }}' as flow_type + , total_{{ flow_direction }}_zip_{{ flow_type }}::int as flow_value + from usps_migration + {% if not loop.last %} union all {% endif %} + {% endfor %} +{% if not loop.last %} union all {% endif %} +{% endfor %} diff --git a/dbt/models/staging/stg_zctas_2010.sql b/dbt/models/staging/stg_zctas_2010.sql new file mode 100644 index 00000000..51921be6 --- /dev/null +++ b/dbt/models/staging/stg_zctas_2010.sql @@ -0,0 +1,4 @@ +select + zcta5ce10 as zcta, + st_transform(geom, {{ var("srid") }}) as geom +from {{ source('minneapolis', 'zip_codes_tl_2020_us_zcta510') }} diff --git a/dbt/models/staging/stg_zctas_2020.sql b/dbt/models/staging/stg_zctas_2020.sql new file mode 100644 index 00000000..21c131d1 --- /dev/null +++ b/dbt/models/staging/stg_zctas_2020.sql @@ -0,0 +1,4 @@ +select + zcta5ce20 as zcta, + st_transform(geom, {{ var("srid") }}) as geom +from {{ source('minneapolis', 'zip_codes_tl_2020_us_zcta520') }} diff --git a/dbt/models/tracts_model/docs.md b/dbt/models/tracts_model/docs.md new file mode 100644 index 00000000..a4a3371e --- /dev/null +++ b/dbt/models/tracts_model/docs.md @@ -0,0 +1,92 @@ +{% docs tracts_model_int__census_tracts_filtered %} + +Intermediate table that selects census tracts of interest. Considers only tracts +in the city boundary (tracts must intersect boundary and have at least 90% of +area overlapping) and only for years 2011 to 2020. + +Notes: +- Census tracts for 2020 are replaced with tracts for 2019. 
This requires + retagging parcels and other spatial entities, because the `census_tract_id` + changes with the replacement. + +{% enddocs %} + +{% docs tracts_model_int__parcels_filtered %} + +Retag parcels to account for tract replacement. This also has the effect of +filtering parcels to the considered tracts. + +{% enddocs %} + +{% docs census_tracts_distance_to_transit %} + +Aggregate `parcels_distance_to_transit` by tract. + +{% enddocs %} + +{% docs census_tracts_housing_units %} + +Aggregate number of units built by tract. Unit data is drawn from +`residential_permits`. + +{% enddocs %} + +{% docs census_tracts_parcel_area %} + +Aggregate parcel area by tract. Area is computed from the parcel geometry, not +from the area included in the parcel dataset. + +{% enddocs %} + +{% docs census_tracts_parking_limits %} + +Parking limits aggregated by tract. + +{% enddocs %} + +{% docs parcels_distance_to_transit %} + +Distance from a parcel to the nearest high-frequency transit line or stop. +This is the smallest distance from the parcel geometry to the combined line and +stop geometry, not from the parcel centroid. + +{% enddocs %} + +{% docs parcels_parking_limits %} + +Parking limits by parcel. The parking limit is a function of the distance from +the parcel to the nearest transit line/transit stop. + +Notes: +- Parcels in all years that intersect (any level of intersection) the downtown + area have the limit eliminated. +- Parcels before 2015 have the full limit. +- Parcels after 2015 and in the blue zone have the limit eliminated. +- Parcels after 2015 and in the yellow zone have the limit reduced. + +{% enddocs %} + +{% docs census_tracts_property_values %} + +Total and median property value aggregated by tract. Uses total estimated market +value from the parcel dataset. + +{% enddocs %} + +{% docs tracts_model__census_tracts %} + +Wide table that joins various census-tract-level aggregates. + +Notes: +- Continuous columns are standardized by default.
Categorical columns are + remapped to [0, |D|), where D is the domain. The original value of a column + `c` is called `c_original`. +- Demographic variables are drawn from ACS tract level data. + +{% enddocs %} + +{% docs tracts_model__parcels %} + +Parcels filtered by the considered census tracts, with additional data. + +{% enddocs %} diff --git a/dbt/models/tracts_model/intermediate/census_tracts_distance_to_transit.sql b/dbt/models/tracts_model/intermediate/census_tracts_distance_to_transit.sql new file mode 100644 index 00000000..a25c6005 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/census_tracts_distance_to_transit.sql @@ -0,0 +1,11 @@ +with +parcels_distance_to_transit as (select * from {{ ref('parcels_distance_to_transit') }}), +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}) +select + census_tracts.census_tract_id, + avg(parcels_distance_to_transit.distance) as mean_distance_to_transit, + {{ median('parcels_distance_to_transit.distance') }} as median_distance_to_transit +from + census_tracts + left join parcels_distance_to_transit using (census_tract_id) +group by 1 diff --git a/dbt/models/tracts_model/intermediate/census_tracts_housing_units.sql b/dbt/models/tracts_model/intermediate/census_tracts_housing_units.sql new file mode 100644 index 00000000..42033743 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/census_tracts_housing_units.sql @@ -0,0 +1,30 @@ +with +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}), +residential_permits as (select * from {{ ref('residential_permits') }}), +residential_permits_to_census_tracts as ( + with + residential_permits_tag as ( + select + residential_permit_id as id + , daterange(to_date(year_::text, 'YYYY'), to_date(year_::text, 'YYYY'), '[]') as valid + , geom + from residential_permits + ), + census_tracts_tag as ( + select census_tract_id as id, valid, geom from census_tracts + ) + select + child_id as 
residential_permit_id, + parent_id as census_tract_id, + valid, + type_ + from {{ tag_regions("residential_permits_tag", "census_tracts_tag") }} +) +select + census_tracts.census_tract_id, + sum(residential_permits.num_units)::int as num_units +from + census_tracts + left join residential_permits_to_census_tracts using (census_tract_id) + left join residential_permits using (residential_permit_id) +group by 1 diff --git a/dbt/models/tracts_model/intermediate/census_tracts_parcel_area.sql b/dbt/models/tracts_model/intermediate/census_tracts_parcel_area.sql new file mode 100644 index 00000000..1f4216e7 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/census_tracts_parcel_area.sql @@ -0,0 +1,11 @@ +with +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}), +parcels as (select * from {{ ref('tracts_model_int__parcels_filtered') }}) +select + census_tract_id, + sum(st_area(parcels.geom)) as parcel_sqm, + avg(st_area(parcels.geom)) as parcel_mean_sqm, + {{ median('st_area(parcels.geom)') }} as parcel_median_sqm +from + census_tracts left join parcels using (census_tract_id) +group by 1 diff --git a/dbt/models/tracts_model/intermediate/census_tracts_parking_limits.sql b/dbt/models/tracts_model/intermediate/census_tracts_parking_limits.sql new file mode 100644 index 00000000..cf99bf05 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/census_tracts_parking_limits.sql @@ -0,0 +1,8 @@ +with +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}), +parcels_parking_limits as (select * from {{ ref('parcels_parking_limits') }}) +select + census_tract_id, + avg(limit_numeric) as mean_limit +from census_tracts left join parcels_parking_limits using (census_tract_id) +group by census_tract_id diff --git a/dbt/models/tracts_model/intermediate/census_tracts_property_values.sql b/dbt/models/tracts_model/intermediate/census_tracts_property_values.sql new file mode 100644 index 00000000..71f8b74a --- 
/dev/null +++ b/dbt/models/tracts_model/intermediate/census_tracts_property_values.sql @@ -0,0 +1,11 @@ +-- Median and total parcel property values aggregated by census tract. +with +parcels as (select * from {{ ref('tracts_model_int__parcels_filtered') }}), +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}) +select + census_tracts.census_tract_id, + sum(parcels.emv_total) as total_value, + {{ median('parcels.emv_total') }} as median_value +from + census_tracts left join parcels using (census_tract_id) +group by 1 diff --git a/dbt/models/tracts_model/intermediate/parcels_distance_to_transit.sql b/dbt/models/tracts_model/intermediate/parcels_distance_to_transit.sql new file mode 100644 index 00000000..18cdbf48 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/parcels_distance_to_transit.sql @@ -0,0 +1,20 @@ +-- This model calculates the distance from each parcel to the nearest high +-- frequency transit line or stop +with + parcels as (select * from {{ ref('tracts_model_int__parcels_filtered') }}) + , lines as (select * from {{ ref('high_frequency_transit_lines') }}) + , stops as (select * from {{ ref('high_frequency_transit_stops') }}) + , lines_and_stops as materialized ( + select + lines.valid * stops.valid as valid + , st_union(lines.geom, stops.geom) as geom + from + lines inner join stops on lines.valid && stops.valid +) +select + parcels.parcel_id, + parcels.census_tract_id, + st_distance(parcels.geom, lines_and_stops.geom) as distance +from + parcels + inner join lines_and_stops on parcels.valid && lines_and_stops.valid diff --git a/dbt/models/tracts_model/intermediate/parcels_parking_limits.sql b/dbt/models/tracts_model/intermediate/parcels_parking_limits.sql new file mode 100644 index 00000000..aebd7b00 --- /dev/null +++ b/dbt/models/tracts_model/intermediate/parcels_parking_limits.sql @@ -0,0 +1,46 @@ +with +parcels as (select * from {{ ref('tracts_model_int__parcels_filtered') }}), +transit as (select * from {{ 
ref('high_frequency_transit_lines') }}), +downtown as (select * from {{ ref('downtown') }}), +with_is_downtown as ( + select + parcels.parcel_id, + parcels.census_tract_id, + parcels.valid, + parcels.geom, + st_intersects(parcels.geom, downtown.geom) as is_downtown + from downtown, parcels +), +with_limit as ( + select + parcels.parcel_id, + parcels.census_tract_id, + parcels.is_downtown, + case + when parcels.is_downtown then 'eliminated' + when parcels.valid << '[2015-01-01,)'::daterange then 'full' + else + case + when st_intersects(parcels.geom, transit.blue_zone_geom) then 'eliminated' + when st_intersects(parcels.geom, transit.yellow_zone_geom) then 'reduced' + else 'full' + end + end as limit_ + from + with_is_downtown as parcels + join transit on parcels.valid && transit.valid +), +with_limit_numeric as ( + select + parcels.parcel_id, + parcels.census_tract_id, + parcels.is_downtown, + parcels.limit_, + case limit_ + when 'full' then 1 + when 'reduced' then 0.5 + when 'eliminated' then 0 + end as limit_numeric + from with_limit as parcels +) +select * from with_limit_numeric diff --git a/dbt/models/tracts_model/intermediate/tracts_model_int__census_tracts_filtered.sql b/dbt/models/tracts_model/intermediate/tracts_model_int__census_tracts_filtered.sql new file mode 100644 index 00000000..eeb99fcd --- /dev/null +++ b/dbt/models/tracts_model/intermediate/tracts_model_int__census_tracts_filtered.sql @@ -0,0 +1,33 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['valid', 'geom'], 'type': 'gist'} + ] + ) +}} + +-- Consider only tracts in the city boundary, replace 2020 tracts with 2019 +-- tracts, and regenerate the surrogate key. 
+with census_tracts_in_city_boundary as ( + select * + from {{ ref('census_tracts_in_city_boundary') }} + where 2010 < year_ and year_ < 2020 +), +census_tracts_union as ( +select census_tract, year_, valid, geom from census_tracts_in_city_boundary +union all +select + census_tract, + 2020 as year_, + '[2020-01-01,2021-01-01)'::daterange as valid, + geom +from census_tracts_in_city_boundary where year_ = 2019 +) +select + {{ dbt_utils.generate_surrogate_key(['census_tract', 'year_']) }} as census_tract_id, + census_tract, + year_, + valid, + geom +from census_tracts_union diff --git a/dbt/models/tracts_model/intermediate/tracts_model_int__parcels_filtered.sql b/dbt/models/tracts_model/intermediate/tracts_model_int__parcels_filtered.sql new file mode 100644 index 00000000..42b97bef --- /dev/null +++ b/dbt/models/tracts_model/intermediate/tracts_model_int__parcels_filtered.sql @@ -0,0 +1,31 @@ +{{ + config( + materialized='table' + ) +}} + +-- Retag parcels with census tracts (because we replaced the 2020 tracts with the 2019 tracts) +with +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}), +parcels as (select * from {{ ref('parcels') }}), +parcels_tag as (select parcel_id as id, valid, geom from parcels), +census_tracts_tag as (select census_tract_id as id, valid, geom from census_tracts), +parcels_to_census_tracts as ( + select + child_id as parcel_id, + parent_id as census_tract_id + from {{ tag_regions("parcels_tag", "census_tracts_tag") }} +) +select + parcels.parcel_id, + parcels.pin, + parcels.valid, + parcels.emv_land, + parcels.emv_bldg, + parcels.emv_total, + parcels.year_built, + parcels.sale_date, + parcels.sale_value, + parcels.geom, + parcels_to_census_tracts.census_tract_id +from parcels join parcels_to_census_tracts using (parcel_id) diff --git a/dbt/models/tracts_model/schema.yml b/dbt/models/tracts_model/schema.yml new file mode 100644 index 00000000..250d415e --- /dev/null +++ 
b/dbt/models/tracts_model/schema.yml @@ -0,0 +1,51 @@ +models: + - name: tracts_model_int__census_tracts_filtered + description: '{{ doc("tracts_model_int__census_tracts_filtered") }}' + + - name: tracts_model_int__parcels_filtered + description: '{{ doc("tracts_model_int__parcels_filtered") }}' + + - name: tracts_model__census_tracts + description: '{{ doc("tracts_model__census_tracts") }}' + columns: + - name: segregation + description: Segregation with respect to the annual city distribution. + - name: white + description: The proportion of white people in the tract, not the absolute number. + - name: income + description: Median household income in the tract. + - name: median_distance + description: Median parcel distance to transit in meters. + - name: mean_distance + description: Mean parcel distance to transit in meters. + + - name: tracts_model__parcels + description: '{{ doc("tracts_model__parcels") }}' + columns: + - name: distance_to_transit + description: Minimum distance to transit (lines or stops) in meters. + - name: limit_con + description: Numeric representation of parking limit (1 for full, 0 for eliminated, 0.5 for reduced). + - name: downtown_yn + description: Whether the parcel intersects the downtown area. 
+ + - name: census_tracts_distance_to_transit + description: '{{ doc("census_tracts_distance_to_transit") }}' + + - name: census_tracts_housing_units + description: '{{ doc("census_tracts_housing_units") }}' + + - name: census_tracts_parcel_area + description: '{{ doc("census_tracts_parcel_area") }}' + + - name: census_tracts_parking_limits + description: '{{ doc("census_tracts_parking_limits") }}' + + - name: parcels_distance_to_transit + description: '{{ doc("parcels_distance_to_transit") }}' + + - name: parcels_parking_limits + description: '{{ doc("parcels_parking_limits") }}' + + - name: census_tracts_property_values + description: '{{ doc("census_tracts_property_values") }}' diff --git a/dbt/models/tracts_model/tracts_model__census_tracts.sql b/dbt/models/tracts_model/tracts_model__census_tracts.sql new file mode 100644 index 00000000..0e7e1ea4 --- /dev/null +++ b/dbt/models/tracts_model/tracts_model__census_tracts.sql @@ -0,0 +1,76 @@ +{{ + config( + materialized='table', + ) +}} + +with +housing_units as (select * from {{ ref('census_tracts_housing_units') }}) +, property_values as (select * from {{ ref('census_tracts_property_values') }}) +, distance_to_transit as (select * from {{ ref('census_tracts_distance_to_transit') }}) +, parcel_area as (select * from {{ ref('census_tracts_parcel_area') }}) +, parking_limits as (select * from {{ ref('census_tracts_parking_limits') }}) +, demographics as (select * from {{ ref('demographics') }}) +, census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}) + +-- Demographic data +, white as ( + select * from demographics + where name_ = 'B03002_003E' -- white non-hispanic population +) +, population as ( + select * from demographics + where name_ = 'B01003_001E' -- total population +) +, white_frac as ( + select white.census_tract, white.year_, {{ safe_divide('white.value_', 'population.value_') }} as value_ + from white inner join population using (census_tract, year_) +) +, income as ( + 
select * from demographics + where name_ = 'B19013_001E' -- median household income +) +, segregation as ( + select * from demographics + where description = 'segregation_index_annual_city' +) + +, raw_data as ( +select + census_tracts.census_tract::bigint + , census_tracts.year_::smallint as "year" + , coalesce(housing_units.num_units, 0) as housing_units + , property_values.total_value + , property_values.median_value + , distance_to_transit.median_distance_to_transit as median_distance + , distance_to_transit.mean_distance_to_transit as mean_distance + , parcel_area.parcel_sqm::double precision + , parcel_area.parcel_mean_sqm::double precision + , parcel_area.parcel_median_sqm::double precision + , parking_limits.mean_limit::double precision + , white_frac.value_ as white + , income.value_ as income + , segregation.value_ as segregation +from + census_tracts + inner join housing_units using (census_tract_id) + inner join property_values using (census_tract_id) + inner join distance_to_transit using (census_tract_id) + inner join parcel_area using (census_tract_id) + inner join parking_limits using (census_tract_id) + left join segregation using (census_tract, year_) + left join white_frac using (census_tract, year_) + left join income using (census_tract, year_) +) +, with_std as ( +select + census_tract + , {{ standardize_cat(['year']) }} + , {{ standardize_cont(['housing_units', 'total_value', 'median_value', + 'median_distance', 'mean_distance', 'parcel_sqm', + 'parcel_mean_sqm', 'parcel_median_sqm', 'white', + 'income', 'mean_limit', 'segregation' ]) }} +from + raw_data +) +select * from with_std diff --git a/dbt/models/tracts_model/tracts_model__parcels.sql b/dbt/models/tracts_model/tracts_model__parcels.sql new file mode 100644 index 00000000..d11f4605 --- /dev/null +++ b/dbt/models/tracts_model/tracts_model__parcels.sql @@ -0,0 +1,23 @@ +{{ + config( + materialized='table', + ) +}} + +with +parcels_parking_limits as (select * from {{ 
ref('parcels_parking_limits') }}), +parcels_distance_to_transit as (select * from {{ ref('parcels_distance_to_transit') }}), +parcels as (select * from {{ ref('tracts_model_int__parcels_filtered') }}), +census_tracts as (select * from {{ ref('tracts_model_int__census_tracts_filtered') }}) +select + parcels.*, + census_tracts.census_tract, + census_tracts.year_, + parcels_distance_to_transit.distance as distance_to_transit, + parcels_parking_limits.limit_numeric as limit_con, + parcels_parking_limits.is_downtown as downtown_yn +from + parcels + join census_tracts using (census_tract_id) + join parcels_parking_limits using (parcel_id) + join parcels_distance_to_transit using (parcel_id) diff --git a/dbt/models/university.sql b/dbt/models/university.sql new file mode 100644 index 00000000..7c6b4309 --- /dev/null +++ b/dbt/models/university.sql @@ -0,0 +1,5 @@ +select + ogc_fid as university_id + , st_transform(geom, {{ var("srid") }}) as geom +from + {{ source('minneapolis', 'university') }} diff --git a/dbt/models/usps_migration.sql b/dbt/models/usps_migration.sql new file mode 100644 index 00000000..d7b1fc73 --- /dev/null +++ b/dbt/models/usps_migration.sql @@ -0,0 +1,19 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['date_', 'zcta_id', 'flow_direction', 'flow_type'], 'unique': true}, + ] + ) +}} + +with +usps_migration as (select * from {{ ref('stg_usps_migration_add_zcta') }}) +select + date_, + flow_direction, + flow_type, + zcta_id, + sum(flow_value) as flow_value +from usps_migration +group by 1,2,3,4 diff --git a/dbt/models/wards.sql b/dbt/models/wards.sql new file mode 100644 index 00000000..d809d3ad --- /dev/null +++ b/dbt/models/wards.sql @@ -0,0 +1,5 @@ +select + bdnum as ward_id + , geom +from + {{ source('minneapolis', 'wards_minneapolis') }} diff --git a/dbt/models/zctas.sql b/dbt/models/zctas.sql new file mode 100644 index 00000000..62212a9b --- /dev/null +++ b/dbt/models/zctas.sql @@ -0,0 +1,28 @@ +{{ + config( + 
materialized='table', + indexes = [ + {'columns': ['zcta_id'], 'unique': true}, + {'columns': ['valid', 'geom'], 'type': 'gist'} + ] + ) +}} + +with +zctas as ( +select + zcta, + '[2020-01-01,)'::daterange as valid, + geom +from {{ ref('stg_zctas_2020') }} +union all +select + zcta, + '[,2020-01-01)'::daterange as valid, + geom +from {{ ref('stg_zctas_2010') }} +) +select + {{ dbt_utils.generate_surrogate_key(['zcta', 'valid']) }} as zcta_id, + zctas.* +from zctas diff --git a/dbt/models/zip_codes_to_zctas.sql b/dbt/models/zip_codes_to_zctas.sql new file mode 100644 index 00000000..9ac3a70f --- /dev/null +++ b/dbt/models/zip_codes_to_zctas.sql @@ -0,0 +1,12 @@ +{{ + config( + materialized='table', + indexes = [ + {'columns': ['zip_code']}, + {'columns': ['zcta']} + ] + ) +}} + +select zip_code, zcta +from {{ source('minneapolis', 'zip_codes_zcta_xref') }} diff --git a/dbt/package-lock.yml b/dbt/package-lock.yml new file mode 100644 index 00000000..5231cc02 --- /dev/null +++ b/dbt/package-lock.yml @@ -0,0 +1,6 @@ +packages: + - package: dbt-labs/dbt_utils + version: 1.2.0 + - package: dbt-labs/codegen + version: 0.12.1 +sha1_hash: 37aba29ba147b9afff74716d974b60c54b7f1a1d diff --git a/dbt/packages.yml b/dbt/packages.yml new file mode 100644 index 00000000..27ef0473 --- /dev/null +++ b/dbt/packages.yml @@ -0,0 +1,5 @@ +packages: + - package: dbt-labs/dbt_utils + version: 1.2.0 + - package: dbt-labs/codegen + version: 0.12.1 diff --git a/dbt/seeds/.gitkeep b/dbt/seeds/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/dbt/seeds/acs_variables.csv b/dbt/seeds/acs_variables.csv new file mode 100644 index 00000000..5520ef20 --- /dev/null +++ b/dbt/seeds/acs_variables.csv @@ -0,0 +1,90 @@ +variable,description +B03002_003E,population_white_non_hispanic +B03002_004E,population_black_non_hispanic +B03002_005E,population_asian_non_hispanic +B03002_006E,population_native_hawaiian_or_pacific_islander_non_hispanic 
+B03002_007E,population_american_indian_or_alaska_native_non_hispanic +B03002_008E,population_other_non_hispanic +B03002_009E,population_multiple_races_non_hispanic +B03002_010E,population_multiple_races_and_other_non_hispanic +B07204_001E,geographic_mobility_total_responses +B07204_002E,geographic_mobility_same_house_1_year_ago +B07204_004E,geographic_mobility_different_house_1_year_ago_same_city +B07204_005E,geographic_mobility_different_house_1_year_ago_same_county +B07204_006E,geographic_mobility_different_house_1_year_ago_same_state +B07204_007E,geographic_mobility_different_house_1_year_ago_same_country +B07204_016E,geographic_mobility_different_house_1_year_ago_abroad +B01003_001E,population +B02001_002E,white +B02001_003E,black +B02001_004E,american_indian_or_alaska_native +B02001_005E,asian +B02001_006E,native_hawaiian_or_pacific_islander +B03001_003E,hispanic_or_latino +B02001_007E,other_race +B02001_008E,multiple_races +B02001_009E,multiple_races_and_other_race +B02001_010E,two_or_more_races_excluding_other +B02015_002E,east_asian_chinese +B02015_003E,east_asian_hmong +B02015_004E,east_asian_japanese +B02015_005E,east_asian_korean +B02015_006E,east_asian_mongolian +B02015_007E,east_asian_okinawan +B02015_008E,east_asian_taiwanese +B02015_009E,east_asian_other +B02015_010E,southeast_asian_burmese +B02015_011E,southeast_asian_cambodian +B02015_012E,southeast_asian_filipino +B02015_013E,southeast_asian_indonesian +B02015_014E,southeast_asian_laotian +B02015_015E,southeast_asian_malaysian +B02015_016E,southeast_asian_mien +B02015_017E,southeast_asian_singaporean +B02015_018E,southeast_asian_thai +B02015_019E,southeast_asian_viet +B02015_020E,southeast_asian_other +B02015_021E,south_asian_asian_indian +B02015_022E,south_asian_bangladeshi +B02015_023E,south_asian_bhutanese +B02015_024E,south_asian_nepalese +B02015_025E,south_asian_pakistani +B02015_026E,south_asian_sikh +B02015_027E,south_asian_sri_lankan +B02015_028E,south_asian_other 
+B02015_029E,central_asian_kazakh +B02015_030E,central_asian_uzbek +B02015_031E,central_asian_other +B02015_032E,other_asian_specified +B02015_033E,other_asian_not_specified +B19013_001E,median_household_income +B19013A_001E,median_household_income_white +B19013H_001E,median_household_income_white_non_hispanic +B19013I_001E,median_household_income_hispanic +B19013B_001E,median_household_income_black +B19013C_001E,median_household_income_american_indian_or_alaska_native +B19013D_001E,median_household_income_asian +B19013E_001E,median_household_income_native_hawaiian_or_pacific_islander +B19013F_001E,median_household_income_other_race +B19013G_001E,median_household_income_multiple_races +B19019_002E,median_household_income_1_person_households +B19019_003E,median_household_income_2_person_households +B19019_004E,median_household_income_3_person_households +B19019_005E,median_household_income_4_person_households +B19019_006E,median_household_income_5_person_households +B19019_007E,median_household_income_6_person_households +B19019_008E,median_household_income_7_or_more_person_households +B01002_001E,median_age +B01002_002E,median_age_male +B01002_003E,median_age_female +B25031_001E,median_gross_rent +B25031_002E,median_gross_rent_0_bedrooms +B25031_003E,median_gross_rent_1_bedrooms +B25031_004E,median_gross_rent_2_bedrooms +B25031_005E,median_gross_rent_3_bedrooms +B25031_006E,median_gross_rent_4_bedrooms +B25031_007E,median_gross_rent_5_bedrooms +B25032_001E,total_housing_units +B25032_002E,total_owner_occupied_housing_units +B25032_013E,total_renter_occupied_housing_units +B25070_001E,median_gross_rent_as_percentage_of_household_income diff --git a/dbt/seeds/population_categories.csv b/dbt/seeds/population_categories.csv new file mode 100644 index 00000000..501dbf73 --- /dev/null +++ b/dbt/seeds/population_categories.csv @@ -0,0 +1,9 @@ +category +population_white_non_hispanic +population_black_non_hispanic +hispanic_or_latino +population_asian_non_hispanic 
+population_native_hawaiian_or_pacific_islander_non_hispanic +population_american_indian_or_alaska_native_non_hispanic +population_multiple_races_non_hispanic +population_other_non_hispanic diff --git a/dbt/snapshots/.gitkeep b/dbt/snapshots/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/dbt/tests/.gitkeep b/dbt/tests/.gitkeep new file mode 100644 index 00000000..e69de29b diff --git a/docs/guides/counterfactual-explained.ipynb b/docs/guides/counterfactual-explained.ipynb index 1f2bcd99..7f1f65da 100644 --- a/docs/guides/counterfactual-explained.ipynb +++ b/docs/guides/counterfactual-explained.ipynb @@ -741,7 +741,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": 4, "metadata": {}, "outputs": [ { @@ -5895,7 +5895,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -5907,7 +5907,7 @@ }, { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "