Docs first pass
Moloq committed Oct 4, 2024
1 parent 711feba commit 41a77b2
Showing 7 changed files with 62 additions and 6 deletions.
8 changes: 8 additions & 0 deletions docs/inference.md
@@ -0,0 +1,8 @@
# Inference

We train on individual observations, each recorded under specific climate and income circumstances, but we forecast prevalences and counts for the whole planet. Because of this discrepancy, producing predictions means manually building the prediction for every pixel in the gridded representation of the globe.

Inference takes as input the coefficients from the trained model, climate forecasts for a given scenario, and income forecasts. To estimate prevalence for a given pixel, we apply the coefficients of the climate variables to that pixel's climate values under the scenario. Since income is given as a distribution in 10 deciles, we create an estimate for each decile in the pixel and add them up.
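
The per-pixel step can be sketched as follows. This is a minimal illustration, not the project's code: the coefficient names, the logistic link applied per decile, and the equal weight of 0.1 per decile when adding the estimates up are all assumptions.

```python
import math


def inv_logit(x: float) -> float:
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def pixel_prevalence(coefs, climate, decile_incomes):
    """Prevalence for one pixel, combined across income deciles.

    coefs: dict holding an "intercept", an "income" coefficient and one
        coefficient per climate variable (hypothetical names).
    climate: dict of climate variable values for this pixel and scenario.
    decile_incomes: one income value per decile.
    """
    # The climate contribution is shared by every decile in the pixel.
    climate_term = sum(coefs[name] * value for name, value in climate.items())
    weight = 1.0 / len(decile_incomes)  # assumption: deciles weigh equally
    total = 0.0
    for income in decile_incomes:
        eta = coefs["intercept"] + climate_term + coefs["income"] * income
        total += weight * inv_logit(eta)
    return total
```

With all coefficients set to zero every decile contributes a probability of 0.5, so the pixel prevalence is 0.5, which is a quick sanity check on the weighting.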

To get a region's prevalence, we need a measure of how the population is distributed. Currently, we use a static 2020 population raster from the Global Human Settlement Layer: we calculate the proportion of the population living in each pixel and hold it fixed throughout the estimation period.
We aggregate the prevalences of a region's pixels to produce a single prevalence for the region. To produce counts and wider prevalences, we use a region-specific population forecast.
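
The aggregation can be sketched as a population-weighted mean followed by a multiplication. The function names are hypothetical, and the sketch assumes the pixel shares are computed within each region.

```python
def region_prevalence(pixel_prevalences, pixel_populations):
    """Population-weighted mean of the pixel prevalences in one region."""
    total_population = sum(pixel_populations)
    if total_population <= 0:
        raise ValueError("region has no population")
    # Static share of the region's population living in each pixel.
    shares = [pop / total_population for pop in pixel_populations]
    return sum(s * p for s, p in zip(shares, pixel_prevalences))


def region_count(prevalence, forecast_population):
    """Expected number of cases given a region-level population forecast."""
    return prevalence * forecast_population
```

Because the shares are held fixed, only the prevalences and the region-level population forecast change from year to year.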
26 changes: 26 additions & 0 deletions docs/inputs.md
@@ -0,0 +1,26 @@
# Survey Data Inputs

We use the DHS surveys as our main source for linking a malnutrition outcome to a given zone (for climate) and household (for income).

There are three extractions that we source from:

- An extraction from the `anthropometrics` codebook, purported to have the same inputs (from DHS) as GBD uses
- An extraction from the wealth team
- An extraction from the LSAE team

We link the LSAE and wealth extractions by NID, household ID and year, cleaning where necessary, since some surveys include the strata as part of the household ID and some don't. We haven't yet established why the inputs differ, given that they come from the same source.

## Income - Asset matching

To forecast using income, we need to transform the DHS Asset Index into a measure of income. We use the income distributions from Joe Dieleman's team.
To do that, for a given NID (that is, for a given location-year_start), we look at the unique households sampled in the survey (if a household has more than one observation, we assume they all experience the same income). We weight each household by the reported DHS observation weight to obtain an asset index distribution. We then calculate the percentile that each household's asset index corresponds to and match it to the same percentile of the income distribution.
However, we receive the income distributions in increments of 0.1 density. To match arbitrary percentiles, we interpolate using monotonic cubic splines.
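
The matching step might look like the sketch below. It is written under several assumptions: a household's percentile is taken as the cumulative weight share up to and including it, the spline is a hand-written Fritsch-Carlson monotone cubic (one common way to build a monotonic cubic interpolant, not necessarily the one the code uses), percentiles outside the knots are clamped, and the income values are made up.

```python
import math
from bisect import bisect_right


def weighted_percentiles(asset_index, weights):
    """Cumulative weight share of each household, sorted by asset index.

    Ties are broken by input order (assumption).
    """
    order = sorted(range(len(asset_index)), key=lambda i: asset_index[i])
    total = sum(weights)
    result = [0.0] * len(asset_index)
    running = 0.0
    for i in order:
        running += weights[i]
        result[i] = running / total
    return result


def monotone_cubic(xs, ys):
    """Return a Fritsch-Carlson monotone cubic interpolant through (xs, ys)."""
    n = len(xs)
    secants = [(ys[k + 1] - ys[k]) / (xs[k + 1] - xs[k]) for k in range(n - 1)]
    # Initial tangents: one-sided at the ends, averaged secants inside.
    m = [secants[0]]
    for k in range(1, n - 1):
        same_sign = secants[k - 1] * secants[k] > 0
        m.append((secants[k - 1] + secants[k]) / 2 if same_sign else 0.0)
    m.append(secants[-1])
    # Shrink tangents that would overshoot and break monotonicity.
    for k in range(n - 1):
        if secants[k] == 0:
            m[k] = m[k + 1] = 0.0
            continue
        a, b = m[k] / secants[k], m[k + 1] / secants[k]
        norm = math.hypot(a, b)
        if norm > 3:
            m[k] = 3 * a / norm * secants[k]
            m[k + 1] = 3 * b / norm * secants[k]

    def evaluate(x):
        x = min(max(x, xs[0]), xs[-1])  # assumption: clamp outside the knots
        k = min(bisect_right(xs, x) - 1, n - 2)
        h = xs[k + 1] - xs[k]
        t = (x - xs[k]) / h
        # Cubic Hermite basis functions.
        h00 = 2 * t**3 - 3 * t**2 + 1
        h10 = t**3 - 2 * t**2 + t
        h01 = -2 * t**3 + 3 * t**2
        h11 = t**3 - t**2
        return h00 * ys[k] + h10 * h * m[k] + h01 * ys[k + 1] + h11 * h * m[k + 1]

    return evaluate


# Illustrative values only: income at cumulative densities 0.1, 0.2, ..., 1.0.
decile_p = [i / 10 for i in range(1, 11)]
decile_income = [200, 350, 500, 700, 950, 1300, 1800, 2600, 4000, 9000]
income_at = monotone_cubic(decile_p, decile_income)
household_pct = weighted_percentiles([0.9, -1.2, 0.3], [1.0, 1.0, 2.0])
household_income = [income_at(p) for p in household_pct]
```

A monotone interpolant matters here because an ordinary cubic spline can overshoot between knots, which would let a household with a higher asset index be assigned a lower income.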

However, the income and asset distributions often don't have the same shape. We assume this is because the DHS surveys are unlikely to truly sample the very upper end of the income distribution. Hence, we need to fit the DHS asset distribution to a more appropriate income distribution. We do that by testing different thresholds for cutting off the upper end of the income distribution, comparing the two distributions (in a scaled CDF space) and choosing the threshold that minimizes the summed absolute difference between them.
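
One possible reading of the threshold search is sketched below. The text does not pin down what "scaled CDF space" means, so this sketch assumes both curves are min-max scaled to [0, 1] and compared at a shared grid of percentiles; `best_cutoff` and its arguments are hypothetical names.

```python
import math


def min_max(values):
    """Rescale values linearly so they span [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def best_cutoff(asset_quantiles, income_quantile, percentiles, cutoffs):
    """Pick the retained share of the income distribution that fits best.

    asset_quantiles: asset index at each entry of `percentiles`.
    income_quantile: callable mapping a percentile in [0, 1] to income.
    cutoffs: candidate shares to keep, e.g. 0.9 drops the top 10%.
    """
    scaled_assets = min_max(asset_quantiles)
    best, best_loss = None, math.inf
    for c in cutoffs:
        # Percentile p of the truncated distribution is percentile p * c
        # of the full distribution.
        truncated = [income_quantile(p * c) for p in percentiles]
        loss = sum(abs(a - b) for a, b in zip(scaled_assets, min_max(truncated)))
        if loss < best_loss:
            best, best_loss = c, loss
    return best, best_loss
```

Scaling both curves first means the comparison looks only at shape, not at the units of the asset index or of income.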

## Climate variables

We use the geographical coordinates reported in the DHS survey (even though they refer to the location of the PSU) to look up the climate conditions for the household, for all the available climate variables such as mean temperature, days above 30 °C, and precipitation.

We also use those coordinates to determine the altitude / elevation of the household.
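
Looking up a raster value from coordinates reduces to an index computation. The sketch below assumes a north-up regular grid in degrees stored as a list of rows; the grid origin, resolution and function names are hypothetical and say nothing about how the actual rasters are read.

```python
import math


def pixel_index(lat, lon, top, left, resolution):
    """Row and column of the grid cell containing (lat, lon).

    Assumes the grid's upper-left corner is at (top, left) in degrees and
    that cells are `resolution` degrees on each side.
    """
    row = math.floor((top - lat) / resolution)
    col = math.floor((lon - left) / resolution)
    return row, col


def sample(grid, lat, lon, top=90.0, left=-180.0, resolution=0.5):
    """Value of the cell containing (lat, lon), e.g. temperature or elevation."""
    row, col = pixel_index(lat, lon, top, left, resolution)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise IndexError("coordinates fall outside the grid")
    return grid[row][col]
```

The same lookup serves every rasterized variable, so a household's PSU coordinates only need to be converted to an index once per grid.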

25 changes: 25 additions & 0 deletions docs/model.md
@@ -0,0 +1,25 @@

# Model Specification

We use a mixed-effects logistic regression. The objective is to estimate the effect of climate variables, income and other covariates on a given health outcome.

The tool allows for the following:

- Using age groups and sex either as categorical variables or to fit submodels by either or both of them
- Using any of the climate variables available
- Random effects on the intercept by location, at either the country level or the FHS admin-2 level
- Income
- Other variables: SDI, year
- Interaction between a threshold climate variable and income

Other rasterized variables can be added with a few code changes. Non-rasterized variables need to be rasterized first if they vary geographically (such as country-specific variables).

Model covariates can be transformed from their raw state. The available transformations are:

- Scaling
    - Min Max
    - Standardizing
    - Inner 95: Scale, but set the top and bottom values at the 0.975 and 0.025 quantiles of the covariate's distribution
- Binning
- Masking
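
The three scaling options can be sketched in a few lines. This is an illustration only: it takes "Inner 95" to mean min-max scaling with the 0.025 and 0.975 quantiles as bounds, uses the population standard deviation for standardizing, and does not clip values that fall outside the bounds, none of which is confirmed by the text.

```python
from statistics import mean, pstdev


def min_max_scale(values, lo=None, hi=None):
    """Scale linearly so that lo maps to 0 and hi maps to 1."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def quantile(values, q):
    """Linearly interpolated quantile of `values` for q in [0, 1]."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def inner_95_scale(values):
    """Min-max scale using the 0.025 and 0.975 quantiles as the bounds."""
    return min_max_scale(values, quantile(values, 0.025), quantile(values, 0.975))
```

Under this reading, values in the outer 5% land slightly below 0 or above 1, so extreme observations no longer compress the rest of the covariate's range.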

In order to specify all these, refer to the example model specifications.

Model training follows the provided specification: it transforms the data and feeds it to an Lmer model.
1 change: 0 additions & 1 deletion docs/new_page.md

This file was deleted.

Empty file removed docs/stuff/thing_a.md
Empty file.
Empty file removed docs/stuff/thing_b.md
Empty file.
8 changes: 3 additions & 5 deletions mkdocs.yml
@@ -15,13 +15,11 @@ theme:

nav:
- Introduction: 'index.md'
- Input Data: 'data.md'
- Training Methodology: 'model.md'
- Inference and Forecasting: 'inference.md'
- api_docs.md
- changelog.md
- It's fun: 'new_page.md'
- thing_a: 'stuff/thing_a.md'
- stuff:
- thing_b: 'stuff/thing_b.md'
- thing_c: 'index.md'

watch:
- src/rra_climate_health