diff --git a/docs/inference.md b/docs/inference.md
new file mode 100644
index 0000000..c42bb28
--- /dev/null
+++ b/docs/inference.md
@@ -0,0 +1,8 @@
+# Inference
+
+We train on individual observations under specific climate and income conditions, but we forecast prevalences and counts for the whole planet. Because of this mismatch, we have to build the prediction explicitly for every pixel of the global raster.
+
+Inference takes as input the coefficients from the trained model, climate forecasts for a given scenario, and income forecasts. To estimate prevalence for a given pixel, we apply the coefficients of the climate variables to that pixel's climate values under the given scenario. Since income is given as a distribution split into deciles, we create an estimate for each decile in the pixel and add them up.
+
+To get a region's prevalence, we need a measure of how the population is distributed. Currently, we use a static 2020 population raster from the Global Human Settlement Layer: we calculate the proportion of the population living in each pixel and hold it fixed throughout the estimation period.
+We aggregate the prevalences of a region's pixels into a single regional prevalence. To produce counts and prevalences for wider areas, we use region-specific population forecasts.
diff --git a/docs/inputs.md b/docs/inputs.md
new file mode 100644
index 0000000..0b99e4e
--- /dev/null
+++ b/docs/inputs.md
@@ -0,0 +1,26 @@
+# Survey Data Inputs
+
+We use the DHS surveys as our main source for linking a malnutrition outcome to a given zone (for climate) and household (for income).
+
+There are three extractions that we source from:
+
+- An extraction from the `anthropometrics` codebook, reportedly using the same DHS inputs as GBD
+- An extraction from the wealth team
+- An extraction from the LSAE team
+
+We link the LSAE and wealth extractions by NID, household ID, and year, cleaning where necessary because some surveys include the stratum in the household ID and others don't. We have not yet established why the inputs differ, given that they come from the same source.
+
+## Income - Asset matching
+
+To forecast using income, we need to transform the DHS asset index into a measure of income. We use the income distributions produced by Joe Dieleman's team.
+To do that, for a given NID (that is, for a given location-year_start), we look at the unique households sampled in the survey (if there is more than one observation in the same household, we assume they share the same income). We weight each household by the reported DHS observation weight to obtain an asset index distribution. We calculate the percentile to which each household's asset index corresponds and match it to the corresponding percentile of the income distribution.
+The income distributions arrive only at increments of 0.1 in cumulative probability (deciles), so to match arbitrary percentiles we interpolate using monotonic cubic splines.
+
+However, the income distributions and asset distributions often don't have the same shape. We assume this is because the DHS surveys are unlikely to truly sample the very upper end of the income distribution. Hence, we need to fit the DHS asset distribution to a more appropriate income distribution. We do that by testing different thresholds for cutting off the upper end of the income distribution, comparing the two distributions (in a scaled CDF space), and choosing the threshold that minimizes the sum of absolute differences between the distributions.
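The percentile matching described in the "Income - Asset matching" section could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names and sample numbers are hypothetical, percentiles use a weighted midpoint convention, and piecewise-linear interpolation stands in for the monotonic cubic splines the text describes.

```python
from bisect import bisect_right


def weighted_percentiles(values, weights):
    """Return each value's weighted percentile in (0, 1).

    Assumption: a household sits at the midpoint of its own weight mass.
    Tied values are ranked in input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = float(sum(weights))
    percentiles = [0.0] * len(values)
    cumulative = 0.0
    for i in order:
        percentiles[i] = (cumulative + 0.5 * weights[i]) / total
        cumulative += weights[i]
    return percentiles


def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends.

    Stand-in for a monotonic cubic spline; xs must be sorted ascending.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect_right(xs, x)
    t = (x - xs[j - 1]) / (xs[j] - xs[j - 1])
    return ys[j - 1] + t * (ys[j] - ys[j - 1])


def match_income(asset_index, weights, income_percentiles, income_values):
    """Map each household's asset-index percentile onto the income distribution."""
    percentiles = weighted_percentiles(asset_index, weights)
    return [interpolate(p, income_percentiles, income_values) for p in percentiles]


# Hypothetical survey: one row per unique household.
asset_index = [0.3, -1.2, 2.1, 0.9]
weights = [1.0, 2.0, 4.0, 1.0]  # DHS observation weights (made-up values)

# Hypothetical income distribution, reported at 0.1 increments.
income_percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
income_values = [200.0, 350.0, 500.0, 650.0, 800.0, 1000.0, 1300.0, 1800.0, 3000.0]

incomes = match_income(asset_index, weights, income_percentiles, income_values)
```

Swapping `interpolate` for a monotone cubic interpolator would keep the rest of the sketch unchanged, since only the mapping from percentile to income differs.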
+
+## Climate variables
+
+We use the geographical coordinates reported in the DHS survey (even though they refer to the location of the PSU, not the household) to assign climate conditions to the household for all available climate variables, such as mean temperature, days above 30 °C, and precipitation.
+
+We also use those coordinates to determine the elevation of the household.
+
diff --git a/docs/model.md b/docs/model.md
new file mode 100644
index 0000000..0ae8d01
--- /dev/null
+++ b/docs/model.md
@@ -0,0 +1,28 @@
+
+# Model Specification
+
+We use a mixed-effects logistic regression. The objective is to estimate the effect of climate variables, income, and other covariates on a given health outcome.
+
+The tool allows for the following:
+
+- Using age group and sex either as categorical variables or to fit submodels by either or both of them
+- Using any of the available climate variables
+- Random intercepts by location, at either the country level or the FHS admin-2 level
+- Income
+- Other variables: SDI, year
+- Interaction between a threshold climate variable and income
+
+Other rasterized variables can be added with a few code changes. Non-rasterized variables need to be rasterized first if they vary geographically (such as country-specific variables).
+
+Model covariates can be transformed from their raw state. The available transformations are:
+
+- Scaling
+    - Min Max
+    - Standardizing
+    - Inner 95: scale, but set the bottom and top values at the 0.025 and 0.975 quantiles of the covariate's distribution
+- Binning
+- Masking
+
+To specify all of these, refer to the example model specifications.
+
+Model training follows the provided specification, transforms the data, and feeds it to an Lmer model.
\ No newline at end of file diff --git a/docs/new_page.md b/docs/new_page.md deleted file mode 100644 index ce01362..0000000 --- a/docs/new_page.md +++ /dev/null @@ -1 +0,0 @@ -hello diff --git a/docs/stuff/thing_a.md b/docs/stuff/thing_a.md deleted file mode 100644 index e69de29..0000000 diff --git a/docs/stuff/thing_b.md b/docs/stuff/thing_b.md deleted file mode 100644 index e69de29..0000000 diff --git a/mkdocs.yml b/mkdocs.yml index 0171fd8..efccd48 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -15,13 +15,11 @@ theme: nav: - Introduction: 'index.md' + - Input Data: 'data.md' + - Training Methodology: 'model.md' + - Inference and Forecasting: 'inference.md' - api_docs.md - changelog.md - - It's fun: 'new_page.md' - - thing_a: 'stuff/thing_a.md' - - stuff: - - thing_b: 'stuff/thing_b.md' - - thing_c: 'index.md' watch: - src/rra_climate_health
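As an illustration of the steps described in `docs/inference.md`, the sketch below walks from coefficients to a regional count. It is a hedged example, not the repository's code: the function names, coefficient names, and all numbers are hypothetical, the model is assumed to be a plain logistic link with fixed effects only, and "adding up" the decile estimates is assumed to mean giving each decile an equal share of the pixel's population.

```python
import math


def sigmoid(z):
    """Logistic link: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def pixel_prevalence(coefs, climate, decile_incomes):
    """Prevalence for one pixel, averaged over its income deciles.

    Assumption: each decile holds an equal share of the pixel's population.
    """
    climate_part = coefs["intercept"] + sum(
        coefs[name] * value for name, value in climate.items()
    )
    probabilities = [
        sigmoid(climate_part + coefs["income"] * income)
        for income in decile_incomes
    ]
    return sum(probabilities) / len(probabilities)


def region_prevalence(pixel_prevalences, pixel_populations):
    """Population-weighted mean of pixel prevalences using static pixel shares."""
    total = float(sum(pixel_populations))
    return sum(p * n / total for p, n in zip(pixel_prevalences, pixel_populations))


# Hypothetical fitted coefficients (fixed effects only).
coefs = {"intercept": -1.0, "mean_temperature": 0.04, "income": -0.3}

# Hypothetical income at each decile, in arbitrary units.
decile_incomes = [0.5, 0.8, 1.0, 1.2, 1.5, 1.8, 2.2, 2.8, 3.6, 5.0]

# Two pixels in one region: scenario climate values and static 2020 population.
pixels_climate = [{"mean_temperature": 24.0}, {"mean_temperature": 29.0}]
pixel_populations = [1200.0, 300.0]

prevalences = [pixel_prevalence(coefs, c, decile_incomes) for c in pixels_climate]
regional = region_prevalence(prevalences, pixel_populations)

# Counts come from a region-specific population forecast (made-up value here).
forecast_population = 2000.0
count = regional * forecast_population
```

Keeping the pixel shares separate from the forecast population mirrors the text: the spatial distribution is held fixed while the regional total changes over time.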