Add where helper function to enable reproduction of fancier group constraints #698

Open
wants to merge 4 commits into
base: main
Changes from 1 commit
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -2,7 +2,9 @@

### User-facing changes

|new| `where(array, where_array)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679).
|changed| Helper functions are now documented on their own page within the "Defining your own math" section of the documentation (#698).

|new| `where(array, condition)` math helper function to apply a where array _inside_ an expression, to enable extending component dimensions on-the-fly, and applying filtering to different components within the expression (#604, #679).

|new| Data tables can inherit options from `templates`, like `techs` and `nodes` (#676).

3 changes: 3 additions & 0 deletions docs/reference/api/helper_functions.md
@@ -4,3 +4,6 @@ search:
---

::: calliope.backend.helper_functions
    options:
      docstring_options:
        ignore_init_summary: true
121 changes: 121 additions & 0 deletions docs/user_defined_math/helper_functions.md
@@ -0,0 +1,121 @@

# Helper functions

For [`where` strings](syntax.md#where-strings) and [`expression` strings](syntax.md#expression-strings), many helper functions are available to allow more complex operations to be undertaken within the string.
Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md).
Here, we give a brief summary.
Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them.
Contributor
Suggested change
Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them.
Helper functions generally require a good understanding of their functionality, so make sure you are comfortable with them beforehand.


## inheritance

using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.
Comment on lines +11 to +12
Contributor
Suggested change
using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.
Using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.


So, for the definition:

```yaml
templates:
  techgroup1:
    template: techgroup2
    flow_cap_max: 10
  techgroup2:
    base_tech: supply
techs:
  tech1:
    template: techgroup1
  tech2:
    template: techgroup2
```

`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset.
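
To make the chained lookup concrete, here is a minimal pure-Python sketch of the behaviour described above. It is not Calliope's implementation: the `templates` and `techs` dictionaries and the `inheritance_chain` and `inheritance` functions are hypothetical stand-ins that mirror the YAML definition.

```python
# Illustrative only: hypothetical stand-ins for the YAML definition above,
# not Calliope's internal data structures or API.
templates = {
    "techgroup1": {"template": "techgroup2", "flow_cap_max": 10},
    "techgroup2": {"base_tech": "supply"},
}
techs = {
    "tech1": {"template": "techgroup1"},
    "tech2": {"template": "techgroup2"},
}


def inheritance_chain(name):
    """Return every template reachable from a tech by following `template` keys."""
    chain = []
    parent = techs[name].get("template")
    while parent is not None:
        chain.append(parent)
        parent = templates[parent].get("template")
    return chain


def inheritance(template):
    """Return the techs whose inheritance chain contains `template`."""
    return [name for name in techs if template in inheritance_chain(name)]


print(inheritance("techgroup1"))  # ['tech1']
print(inheritance("techgroup2"))  # ['tech1', 'tech2']
```

`tech1` appears in both subsets because its chain passes through `techgroup1` and then `techgroup2`.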

## any

Parameters are indexed over multiple dimensions.
Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]).
So, `any(cost, over=[nodes, techs])` will check, for each item in the `costs` dimension (the remaining dimension that `cost` is indexed over), whether there is at least one non-NaN node+tech value.
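
The reduction can be sketched in plain Python as follows. This is only an illustration of the idea under stated assumptions: the `cost` values, the `dims` tuple and the `any_over` function are hypothetical, and Calliope itself operates on xarray arrays rather than dictionaries.

```python
import math

# Hypothetical sparse data keyed by (node, tech, cost class); NaN means "not defined".
dims = ("nodes", "techs", "costs")
cost = {
    ("A", "pv", "monetary"): 5.0,
    ("A", "pv", "emissions"): math.nan,
    ("B", "pv", "monetary"): math.nan,
    ("B", "pv", "emissions"): math.nan,
}


def any_over(array, over):
    """For each remaining index, report whether any value along `over` is non-NaN."""
    keep = [i for i, dim in enumerate(dims) if dim not in over]
    result = {}
    for key, value in array.items():
        reduced = tuple(key[i] for i in keep)
        result[reduced] = result.get(reduced, False) or not math.isnan(value)
    return result


print(any_over(cost, over=["nodes", "techs"]))
# {('monetary',): True, ('emissions',): False}
```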

## defined

Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions.
In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes.

So, for the definition:

```yaml
techs:
  tech1:
    base_tech: conversion
    carrier_in: electricity
    carrier_out: heat
  tech2:
    base_tech: conversion
    carrier_in: [coal, biofuel]
    carrier_out: electricity
nodes:
  node1:
    techs: {tech1}
  node2:
    techs: {tech1, tech2}
```

`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity.

`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`.

`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`.
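
The difference between the default behaviour and `how=all` can be sketched in plain Python. The `node_techs` mapping and the `defined_techs_within_nodes` function are hypothetical names used only for this illustration; they are not part of Calliope.

```python
# Hypothetical stand-in for the node definitions above: node -> techs defined there.
node_techs = {"node1": {"tech1"}, "node2": {"tech1", "tech2"}}


def defined_techs_within_nodes(techs, how="any"):
    """Return one boolean per node: are any/all of `techs` defined at that node?"""
    check = all if how == "all" else any
    return [check(tech in present for tech in techs) for present in node_techs.values()]


print(defined_techs_within_nodes(["tech1", "tech2"]))             # [True, True]
print(defined_techs_within_nodes(["tech1", "tech2"], how="all"))  # [False, True]
```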

## sum

Using `sum(..., over=)` in an expression allows you to sum over one or more dimension of your component array (be it a parameter, decision variable, or global expression).
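
As a rough illustration of summing over dimensions, the following pure-Python sketch collapses the named dimensions of a small array. The `flow_out` values, the `dims` tuple and the `sum_over` function are hypothetical and only mimic the behaviour described here.

```python
# Hypothetical flows keyed by (node, tech).
dims = ("nodes", "techs")
flow_out = {("A", "pv"): 1.0, ("A", "wind"): 2.0, ("B", "pv"): 3.0}


def sum_over(array, over):
    """Sum values along the dimensions in `over`, keeping the other dimensions."""
    keep = [i for i, dim in enumerate(dims) if dim not in over]
    totals = {}
    for key, value in array.items():
        reduced = tuple(key[i] for i in keep)
        totals[reduced] = totals.get(reduced, 0.0) + value
    return totals


print(sum_over(flow_out, over=["nodes"]))           # {('pv',): 4.0, ('wind',): 2.0}
print(sum_over(flow_out, over=["nodes", "techs"]))  # {(): 6.0}
```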

Comment on lines +70 to +71
Contributor
Suggested change
Using `sum(..., over=)` in an expression allows you to sum over one or more dimension of your component array (be it a parameter, decision variable, or global expression).
Using `sum(..., over=)` in an expression allows you to sum over one or more dimensions of your component array (be it a parameter, decision variable, or global expression).

## select_from_lookup_arrays

Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays.
These arrays are used to map the array's index items to other index items.
For instance, when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster.
Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array.
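
The general idea of a lookup array can be sketched in plain Python. Everything in this example is an assumption made for illustration: the timestep labels, the contents of `lookup`, the use of `None` for "no mapping" and the `select_from_lookup` function do not reflect how Calliope stores or applies its lookup arrays.

```python
# Hypothetical data array indexed over timesteps.
storage = {"t0": 5.0, "t1": 7.0, "t2": 6.0, "t3": 9.0}
# Hypothetical lookup array: maps a timestep to another timestep; None = no mapping.
lookup = {"t0": None, "t1": None, "t2": "t1", "t3": None}


def select_from_lookup(array, lookup_array):
    """Keep the original index but take each value from the looked-up index item."""
    return {
        key: array[target] if target is not None else None
        for key, target in lookup_array.items()
    }


selected = select_from_lookup(storage, lookup)
print(selected["t2"])  # 7.0, the value stored at "t1"
```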

## get_val_at_index

If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`.
For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries, `get_val_at_index(timesteps=-1)` will get the final timestep.
This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries.

It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and in the `expression` string (via [slices](syntax.md#slices): `storage[timesteps=$first_timestep]`, with the `first_timestep` slice expression being `get_val_at_index(timesteps=0)`).
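
In plain Python terms, the helper behaves like integer indexing into the list of dimension labels. The sketch below uses hypothetical timestep labels and a hypothetical function signature (the helper itself takes `dim_name=integer_index`, as described above).

```python
# Hypothetical timestep labels.
timesteps = ["2005-01-01 00:00", "2005-01-01 01:00", "2005-01-01 02:00"]


def get_val_at_index(dim_values, idx):
    """Return the index label found at integer position `idx`."""
    return dim_values[idx]


first_timestep = get_val_at_index(timesteps, 0)
final_timestep = get_val_at_index(timesteps, -1)

# Rough equivalent of `timesteps=get_val_at_index(timesteps=0)` in a `where` string:
first_only = [t == first_timestep for t in timesteps]

print(first_timestep)  # 2005-01-01 00:00
print(final_timestep)  # 2005-01-01 02:00
print(first_only)      # [True, False, False]
```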

## roll

We do not use for-loops in our math.
This can be difficult to get your head around initially, but it means that defining expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places.
Using `roll(..., dimension_name=N)` allows you to do this.
For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array.
Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop.
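
The equivalence with a for-loop can be checked with a small pure-Python sketch. It assumes wrap-around semantics like those of `numpy.roll`, so the first position receives the final value; the `roll` function and the `storage` values here are hypothetical and operate on a list rather than a Calliope array.

```python
def roll(values, shift):
    """Shift values by `shift` places, wrapping the end round to the start."""
    shift %= len(values)
    return values[-shift:] + values[:-shift] if shift else list(values)


storage = [10, 20, 30, 40]
rolled = roll(storage, 1)
print(rolled)  # [40, 10, 20, 30]

# Position t of the rolled list holds the value from position t - 1,
# which is what `storage[t - 1]` would give inside a for-loop.
for t in range(len(storage)):
    assert rolled[t] == storage[t - 1]
```

Under this wrap-around assumption the first timestep refers back to the final one, so a `where` string is typically needed if the first timestep should be treated differently.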

## default_if_empty

We work with quite sparse arrays in our models.
So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN.
This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface.
Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value.
Usually you shouldn't need to use this, as your `where` string will mask those NaN values.
But if you're having trouble setting up your math, it is a useful function for getting it over the line.
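
The effect is comparable to filling NaN entries with a placeholder, as in this minimal sketch. The list-based `default_if_empty` function and the `flow_cap` values are hypothetical and only illustrate the idea.

```python
import math


def default_if_empty(values, default):
    """Replace NaN entries with `default`, leaving defined values untouched."""
    return [default if math.isnan(v) else v for v in values]


flow_cap = [10.0, math.nan, 3.5]
print(default_if_empty(flow_cap, default=0))  # [10.0, 0, 3.5]
```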

!!! note
    Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math], all have default values which propagate to the math.
    You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters.

## where

[Where strings](syntax.md#where-strings) only allow you to apply conditions to an expression as a whole.
Sometimes, it's necessary to apply specific conditions to different components _within_ the expression.
The `where(<math_component>, <condition>)` helper function enables this,
where `<math_component>` is a reference to a parameter, variable, or global expression and `<condition>` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values.
`<condition>` is then applied to `<math_component>`, keeping only the values in `<math_component>` where `<condition>` is `True`/`1`.

This helper function can also be used to _extend_ the dimensions of a `<math_component>`.
If the `<condition>` has any dimensions not present in `<math_component>`, `<math_component>` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions.
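
Both effects, masking and dimension extension, can be seen in a small pure-Python sketch. The `flow_cap` component, the `condition` array and this dictionary-based `where` function are hypothetical; they only imitate the broadcasting behaviour described above.

```python
import math

# Hypothetical component indexed over techs only.
flow_cap = {"pv": 10.0, "wind": 20.0}
# Hypothetical boolean input indexed over (nodes, techs).
condition = {
    ("A", "pv"): True,
    ("A", "wind"): False,
    ("B", "pv"): False,
    ("B", "wind"): True,
}


def where(component, cond):
    """Broadcast `component` over the condition's index; keep values where it is true."""
    return {
        (node, tech): component[tech] if flag else math.nan
        for (node, tech), flag in cond.items()
    }


masked = where(flow_cap, condition)
print(masked[("A", "pv")])    # 10.0
print(masked[("B", "wind")])  # 20.0
print(math.isnan(masked[("A", "wind")]))  # True
```

The result is indexed over both nodes and techs even though `flow_cap` was only indexed over techs.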

!!! note
    `Where` gets referred to a lot in Calliope math.
    It always means the same thing: applying [xarray.DataArray.where][].
129 changes: 4 additions & 125 deletions docs/user_defined_math/syntax.md
irm-codebase marked this conversation as resolved.
@@ -37,7 +37,7 @@ When checking the existence of an input parameter it is possible to first sum it
- If you want to apply a constraint across all `nodes` and `techs`, but only for node+tech combinations where the `flow_out_eff` parameter has been defined, you would include `flow_out_eff`.
- If you want to apply a constraint over `techs` and `timesteps`, but only for combinations where the `source_use_max` parameter has at least one `node` with a value defined, you would include `any(source_use_max, over=nodes)`. (1)

1. `any` is a [helper function](#helper-functions); read more below!
1. `any` is a [helper function](helper_functions.md#any); read more below!

Comment on lines +40 to 41
Contributor
Suggested change
1. `any` is a [helper function](helper_functions.md#any); read more below!
1. `any` is a [helper function](helper_functions.md#any)!

1. Checking the value of a configuration option or an input parameter.
Checks can use any of the operators: `>`, `<`, `=`, `<=`, `>=`.
@@ -50,15 +50,15 @@ Configuration options are any that are defined in `config.build`, where you can
- If you want to apply a constraint only for the first timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=0)`. (1)
- If you want to apply a constraint only for the last timestep in your timeseries, you would include `timesteps=get_val_at_index(dim=timesteps, idx=-1)`.

1. `get_val_at_index` is a [helper function](#helper-functions); read more below!
1. `get_val_at_index` is a [helper function](helper_functions.md#get_val_at_index); read more below!

Comment on lines +53 to 54
Contributor
Suggested change
1. `get_val_at_index` is a [helper function](helper_functions.md#get_val_at_index); read more below!
1. `get_val_at_index` is a [helper function](helper_functions.md#get_val_at_index)!

1. Checking the `base_tech` of a technology (`storage`, `supply`, etc.) or its inheritance chain (if using `templates` and the `template` parameter).

??? example "Examples"

- If you want to create a decision variable across only `storage` technologies, you would include `base_tech=storage`.
- If you want to apply a constraint across only your own `rooftop_supply` technologies (e.g., you have defined `rooftop_supply` in `templates` and your technologies `pv` and `solar_thermal` define `#!yaml template: rooftop_supply`), you would include `inheritance(rooftop_supply)`.
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](#helper-functions)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](helper_functions.md)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.
Contributor
Suggested change
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a helper function ([see below](helper_functions.md)) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.
Note that `base_tech=...` is a simple check for the given value of `base_tech`, while `inheritance()` is a [helper function](helper_functions.md) which can deal with finding techs/nodes using the same template, e.g. `pv` might inherit the `rooftop_supply` template which in turn might inherit the template `electricity_supply`.


1. Subsetting a set.
The sets available to subset are always [`nodes`, `techs`, `carriers`] + any additional sets defined by you in [`foreach`](#foreach-lists).
@@ -67,7 +67,7 @@ The sets available to subset are always [`nodes`, `techs`, `carriers`] + any add

- If you want to filter `nodes` where any of a set of `techs` are defined: `defined(techs=[tech1, tech2], within=nodes, how=any)` (1).

1. `defined` is a [helper function](#helper-functions); read more below!
1. `defined` is a [helper function](helper_functions.md#defined); read more below!

Comment on lines +70 to 71
Contributor
Suggested change
1. `defined` is a [helper function](helper_functions.md#defined); read more below!
1. `defined` is a [helper function](helper_functions.md#defined)!

To combine statements you can use the operators `and`/`or`.
You can also use the `not` operator to negate any of the statements.
@@ -109,127 +109,6 @@ Behind the scenes, we will make sure that every relevant element of the defined
Slicing math components involves appending the component with square brackets that contain the slices, e.g. `flow_out[carriers=electricity, nodes=[A, B]]` will slice the `flow_out` decision variable to focus on `electricity` in its `carriers` dimension and only has two nodes (`A` and `B`) on its `nodes` dimension.
To find out what dimensions you can slice a component on, see your input data (`model.inputs`) for parameters and the definition for decision variables in your math dictionary.

## Helper functions

For [`where` strings](#where-strings) and [`expression` strings](#where-strings), there are many helper functions available to use, to allow for more complex operations to be undertaken.
Their functionality is detailed in the [helper function API page](../reference/api/helper_functions.md).
Here, we give a brief summary.
Some of these helper functions require a good understanding of their functionality to apply, so make sure you are comfortable with them before using them.

### inheritance

using `inheritance(...)` in a `where` string allows you to grab a subset of technologies / nodes that all share the same [`template`](../creating/templates.md) in the technology's / node's `template` key.
If a `template` also inherits from another `template` (chained inheritance), you will get all `techs`/`nodes` that are children along that inheritance chain.

So, for the definition:

```yaml
templates:
  techgroup1:
    template: techgroup2
    flow_cap_max: 10
  techgroup2:
    base_tech: supply
techs:
  tech1:
    template: techgroup1
  tech2:
    template: techgroup2
```

`inheritance(techgroup1)` will give the `[tech1]` subset and `inheritance(techgroup2)` will give the `[tech1, tech2]` subset.

### any

Parameters are indexed over multiple dimensions.
Using `any(..., over=...)` in a `where` string allows you to check if there is at least one non-NaN value in a given dimension (akin to [xarray.DataArray.any][]).
So, `any(cost, over=[nodes, techs])` will check if there is at least one non-NaN tech+node value in the `costs` dimension (the other dimension that the `cost` decision variable is indexed over).

### defined

Similar to [any](#any), using `defined(..., within=...)` in a `where` string allows you to check for non-NaN values along dimensions.
In the case of `defined`, you can check if e.g., certain technologies have been defined within the nodes or certain carriers are defined within a group of techs or nodes.

So, for the definition:

```yaml
techs:
  tech1:
    base_tech: conversion
    carrier_in: electricity
    carrier_out: heat
  tech2:
    base_tech: conversion
    carrier_in: [coal, biofuel]
    carrier_out: electricity
nodes:
  node1:
    techs: {tech1}
  node2:
    techs: {tech1, tech2}
```

`defined(carriers=electricity, within=techs)` would yield a list of `[True, True]` as both technologies define electricity.

`defined(techs=[tech1, tech2], within=nodes)` would yield a list of `[True, True]` as both nodes define _at least one_ of `tech1` or `tech2`.

`defined(techs=[tech1, tech2], within=nodes, how=all)` would yield a list of `[False, True]` as only `node2` defines _both_ `tech1` and `tech2`.

### sum

Using `sum(..., over=)` in an expression allows you to sum over one or more dimension of your component array (be it a parameter, decision variable, or global expression).

### select_from_lookup_arrays

Some of our arrays in [`model.inputs`][calliope.Model.inputs] are not data arrays, but "lookup" arrays.
These arrays are used to map the array's index items to other index items.
For instance when using [time clustering](../advanced/time.md#time-clustering), the `lookup_cluster_last_timestep` array is used to get the timestep resolution and the stored energy for the last timestep in each cluster.
Using `select_from_lookup_arrays(..., dim_name=lookup_array)` allows you to apply this lookup array to your data array.

### get_val_at_index

If you want to access an integer index in your dimension, use `get_val_at_index(dim_name=integer_index)`.
For example, `get_val_at_index(timesteps=0)` will get the first timestep in your timeseries, `get_val_at_index(timesteps=-1)` will get the final timestep.
This is mostly used when conditionally applying a different expression in the first / final timestep of the timeseries.

It can be used in the `where` string (e.g., `timesteps=get_val_at_index(timesteps=0)` to mask all other timesteps) and the `expression string` (via [slices](#slices) - `storage[timesteps=$first_timestep]` and `first_timestep` expression being `get_val_at_index(timesteps=0)`).

### roll

We do not use for-loops in our math.
This can be difficult to get your head around initially, but it means that to define expressions of the form `var[t] == var[t-1] + param[t]` requires shifting all the data in your component array by N places.
Using `roll(..., dimension_name=N)` allows you to do this.
For example, `roll(storage, timesteps=1)` will shift all the storage decision variable objects by one timestep in the array.
Then, `storage == roll(storage, timesteps=1) + 1` is equivalent to applying `storage[t] == storage[t - 1] + 1` in a for-loop.

### default_if_empty

We work with quite sparse arrays in our models.
So, although your arrays are indexed over e.g., `nodes`, `techs` and `carriers`, a decision variable or parameter might only have one or two values in the array, with the rest being NaN.
This can play havoc with defining math, with `nan` values making their way into your optimisation problem and then killing the solver or the solver interface.
Using `default_if_empty(..., default=...)` in your `expression` string allows you to put a placeholder value in, which will be used if the math expression unavoidably _needs_ a value.
Usually you shouldn't need to use this, as your `where` string will mask those NaN values.
But if you're having trouble setting up your math, it is a useful function to getting it over the line.

!!! note
Our internally defined parameters, listed in the `Parameters` section of our [pre-defined base math documentation][base-math] all have default values which propagate to the math.
You only need to use `default_if_empty` for decision variables and global expressions, and for user-defined parameters.

### where

[Where strings](#where-strings) only allow you to apply conditions across the whole expression equations.
Sometimes, it's necessary to apply specific conditions to different components _within_ the expression.
Using `where(<math_component>, <boolean_array>)` helper function enables this,
where `<math_component>` is a reference to a parameter, variable, or global expression and `<boolean_array>` is a reference to an array in your model inputs that contains only `True`/`1` and `False`/`0`/`NaN` values.
`<boolean_array>` will then be applied to `<math_component>`, keeping only the values in `<math_component>` where `<boolean_array>` is `True`/`1`.

This helper function can also be used to _extend_ the dimensions of a `<math_component>`.
If the ``<boolean_array>`` has any dimensions not present in `<math_component>`, `<math_component>` will be [broadcast](https://tutorial.xarray.dev/fundamentals/02.3_aligning_data_objects.html#broadcasting-adjusting-arrays-to-the-same-shape) to include those dimensions.

!!! note
`Where` gets referred to a lot in Calliope math.
It always means the same thing: applying [xarray.DataArray.where][].

## equations

Equations are combinations of [expression strings](#expression-strings) and [where strings](#where-strings).
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -117,6 +117,7 @@ nav:
- user_defined_math/index.md
- user_defined_math/components.md
- user_defined_math/syntax.md
- user_defined_math/helper_functions.md
- user_defined_math/customise.md
- Example additional math gallery:
- user_defined_math/examples/index.md