Streamline getting started documentation #636

Closed
wants to merge 10 commits into from
2 changes: 1 addition & 1 deletion docs/source/_toc.yaml
@@ -39,9 +39,9 @@ subtrees:
entries:
- file: using_the_ve/getting_started
entries:
- file: using_the_ve/virtual_ecosystem_in_use
- file: using_the_ve/ve_run
- file: using_the_ve/example_data
- file: using_the_ve/virtual_ecosystem_in_use
- file: using_the_ve/configuration/config
title: Configuring your model
entries:
36 changes: 27 additions & 9 deletions docs/source/using_the_ve/getting_started.md
@@ -20,7 +20,7 @@ language_info:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.11.9
version: 3.12.8
---

# Getting started
Collaborator

At the first mention of pip or a command, should we assume less coding experience of the users and say explicitly where to run it? For example, before pip install below, say something like "...open up the Terminal and type:"

@@ -41,7 +41,7 @@ that the package is still being developed so these are currently early developme

If you are more interested in playing around with the development of the model, then you
will need to follow the [overview of the code contribution
process]../development/contributing/overivew.md), which covers the installation of the
process](../development/contributing/overview.md), which covers the installation of the
tools required for code development, testing and building documentation.

## Running an example Virtual Ecosystem simulation
@@ -56,18 +56,36 @@ configuration and data files to run a model.
ve_run --install-example /path/
```

You can then run the model itself:
You can then run the model itself. If you have already run the simulation, you will need
to delete or rename the output files, as previously generated output can prevent the
Collaborator

@hrlai hrlai Dec 17, 2024

Even though one could just remove the files using a file explorer, should we give a code example to remove the files (e.g., using rm) for those who have little experience with the Terminal?

If so, probably write it after the ve_run chunk for a better flow...

Collaborator

In the other file I just realised that @jacobcook1995 had already written something, maybe ditto them here for completeness?

Collaborator Author

I think it's generally not great practice to use bash scripts to delete directories, because you might end up deleting something by mistake. It's useful in that example because the files are being stored in a kind of weird location (in the user directory) which might be hard to find using Finder. But I personally would encourage users to keep the data files in the virtual_ecosystem directory, because they will likely need to access them while working, and it will all stay a lot more organized that way. That is why I used this method - and in addition, I just think it overcomplicates the task to run a bash script, when the files should be easy to find.

Let me know what you think about that reasoning!!

Collaborator

All good. I agree that it is unnecessary :) The only thing left to decide is whether to list the filenames to be deleted, but this is not always the same (e.g., the initial states may not be saved depending on user settings). When I first bumped into this I had to ask around which files to remove.

simulation from running.

```shell
ve_run /path/ve_example/config \
--outpath /path/ve_example/out \
--logfile /path/ve_example/out/ve_example.log
```

The [Virtual Ecosystem in use](virtual_ecosystem_in_use.md) page provides a walkthrough
of this process, showing the typical outputs of the model run process, and also provides
some simple plots of model inputs and outputs.
+++

Once you want to start digging into the structure of the model and inputs, the [example
data](./example_data.md) page provides a detailed description of the contents of the
`ve_example` directory.
## Simulation results

The Virtual Ecosystem writes out a number of data files:

* `initial_state.nc`: A single compiled file of the initial input data.
* `all_continuous_data.nc`: An optional record of time series data of the variables
updated at each time step.
* `final_state.nc`: The model data state at the end of the final step.

These files are written to the standard NetCDF data file format.
Collaborator

@hrlai hrlai Dec 17, 2024

Maybe insert example code to remove them here.

Collaborator Author

See above - I think it's better in this tutorial to not have a script for that. LMK if you feel strongly about including it!

Collaborator

Sounds good to me, replied above :)


## Next steps

* To explore the simulation results further you can visit the [Visualising Virtual
Ecosystem Output](virtual_ecosystem_in_use.md) tutorial, which walks you through basic
graphs using model inputs and outputs.
* The [Example Data](./example_data.md) page provides a detailed description of the
contents of the `ve_example` directory. Here you can dig into the structure of the
models and inputs.
* When you are ready to set up your own simulation, you can visit [Configuring your
model](configuration/config.md) and [Adding data to the model](data/data.md).
114 changes: 43 additions & 71 deletions docs/source/using_the_ve/virtual_ecosystem_in_use.md
@@ -19,34 +19,25 @@ language_info:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.10.14
version: 3.12.8
---

# Using the Virtual Ecosystem
# Exploring the Virtual Ecosystem outputs

The code below is a brief demonstration of the Virtual Ecosystem model in operation.
The workflow of the model is:
The code below provides a walkthrough of some basic plots for the input and output of
the Virtual Ecosystem simulation. If you have not installed the model, you should do
so first on the [Getting Started](getting_started.md) page.

## Create the model configuration and initial data
## Run the simulation

Here we are using the example data supplied with the `virtual_ecosystem`
package, which supplies a set of example data files and a simple model configuration
to run a simulation. The following command line arguments set up the example data
directory in Linux, Mac or Windows Subsystem for Linux (WSL).
Before exploring the outputs you will need to run the simulation using the example data
or your own input data. If you have already run the simulation and generated output
data, you can skip to the
[Initial state and input data](#initial-state-and-input-data) section below.

```{code-cell} ipython3
import pathlib

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import xarray
```

If you have previously attempted to run this example it is probably a good idea to
delete the existing virtual ecosystem example directory, as previously generated files
can prevent the example simulation from running successfully. That can be done as
follows.
The following commands allow you to run the simulation from a Jupyter Notebook.
Collaborator

It is great to remind the users that they can run the python codes using Jupyter Notebook. However, I struggle to even comprehend where to type the following commands after launching Jupyter Notebook, sorry! Most ecologists or biologists would be more familiar with R than python, so it would be a shame to turn them away because we didn't ease them into python.

That said, I struggle between providing too much details about python coding here (some users may already know how and find it boring) versus hand-carrying new users. Should we dedicate a new page just for using python codes in Jupyter, or at least point the users to an external tutorial page?

Alternatively, you can run the simulation from the command line by following the [Getting
Started](getting_started.md) instructions.

```{code-cell} ipython3
%%bash
@@ -56,40 +47,12 @@ if [ -d /tmp/ve_example ]; then
fi
```

It should be noted that this is a nuclear option, which is only really appropriate for a
tutorial like this. In general, you can prevent errors due to output files already
existing by either moving or deleting the contents of the `ve_example/out` folder. With
leftover example data directories now removed, a fresh example data directory can then
be installed.

```{code-cell} ipython3
%%bash
# Install the example data directory from the Virtual Ecosystem package
ve_run --install-example /tmp/
```

The `ve_example` directory contains the following files:

* the `config` directory of TOML format configuration files,
* the `data` and `source` directories of netCDF format data files,
* the `generation_scripts` directory containing example recipes for generating files, and
* the `out` directory, which will be used to store model outputs.

```{code-cell} ipython3
# Get a generator of files in the example directory
example_files = (p for p in pathlib.Path("/tmp/ve_example/").rglob("*") if p.is_file())

# Print the relative paths of files
for file in example_files:
    print(file.relative_to("/tmp/ve_example"))
```

## Run the Virtual Ecosystem model

Now the example data and configuration have been set up, the `ve_run` command can be
used to execute a Virtual Ecosystem simulation. The `progress` option shows the progress
of the simulation through the various modelling stages.

```{code-cell} ipython3
%%bash
ve_run /tmp/ve_example/config \
@@ -98,26 +61,9 @@ ve_run /tmp/ve_example/config \
--progress \
```

The log file is very long and shows the process of running the model. The code below
shows the start and end lines from the log to give an idea of what it contains.

```{code-cell} ipython3
# Open and read the log
with open("/tmp/ve_example/out/logfile.log") as log:
    log_entries = log.readlines()
## Loading the data

# Print the first lines
for entry in log_entries[:6]:
    print(entry.strip())

print("...")

# Print the last lines
for entry in log_entries[-5:]:
    print(entry.strip())
```

## Looking at the results
Once the simulation has run, you can load the output data files into Python.

The Virtual Ecosystem writes out a number of data files:

@@ -128,13 +74,31 @@ The Virtual Ecosystem writes out a number of data files:

These files are written to the standard NetCDF data file format.

```{code-cell} ipython3
# Dependencies for the data and graphing
import pathlib

import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import xarray
```

```{code-cell} ipython3
# Load the generated data files
initial_state = xarray.load_dataset("/tmp/ve_example/out/initial_state.nc")
continuous_data = xarray.load_dataset("/tmp/ve_example/out/all_continuous_data.nc")
final_state = xarray.load_dataset("/tmp/ve_example/out/final_state.nc")
```

Collaborator

If a user installs the develop branch using pip, they probably won't have the initial state file with the default settings (see #544). I can confirm that this is still the case today on my computer.

```{code-cell} ipython3
# Print the name of each variable in the final state
for key in list(final_state.keys()):
    print(key)
```

+++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": []}

### Initial state and input data

The `initial_state.nc` file contains all of the data required to run the model. For some
Collaborator

In the plot below, we reshape the vector output of elevation to be a matrix of dimension 9 x 9. Can we avoid hard-coding the row and column numbers from 9 to something from the netCDF output? I imagine someone to change the grid numbers eventually so this will help to automate the plotting a bit more.

Collaborator

@davidorme davidorme Dec 18, 2024

It's actually going to need to be a new function to reverse the internal representation of cells (which is a 1D array) back onto the grid. So that's going to be a general helper function (convert_cell_data_to_grid or similar).

But your point is a good one!

ETA - Actually that gets interesting. You need the grid configuration to go back to the 2D, so plotting spatial data will need a helper class (SpatialPlotter) that is created using the Grid config and then has a method (to_spatial) that maps the cell_id back onto the spatial layout.

Collaborator

Awesome, something like convert_cell_data_to_grid would be a great addition indeed. Is the grid configuration (e.g., number of rows and columns) stored anywhere in the output files?
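The helper discussed in this thread could start as a thin wrapper around NumPy's `reshape`. This is a sketch under stated assumptions: `convert_cell_data_to_grid` is the hypothetical name suggested above, it takes the number of rows and columns explicitly (since it is not yet established whether the grid configuration is stored in the output files), and it assumes cell IDs run row by row, which is what a plain row-major `reshape` expects.

```python
import numpy as np


def convert_cell_data_to_grid(values, n_rows: int, n_cols: int) -> np.ndarray:
    """Reshape a 1D array of per-cell values into an (n_rows, n_cols) grid.

    Hypothetical helper: assumes cell IDs run row by row from 0 to
    n_rows * n_cols - 1.
    """
    values = np.asarray(values)
    if values.ndim != 1 or values.size != n_rows * n_cols:
        raise ValueError(
            f"Expected {n_rows * n_cols} cell values, got shape {values.shape}"
        )
    return values.reshape(n_rows, n_cols)


if __name__ == "__main__":
    # Six cells on a 2 x 3 grid
    print(convert_cell_data_to_grid(np.arange(6), n_rows=2, n_cols=3))
```

The example prints a 2 x 3 array with cells 0 to 2 in the first row and cells 3 to 5 in the second; for the 9 x 9 example grid you would pass `n_rows=9, n_cols=9`. The size check turns a silently wrong grid into an explicit error if the dimensions do not match the number of cells.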

@@ -165,6 +129,7 @@ ax2.set_title("Soil pH (-)")
fig.colorbar(im2, ax=ax2, shrink=0.7)

plt.tight_layout()
plt.show()
```

For some variables, it may be useful to visualise spatial structure in 3 dimensions.
@@ -191,6 +156,8 @@ ax.set_title("Elevation (m)")
cell_bounds = range(0, 811, 90)
ax.set_xticks(cell_bounds)
ax.set_yticks(cell_bounds)

plt.show()
```

For other variables, such as air temperature and precipitation, the initial data
@@ -216,6 +183,8 @@ ax2.plot(initial_state["time_index"], initial_state["precipitation"])
ax2.set_title("Precipitation forcing across grid cells")
ax2.set_ylabel("Total monthly precipitation (mm)")
ax2.set_xlabel("Time step (months)")

plt.show()
```

### Model outputs
@@ -249,6 +218,8 @@ for idx, ax in zip([0, 10, 23], axes):

fig.colorbar(im, ax=axes, orientation="vertical", shrink=0.5)
plt.suptitle("Soil carbon: mineral-associated organic matter", y=0.78, x=0.45)

plt.show()
```

#### Temporal data
@@ -260,6 +231,8 @@ showing the values in each cell across time.
plt.plot(continuous_data["time_index"], continuous_data["soil_c_pool_maom"])
plt.xlabel("Time step")
plt.ylabel("Soil carbon as MAOM")

plt.show()
```
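With many cells, one line per cell can be hard to read. One option, sketched here as an assumption rather than part of the original tutorial, is to summarise across cells with xarray reductions and plot the mean with a min/max band. `summarise_across_cells` is a hypothetical helper; it reduces every dimension except `time_index`, so it does not rely on the name of the cell dimension, and it assumes `soil_c_pool_maom` has a `time_index` dimension, as the plot above suggests.

```python
from pathlib import Path

import xarray


def summarise_across_cells(
    da: xarray.DataArray, time_dim: str = "time_index"
) -> tuple[xarray.DataArray, xarray.DataArray, xarray.DataArray]:
    """Return the mean, minimum and maximum of da at each time step.

    Every dimension other than time_dim (for example the cell dimension)
    is reduced.
    """
    other_dims = [dim for dim in da.dims if dim != time_dim]
    return da.mean(dim=other_dims), da.min(dim=other_dims), da.max(dim=other_dims)


if __name__ == "__main__":
    import matplotlib.pyplot as plt  # only needed for the plot

    path = Path("/tmp/ve_example/out/all_continuous_data.nc")
    if path.is_file():
        continuous_data = xarray.load_dataset(path)
        mean, low, high = summarise_across_cells(continuous_data["soil_c_pool_maom"])
        time = continuous_data["time_index"].values
        plt.fill_between(time, low.values, high.values, alpha=0.3, label="Range across cells")
        plt.plot(time, mean.values, label="Mean across cells")
        plt.xlabel("Time step")
        plt.ylabel("Soil carbon as MAOM")
        plt.legend()
        plt.show()
```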

#### Vertical structure
@@ -313,6 +286,5 @@ ax.set_xlabel("Easting (m)")
ax.set_ylabel("Northing (m)")
ax.set_zlabel("Layer height (m)")

ax.set_xticks(cell_bounds)
ax.set_yticks(cell_bounds)
plt.show()
```