Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streamline getting started documentation #636

Closed
wants to merge 10 commits into from

Conversation

sallymatson
Copy link
Collaborator

@sallymatson sallymatson commented Dec 10, 2024

Description

Updating the getting started documentation to remove redundancies between the Getting Started page and Using the VE page. For now, I have separated them out so that the Getting started is focused on installation/running the simulation for the first time, and the "Using the VE page" (which I renamed to "Exploring the VE outputs") is more to go through a few ways to interact with the outputs / visualize.

Fixes #631

Key checklist

  • Make sure you've run the pre-commit checks: $ pre-commit run -a
  • All tests pass: $ poetry run pytest

Further checks

  • Code is commented, particularly in hard-to-understand areas
  • Tests added that prove fix is effective or that feature works
  • Relevant documentation reviewed and updated

@sallymatson sallymatson requested a review from hrlai December 10, 2024 17:22
@sallymatson sallymatson linked an issue Dec 10, 2024 that may be closed by this pull request
@sallymatson
Copy link
Collaborator Author

Hi @hrlai , I've had a first go at it. Would love to hear your thoughts - if it would be easier to talk on Teams would be happy to set up a call.

@davidorme
Copy link
Collaborator

@sallymatson The precommit.ci run has picked up a problem. It's really trivial - markdownlint wants to remove a trailing space - but it suggests that your development environment has an issue with precommit - either not installed or not configured. If you run poetry run pre-commit run -a, what happens?

@codecov-commenter
Copy link

codecov-commenter commented Dec 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.69%. Comparing base (48ec0ff) to head (ace3c5c).

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #636   +/-   ##
========================================
  Coverage    94.69%   94.69%           
========================================
  Files           73       73           
  Lines         4582     4582           
========================================
  Hits          4339     4339           
  Misses         243      243           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sallymatson
Copy link
Collaborator Author

@sallymatson The precommit.ci run has picked up a problem. It's really trivial - markdownlint wants to remove a trailing space - but it suggests that your development environment has an issue with precommit - either not installed or not configured. If you run poetry run pre-commit run -a, what happens?

Yes, that would be my fault for not runningpre-commit. All set now :)

@davidorme
Copy link
Collaborator

It should be running automatically when you git commit? Does it?

@sallymatson
Copy link
Collaborator Author

It should be running automatically when you git commit? Does it?

It does. The instructions on the doc were perfect ! Do you have any thoughts about the one build that failed?

@davidorme
Copy link
Collaborator

Do you have any thoughts about the one build that failed?

Yup - our CI workflow runs the pre-commit qa job and then does two things: runs the test suite on a bunch of OS and python version combos, but also tries to build the sphinx docs and it is that one that has failed. If you dive into the build details, there's a cross reference to a different file that fails - it's just a typo for overview but our QA testing is set up to be ultra picky.

https://github.com/ImperialCollegeLondon/virtual_ecosystem/actions/runs/12277162911/job/34255998318#step:6:401

@@ -20,7 +20,7 @@ language_info:
name: python
nbconvert_exporter: python
pygments_lexer: ipython3
version: 3.11.9
version: 3.12.0rc3
---

# Getting started
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the first mention of pip or a command, should we assume less coding experience of the users and say explicitly where to run it? For example, before pip install below, say something like "...open up the Terminal and type:"

@@ -56,18 +56,36 @@ configuration and data files to run a model.
ve_run --install-example /path/
```

You can then run the model itself:
You can then run the model itself. If you have already run the simulation you will need
to delete or rename the output files, as previously generated output can prevent the
Copy link
Collaborator

@hrlai hrlai Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though one could just remove the files using file explorer, should we give an example code to remove the files (e.g., using rm) for those who have little experience with the Terminal?

If so, probably write it after the ve_run chunk for a better flow...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the other file I just realised that @jacobcook1995 had already written something, maybe ditto them here for completeness?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's generally not great practice to use bash scripts to delete directories, because you might end up deleting something by mistake. It's useful in that example because the files are being stored in a kind of weird location (in the user directory) which might be hard to find using Finder. But I personally would encourage users to keep the data files in the virtual_ecosystem directory, because they will likely need to access them while working, and it will all stay a lot more organized that way. That is why I used this method - and in addition, I just think it overcomplicates the task to run a bash script, when the files should be easy to find.

Let me know what you think about that reasoning!!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All good. I agree that it is unnecessary :) The only thing left to decide if whether to list the filenames to be deleted, but this is not always the same (e.g., the initial states may not be saved depending on user setting.) When I first bumped into this I had to ask around which files to remove.

updated at each time step.
* `final_state.nc`: The model data state at the end of the final step.

These files are written to the standard NetCDF data file format.
Copy link
Collaborator

@hrlai hrlai Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe insert example code to remove them here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above - I think it's better in this tutorial to not have a script for that. LMK if you feel strongly about including it!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, replied above :)

Ecosystem Output](virtual_ecosystem_in_use.md) tutorial, which walks you through basic
graphs using model inputs and outputs.
* The [Example Data](./example_data.md) pages provides a detailed description of the
contents of the `ve_example` directory. Here you can dig into the strucutre of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "strucutre" -> "structure".

delete the existing virtual ecosystem example directory, as previously generated files
can prevent the example simulation from running successfully. That can be done as
follows.
The following commands allow you to run the simulation from a Jupyter Notebook.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is great to remind the users that they can run the python codes using Jupyter Notebook. However, I struggle to even comprehend where to type the following commands after launching Jupyter Notebook, sorry! Most ecologists or biologists would be more familiar with R than python, so it would be a shame to turn them away because we didn't ease them into python.

That said, I struggle between providing too much details about python coding here (some users may already know how and find it boring) versus hand-carrying new users. Should we dedicate a new page just for using python codes in Jupyter, or at least point the users to an external tutorial page?

Copy link
Collaborator

@hrlai hrlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great start towards a clearer start-up tutorial! I have added my two cents, and the most major one is possibly to hand-carry the users more around how to run python codes using Jupyter Notebook...

slideshow:
slide_type: ''
tags: []
---
%%bash
# Remove any existing VE data directory in the /tmp/ directory
if [ -d /tmp/ve_example ]; then
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be simplified as rm -rf /tmp/ve_example ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh - I don't think of the -f flag that way, but it does explicitly not raise an error if the file is missing. We should note that is why we're using it, but it is less complex than the explicit test.

Comment on lines 42 to 47
---
editable: true
slideshow:
slide_type: ''
tags: []
---
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not figured out why Jupyter sometimes chucks some of these tags in or how to filter them out. I think we can do so with the myst filters in jupytext, and we definitely should because we do not want this!

slideshow:
slide_type: ''
tags: []
---
# Load the generated data files
initial_state = xarray.load_dataset("/tmp/ve_example/out/initial_state.nc")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user install the develop branch using pip they probably won't have the initial state file by default settings, see #544 I can confirm that this is still the case today on my computer.

```

+++ {"editable": true, "slideshow": {"slide_type": ""}, "tags": []}

### Initial state and input data

The `initial_state.nc` file contains all of the data required to run the model. For some
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the plot below, we reshape the vector output of elevation to be a matrix of dimension 9 x 9. Can we avoid hard-coding the row and column numbers from 9 to something from the netCDF output? I imagine someone to change the grid numbers eventually so this will help to automate the plotting a bit more.

Copy link
Collaborator

@davidorme davidorme Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's actually going to need to be a new function to reverse the internal representation of cells (which is a 1D array) back onto the grid. So that's going to be a general helper function (convert_cell_data_to_grid or similar).

But your point is a good one!

ETA - Actually that get's interesting. You need the grid configuration to go back to the 2D, so plotting spatial data will need a helper class (SpatialPlotter) that is created using the Grid config and then has a method (to_spatial) that maps the cell_id back onto the spatial layout.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, something like convert_cell_data_to_grid would be a great addition indeed. Is the grid configuration (e.g., number of rows and columns) stored anywhere in the output files?

@sallymatson
Copy link
Collaborator Author

Going to close this. I think the issue still stands, but I think from this conversation / other chats in meetings and over email, it seems like we need a bit of a deeper think about what is most useful for the introduction to VE docs. I still have this on my radar to work on soon, just going to wait a bit to make sure I do it right!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Streamline "Getting Started" documentation
4 participants