Skip to content

Commit

Permalink
Draft notes for our August meeting
Browse files Browse the repository at this point in the history
Also link to relevent topical guide content in the orientation guide.
  • Loading branch information
robmoss committed Aug 9, 2024
1 parent 4df1de1 commit 0bbdcf9
Show file tree
Hide file tree
Showing 4 changed files with 188 additions and 3 deletions.
182 changes: 182 additions & 0 deletions docs/community/meetings/2024-08-08.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
# 8 August 2024

## Orientation guide

!!! tip "Key message"

The aim for the Orientation Guide is to provide a short overview of import concepts, useful tools and resources, and links to relevant tutorials and examples.

In this meeting we discussed how the [Orientation Guide](../../orientation/README.md) could best address the needs of new students and staff.
We began by asking participants what skills, tools, and knowledge they've found to be particularly useful, and wish they'd discovered earlier.

Attendance: 5 in person, 2 online.

## Core tools and recommended packages

!!! tip "Key message"

There was strong interest in having **opinionated recommendations** for helpful software packages and libraries, based on our own experiences.

When we start out, we typically don't know what tools are available and how to choose between them.
So having guidance and recommendations from more experiences members of our community can be valuable.

This harks back to [TK's presentation](2024-05-09.md) and their reflections on [choosing the best tools](2024-05-09.md#tools-and-platforms) for a specific project or task.
For example:

!!! question

Which editor should a new student choose for their project?

We **strongly recommend** choosing an editor that can automatically [format](../../guides/writing-code/format-your-code.md) and [check](../../guides/writing-code/check-your-code.md) your code.

- For R, the best choice is almost surely [RStudio](https://posit.co/products/open-source/rstudio/).
- For Python there isn't necessarily a single best choice, but good options include [PyCharm](https://www.jetbrains.com/pycharm/) and [Spyder](https://www.spyder-ide.org/).
- The orientation guide should list editors that are popular within our community.

Eamon suggested that in addition to linking to tutorials for installing common tools such as Python and R, the orientation guide should recommend **helpful packages**.
For example:

- For R, the [`tidyr`](https://tidyr.tidyverse.org/) package and the broader collection of ["tidyverse" packages](https://www.tidyverse.org/).

- For Python, [Conda](https://docs.conda.io/) is probably the easiest way to install Python and scientific/numeric Python libraries on Windows and OS X (it also supports Linux).

Jacob: it would be nice to have a flowchart or diagram to help identify relevant tools and packages.
For example, if you want to (a) analyse tabular data; and (b) use Python; then what package would you recommend?
(Our answer was [Polars](https://pola.rs/)).

## Reproducible environments

!!! tip "Key message"

Virtual environments allow you to install the packages that are required for specific project, without affecting other projects.

This is useful for a number of reasons, including:

- Being able to manage each project independently and in isolation;

- Being able to use different versions of packages in different projects; and

- Making it simpler to set up and run a project on a new computer.

Python provides built-in tools for [virtual environments](https://packaging.python.org/en/latest/tutorials/installing-packages/), and the University of Melbourne's [Python Digital Skills Training](https://ai-javaher.gitbook.io/pythonclass) includes a workshop on [Python Virtual Environments](https://ai-javaher.gitbook.io/pythonclass/python-workshops/virtual-environment).

For R, the [`renv`](https://rstudio.github.io/renv/) packages provides similar capabilities, and the [Introduction to renv](https://rstudio.github.io/renv/articles/renv.html) article provides a good overview.

## Reproducible documents and notebooks

!!! tip "Key message"

Reproducible document formats such as RMarkdown (for R) and Jupyter Notebooks (for Python) provide a way to combine code, data, text, and figures in self-contained and reproducible plain-text files.

For introductions and examples, see:

- Nick Tierney's [RMarkdown for Scientists](https://rmd4sci.njtierney.com/);

- The [Jupyter Notebook documentation](https://jupyter-notebook.readthedocs.io/en/stable/); and

- The [Quarto](https://quarto.org/) publishing system is similar to RMarkdown, but allows you to write code in Python, R, and/or Julia, and supports many different [output formats](https://quarto.org/docs/gallery/).

If you use [VS Code](https://code.visualstudio.com/) to write Quarto documents, when you edit a code block it will open it in Jupyter (for Python code) and this allows you to step through and debug the code to some degree.

## Existing training courses and needs

!!! tip "Key message"

There will be an Introduction to Polars workshop at the [SPECTRUM](https://spectrum.edu.au/) 2024 annual meeting (23-25 September), led by Nefel Tellioglu and Julian Carlin.

We asked participants if they had found any training materials that were particularly useful.

Mackrina said that she is using Python in her PhD project, but previously only had experience with Matlab.

- Mackrina completed several online Python courses, but found that an in-person training session offered by the University of Melbourne's [Digital Skills Training](https://gateway.research.unimelb.edu.au/researcherdevelopment/researcher-connect/digital-skills-training) was the most useful.
They regularly run a wide range of training sessions, see the [list of upcoming sessions](https://gateway.research.unimelb.edu.au/researcherdevelopment/researcher-connect/digital-skills-training).

- Mackrina found that the [pypfilt](https://pypfilt.readthedocs.io/) package made it much easier to write ordinary differential equation (ODE) models and run scenario simulations.
**Note:** Rob is the `pypfilt` author and maintainer.
This package is designed for scenario modelling, and model-fitting using Sequential Monte Carlo (SMC) methods.

Other participants chimed in with recommended resources and training needs:

- Rob: The Software Carpentry provides a good range of [lessons](https://software-carpentry.org/lessons/).

- TK: I want to learn how to use [Docker](https://www.docker.com/) and containers, and how to (install and) use [greta](https://greta-stats.org/).

- Eamon: I'm happy to provide assistance and guidance with using [Stan](https://mc-stan.org/) to fit models using Hamiltonian Monte Carlo.

- Jiahao: Python ODE solvers are not described nearly as well as Matlab's ODE solvers, so they are harder to use.

## High-performance computing (HPC)

!!! question "Using GPGPUs for high-performance computing"

Jiahao asked: Does anyone in our community have experience with using GPGPUs?

In response to Jaihao's question, Eamon replied that he has found it to be near-impossible, due to a combination of:

- Difficulties in compiling compatible versions of the required libraries; and
- Figuring out how to adapt his code to make effective use of GPGPUs.

This initiated a broader discussion about improving the computational performance of our code and making use of high-performance computing (HPC) resources.

Computational performance was an issue that [Nefel encountered](2024-07-11.md#computational-performance) when constructing an agent-based model of pneumococcal disease, and she found that code optimisation is a skill that takes time to learn.

We discussed several ways about using multiple CPUs to make code run more quickly:

- Using libraries that automatically make use of multiple CPUs, such as [future.apply](https://future.apply.futureverse.org/) for R, [concurrent.futures](https://docs.python.org/3/library/concurrent.futures.html) for Python, and the [Polars](https://pola.rs/) data-frame library.

- Where we want to run large numbers of simulations, the easiest approach can often be for each simulation to only use one CPU, and to run many simulations in parallel (e.g., on virtual machines that have many CPUs).
However, as TK pointed out, if each simulation uses a large amount of RAM, it may not be possible to run many simulations in parallel with this approach.

- For larger scale problems, there are HPC platforms such as the University of Melbourne's [Spartan](https://dashboard.hpc.unimelb.edu.au/) and Monash University's [MASSIVE](https://www.massive.org.au/).
Using these platforms typically requires writing some bespoke code to define and schedule jobs.

## Debugging

!!! tip "Key message"

There was strong interest in running a debugging workshop at the upcoming [SPECTRUM](https://spectrum.edu.au/) 2024 annual meeting (23-25 September).
As [TK](2024-05-09.md) and [Nefel](2024-07-11.md) have shown in their presentations, skills like debugging are **extremely valuable** for projects with tight deadlines, but these projects are also **the worst time** in which to [develop](2024-05-09.md#other-activities) and [practice](2024-07-11.md#conclusions) these skills.

!!! info

Attendees confirmed their willingness to evaluate and provide feedback on workshop draft materials.

Rob reflected that many people struggle to effectively debug their code, and can end up wasting a lot of time.
Since we all make mistakes when writing code, this can be an extremely valuable skill.
This is particularly true when working on, e.g., modelling contracts with government (see, e.g., the recent presentations from [TK](2024-05-09.md) and [Nefel](2024-07-11.md)).

We discussed some general guidelines, such as:

- Structuring your code so that it's easier to debug (e.g., small functions);

- Avoid hard-coding numerical values, file names, etc, as much as possible;

- Making use of breakpoints rather than `print()` statements;

- Checking that input arguments to a function satisfy necessary conditions;

- Checking that output values from a function satisfy necessary conditions;

- Failing early (e.g., raising exceptions in Python, calling `stop()` in R) rather than returning values that may not be valid.

Eamon: by learning how to debug code, I substantially improved how I write and modularise my code.
My functions became smaller, and this helped me to make fewer mistakes.

For example, David Price and I encountered a bug in some C++ code where the function was correct, but made assumptions about the input arguments.
These assumptions were initially satisfied, but as other parts of the code were updated, these assumptions were no longer true.

To address this, I often write `if` statements at the top of a function to check these kinds of conditions, and stop if there are failures.
You can see examples of this in real-world code from popular packages.

James: I'm happy to provide an example of debugging within an R pipe.
[Learn Debugging](https://www.cse.unsw.edu.au/~learn/debugging/) might be a useful resource for our community.

Rob: failing early is good, rather than producing output and then having to discover that it's incorrect (which may not be obvious).
Related skills include learning how to read a stack trace, and defensive programming (such as checking input arguments, as Eamon mentioned).

TK: it's really hard to change existing habits.
And I'm not doing any coding in my projects right now.
My most recent coding experiences were in COVID-19 projects (see [TK's presentation](2024-05-09.md)) and the very tight deadlines didn't allow me the opportunity to develop and apply new skills.

Rob: **everyone already debugs and tests their code to some degree**, simply by writing and evaluating code line by line (e.g., in an interactive R or Python session) and by running functions with example arguments to check that they give sensible outputs.
We just need to "nudge" these behaviour to make it more systematic and reproducible.
4 changes: 3 additions & 1 deletion docs/community/meetings/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,13 +2,15 @@

This section contains summaries of each Community of Practice meeting.

- [8 August 2024](2024-08-08.md): orientation guide planning.

- [11 July 2024](2024-07-11.md): presentation from Nefel Tellioglu.

- [13 June 2024](2024-06-13.md): presentation from Cam Zachreson.

- [9 May 2024](2024-05-09.md): presentation from TK Le.

- [11 April 2024](2024-04-11.md): ideas and resource for the orientation guide.
- [11 April 2024](2024-04-11.md): ideas and resources for the orientation guide.

- [19 February 2024](2024-02-19.md): identify goals and activities for 2024.

Expand Down
4 changes: 2 additions & 2 deletions docs/orientation/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,9 @@ There was broad interest in having a checklist, and example workflows for people
Suggested topics included:

- How to set up common tools on your laptop (e.g., Python and R);
- How to [organise your files](../guides/using-git/how-to-structure-a-repository.md);
- How to [define a workflow](../guides/project-structure/workflow.md) and [organise your files](../guides/using-git/how-to-structure-a-repository.md);
- How to write Markdown documents (see Nick Tierney's book [RMarkdown for Scientists](https://rmd4sci.njtierney.com/));
- How to format and lint your code;
- How to [format](../guides/writing-code/format-your-code.md) and [check](../guides/writing-code/check-your-code.md) your code;
- How to set up git [on your own device](../guides/using-git/README.md) and [using platforms such as GitHub](../guides/collaborating/README.md);
- How to recover [old versions of files](../guides/using-git/inspecting-your-history.md);
- How to make your code device-agnostic, so it can run on HPC platforms, virtual machines, and can easily be migrated to new devices; and
Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -88,6 +88,7 @@ nav:
- community/README.md
- "Meetings":
- community/meetings/README.md
- "8 August 2024": community/meetings/2024-08-08.md
- "11 July 2024": community/meetings/2024-07-11.md
- "13 June 2024": community/meetings/2024-06-13.md
- "9 May 2024": community/meetings/2024-05-09.md
Expand Down

0 comments on commit 0bbdcf9

Please sign in to comment.