Merge pull request #73 from robmoss/cop/meetings-2024-06-and-2024-07
Add draft notes for our past two meetings
robmoss authored Aug 8, 2024
2 parents 7fdde03 + 8183159 commit 4df1de1
96 changes: 96 additions & 0 deletions docs/community/meetings/2024-06-13.md
@@ -0,0 +1,96 @@
# 13 June 2024

## Cam Zachreson: A comparison of three ABMs

In this meeting [Cam](https://github.com/cjzachreson) gave a presentation about the relative merits and trade-offs of three different approaches to agent-based models (ABMs).

Attendance: 7 in person, 13 online.

## Theoretical frameworks

!!! tip "Key message"

    Each framework is built upon different assumptions about space, contacts, and transmission.

Cam introduced three theoretical frameworks for disease transmission, which he used in constructing infectious disease models for three different projects.
Note that all three models **use the same within-host model** for individual dynamics.

1. [Border quarantine for COVID-19](https://doi.org/10.1126/sciadv.abm3624): international arrivals, quarantine workers, and the local community are divided into **mixing groups** within which there is close contact.
There is also weaker contact between these mixing groups.

2. [Social isolation in residential aged care facilities](https://doi.org/10.48550/arXiv.2401.01371): the transmission network is a **multigraph** that explicitly represents contact between individuals.
The graph is dynamic: workers and worker-room assignments can change every day.
Workers share *N* edges when they service *N* rooms in common.

3. A single hospital ward (work in progress): a **shared space** model represents spatial structure as a network of separate spaces (i.e., nodes).
Nurses and patients are associated with spaces according to schedules.
Each space has its own viral concentration, driven by shedding from infectious people and ventilation (the air changes around 6 times per hour).
Residence in a space results in a net viral dose, which confers a probability of infection (using the [Wells-Riley model](https://en.wikipedia.org/wiki/Wells-Riley_model)).
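
The shared-space idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cam's implementation: the shedding rate, room volume, breathing rate, time step, and the function names are all hypothetical, and the dose-response curve is the exponential form used by the Wells-Riley model. Only the ventilation rate of about 6 air changes per hour is taken from the notes.

```python
import math


def step_concentration(conc, n_infectious, shed_rate, volume, ach, dt):
    """Advance the viral concentration in one space by dt hours (explicit Euler).

    conc: current concentration (quanta per cubic metre)
    shed_rate: quanta shed per infectious person per hour (assumed value)
    volume: volume of the space in cubic metres (assumed value)
    ach: air changes per hour (removal by ventilation)
    """
    inflow = n_infectious * shed_rate / volume
    outflow = ach * conc
    return conc + dt * (inflow - outflow)


def infection_probability(dose):
    """Exponential dose-response curve, as in the Wells-Riley model."""
    return 1.0 - math.exp(-dose)


# Hypothetical room: one infectious patient and one susceptible nurse.
conc = 0.0
dose = 0.0
dt = 1.0 / 60.0       # one-minute time steps, expressed in hours
breathing_rate = 0.5  # cubic metres inhaled per hour (assumed value)

for _ in range(60):   # the nurse spends one hour in the room
    conc = step_concentration(conc, n_infectious=1, shed_rate=10.0,
                              volume=50.0, ach=6.0, dt=dt)
    dose += breathing_rate * conc * dt  # accumulate the inhaled dose

p_infection = infection_probability(dose)
```

Because every quantity here has physical units, the parameters could in principle be measured or taken from the literature, which is the advantage of explicit physical parameters noted for the hospital ward model.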


!!! question

    Are many short interactions equivalent to one long interaction?

## Pros and cons of model structures

!!! tip "Key message"

    Each framework has unique strengths and weaknesses.

As shown in the slide below, Cam identified a range of pros and cons for each modelling framework.
Some of the key trade-offs between these frameworks are:

- The ease of **validation** (aged care and hospital ward) versus the ease of **communication** (quarantine);

- Having **explicit physical parameters** and units (hospital ward) versus having **vague and/or phenomenological parameters** (quarantine and aged care); and

- Being **simple** to construct and **efficient** to run (quarantine and aged care) versus being **complex** to construct and **computationally expensive** (hospital ward).

<figure markdown="span">
![Pros and cons table](2024-06-13-pros-and-cons.png)
<figcaption>Pros and cons of the three approaches.</figcaption>
</figure>

## Constructing complex models

!!! tip "Key message"

    Complex models typically have complex data requirements.

Data requirements can also present a challenge when constructing complex models.
For example, behaviour models are good for highly-structured environments such as hospital wards, where nurses have scheduled tasks that are performed in specific spaces.
However, the data required to construct the behaviour model can be very hard to collect, access, and use.
Even if nurses wear sensors, the sensor data are never sufficiently clean or complete to use without substantial cleaning and processing.

Airflow between spaces in a highly-structured environment is also complex to model.
Air exchange can involve diffusion between adjacent spaces, but also airflow between non-adjacent spaces through ventilation systems.
These flows can be difficult to identify, and are computationally expensive to simulate (computational fluid dynamics).

Cam concluded by observing that existing hospital wards tend to have a design flaw for infection control:

> There are many shared spaces in which infection can spread among individuals via air transmission.

## Reproducibility in stochastic models

!!! tip "Key message"

    These models rely on random number generators (RNGs), whose outputs can be controlled by defining their initial seed.
    Using a separate RNG for each process in the model provides further advantages (see below).

In contrast to agent-based models of much larger populations, these models are small enough that they can be run as single-threaded code, and multiple simulations can be run in parallel.
The bulk of computational cost is usually sweeping over many combinations of parameter values.
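
A parameter sweep of this kind might be organised as follows. This is only a sketch: `run_simulation` is a hypothetical stand-in for a real single-threaded model run, and the parameter names and values are invented for illustration.

```python
import itertools
import random


def run_simulation(params):
    """Stand-in for one single-threaded model run; returns an outbreak size."""
    beta, test_rate, seed = params
    rng = random.Random(seed)  # each run gets its own seeded RNG
    infected = 1
    for _ in range(20):
        if rng.random() < beta * (1.0 - test_rate):
            infected += 1
    return infected


betas = [0.1, 0.3]
test_rates = [0.0, 0.5]
seeds = range(3)

# Every combination of parameter values and seeds is one independent run.
scenarios = list(itertools.product(betas, test_rates, seeds))

# The runs do not interact, so the built-in map used here could be swapped
# for a process pool's map (e.g. multiprocessing.Pool.map) to use many cores.
results = dict(zip(scenarios, map(run_simulation, scenarios)))
```

Passing the seed in as part of each scenario means that any individual run can be repeated later in isolation.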

The aged care (multigraph) and hospital ward (shared space) models decouple the population RNG from the transmission dynamics RNG.
**An advantage of using multiple RNGs** is that we can independently control and modify these processes.
For example, by using separate RNGs for infections and testing, we can adjust testing practices without affecting the infection process.
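
The following sketch illustrates the idea using Python's standard library. It is not taken from either model: the function, the probabilities, and the way the two seeds are derived from a base seed are assumptions made for this example.

```python
import random


def simulate(seed, test_prob, days=10, infect_prob=0.3):
    """Toy daily process with one RNG for infections and one for testing."""
    # Each process has its own generator, seeded from the same base seed.
    infection_rng = random.Random(f"{seed}-infection")
    testing_rng = random.Random(f"{seed}-testing")

    infections, detections = [], []
    for _ in range(days):
        infected = infection_rng.random() < infect_prob
        tested_positive = testing_rng.random() < test_prob
        infections.append(infected)
        detections.append(infected and tested_positive)
    return infections, detections


low_testing = simulate(seed=42, test_prob=0.2)
high_testing = simulate(seed=42, test_prob=0.9)

# The infection events are identical in both scenarios; only detection differs.
same_infections = low_testing[0] == high_testing[0]
```

With a single shared RNG, any change to the number or order of testing draws would shift the random numbers consumed by the infection process, and the two scenarios would no longer be directly comparable.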

## Topic for a Masters project

!!! question

    Does anyone know a suitable Masters student?

Cam is looking for a Masters student to undertake a project that will look at individual-level counterfactual scenarios.
The key idea is to identify sets of preconditions (e.g., salient details of the event history and/or current epidemic context) and ensure that the model will always generate the same outcome when given these preconditions.
An open question is how far back in the event history it is necessary, or sufficient, to look.
111 changes: 111 additions & 0 deletions docs/community/meetings/2024-07-11.md
@@ -0,0 +1,111 @@
# 11 July 2024

## Nefel Tellioglu: Lessons learned from pneumococcal vaccine modelling

In this meeting [Nefel](https://github.com/nefeltellioglu/) gave a presentation about a pneumococcal vaccine (PCV) evaluation project for government, sharing her experiences in developing a model from scratch under tight deadlines.

Attendance: 6 in person, 6 online.

!!! info

    We welcome presentations about research projects and experiences that relate in any way to reproducibility and good computational research practices.
    Presentations can be short, informal, and free-form.

    Please contact us if you have anything you might like to present!

## Computational performance

!!! tip "Key message"

    Optimisation is a skill that takes time to learn.

This project involved constructing an agent-based model (ABM) of pneumococcal disease, incorporating various vaccination assumptions and intervention strategies.
Nefel was familiar with an existing ABM framework written in Python, but found that the project requirements (a large population size and long simulation time-frames) meant that a different approach was required.

This meant asking for help with a new skill: optimising the model for each vaccine type and for multiple strains.

She ended up implementing a model from scratch, using the [Polars](https://pola.rs/) data frame library to represent each individual as a separate row in a single population data frame.
This library is designed for high performance, and Nefel was able to implement a model that ran quickly enough for the purposes of this project.

!!! question "An introduction to Polars workshop?"

    Nefel asked whether other people would be interested in an "Introduction to Polars" workshop, and a number of participants indicated interest.

## Workflows and deadlines

!!! tip "Key message"

    Using version control makes it much easier to fix your code when it breaks.

Nefel made frequent use of a git repository (hosted on GitHub) in the early stages of the project.
She found it very useful during the model prototyping phase, when adding new features frequently broke the code in some way.
Having immediate access to previous versions of the code made it much easier to revert changes and fix the code.

However, she stopped using it when the project reached a series of tight deadlines.

## Asking for extensions

!!! tip "Key message"

    Being able to provide advance warning of potential delays, and to explain the reasons why they might occur, is extremely helpful **for everyone**.
    This allows project leaders and stakeholders to adjust their plans and expectations.

It's generally hard to estimate feasible timelines in advance.
This is especially difficult when exploring a new problem, and when a range of future extensions are anticipated.

These kinds of conversations can feel extremely uncomfortable.
Several participants reflected on their own experiences, and agreed that informing their supervisors about potential problems as early as possible was the best approach.

Things can take longer than expected due to the research nature of building a new infectious disease model.
Where possible, avoid promising that a model will be completed by a certain time.
Instead, give stakeholders regular updates about progress and challenges, so that they can appreciate how much effort is being applied to the problem.

> Gizem: stakeholders may not know what they want or need from the model.
> It's really helpful to clarify this early in the project, which needs a good working relationship.

> Eamon: writing your code in a modular way can help make it easier to implement those future extensions.
> Experience also helps in designing your code so that future extensions only modify small parts of your model.
> But avoid trying to make your code as abstract and extensible as possible.

> Rob: if you know that the model will be applied to many different scenarios in the future, try to separate the code that defines the location of data files from the code that uses those data.
> That makes it easier to run your model using different sets of input data.

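
Rob's suggestion could look something like the sketch below. The `Scenario` class, the `load_population` function, and the CSV layout are hypothetical names chosen for this example, and a temporary file stands in for real input data.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Scenario:
    """Records where the input data for one scenario are stored."""
    name: str
    population_file: Path


def load_population(scenario):
    """Read the population table; this code never hard-codes a file path."""
    with open(scenario.population_file, newline="") as f:
        return list(csv.DictReader(f))


# Demonstration: a temporary CSV file plays the role of a real data file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "population.csv"
    path.write_text("id,age\n0,34\n1,71\n")
    baseline = Scenario(name="baseline", population_file=path)
    people = load_population(baseline)
```

Running the model on a different data set then only requires a new `Scenario` value, and the code that reads and uses the data stays unchanged.
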
## Related libraries for Python and R

!!! tip "Key message"

    There are a number of high-performance data frame libraries.

[Polars](https://pola.rs/) primarily supports Python, Rust, and JavaScript.
There is also an [R package](https://pola-rs.github.io/r-polars/) that has several extensions, including:

- [polarssql](https://rpolars.github.io/r-polarssql/): a Polars backend for [DBI](https://dbi.r-dbi.org/) and [dbplyr](https://dbplyr.tidyverse.org/); and

- [tidypolars](https://tidypolars.etiennebacher.com/): [tidyverse](https://www.tidyverse.org/) syntax for Polars.

Other high-performance data frame options for R:

- [data.table](https://rdatatable.gitlab.io/data.table/): a high-performance `data.frame` replacement;

- [DBI](https://dbi.r-dbi.org/): a package for working with various databases; and

- [dbplyr](https://dbplyr.tidyverse.org/): a database backend for [dplyr](https://dplyr.tidyverse.org/).

[DuckDB](https://duckdb.org/) is another high-performance library for working with databases and tabular data, and is available for **many languages** including R, Python, and Julia.
It also integrates with Polars, allowing you to query Polars data frames and to save outputs as Polars data frames.

## Conclusions

!!! tip "Key message"

    Once a project is completed, it's worth reflecting on what worked well, and on what you would do differently next time.

Nefel finished by reflecting on what she might do differently next time, and highlighting two key points:

- Begin with a clearer understanding of the skills required for the project, such as modelling large populations and code optimisation.

- Where there are potential skill gaps, involve other people in the project who can contribute relevant expertise.

## Next meeting

At our next meeting — currently scheduled for Thursday 8 August — we will work on finalising our [Orientation Guide checklist](../../orientation/README.md), collect supporting materials for each item on the checklist, and begin drafting content where no suitable supporting materials can be found.
4 changes: 4 additions & 0 deletions docs/community/meetings/README.md
@@ -2,6 +2,10 @@

This section contains summaries of each Community of Practice meeting.

- [11 July 2024](2024-07-11.md): presentation from Nefel Tellioglu.

- [13 June 2024](2024-06-13.md): presentation from Cam Zachreson.

- [9 May 2024](2024-05-09.md): presentation from TK Le.

- [11 April 2024](2024-04-11.md): ideas and resource for the orientation guide.
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -88,6 +88,8 @@ nav:
- community/README.md
- "Meetings":
- community/meetings/README.md
- "11 July 2024": community/meetings/2024-07-11.md
- "13 June 2024": community/meetings/2024-06-13.md
- "9 May 2024": community/meetings/2024-05-09.md
- "11 April 2024": community/meetings/2024-04-11.md
- "19 February 2024": community/meetings/2024-02-19.md
@@ -105,6 +107,7 @@ markdown_extensions:
- admonition
- attr_list
- footnotes
- md_in_html
- pymdownx.details
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji