diff --git a/pr-preview/pr-75/community/training/debugging/example-square-numbers/index.html b/pr-preview/pr-75/community/training/debugging/example-square-numbers/index.html index 6f8b1ffb..395d1057 100644 --- a/pr-preview/pr-75/community/training/debugging/example-square-numbers/index.html +++ b/pr-preview/pr-75/community/training/debugging/example-square-numbers/index.html @@ -3530,6 +3530,10 @@

Stepping through the code +

Manual breakpoints

+

You can also create breakpoints in your code by calling breakpoint() for Python, and browser() for R.
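For example, here is a minimal Python sketch (the `compute_mean` function is hypothetical; R's `browser()` works analogously inside an R function):

```python
def compute_mean(values):
    total = sum(values)
    # Uncommenting the next line pauses execution here and opens the pdb
    # debugger, where `values` and `total` can be inspected interactively:
    # breakpoint()
    return total / len(values)

print(compute_mean([1, 2, 3]))  # prints 2.0
```

In Python 3.7+, setting the environment variable `PYTHONBREAKPOINT=0` disables all `breakpoint()` calls without editing the code.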

+

Interactive debugger sessions

If your editor supports running a debugger, use this feature! diff --git a/pr-preview/pr-75/search/search_index.json b/pr-preview/pr-75/search/search_index.json index a1123ea2..1aa6ab5d 100644 --- a/pr-preview/pr-75/search/search_index.json +++ b/pr-preview/pr-75/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":"

These materials aim to support early- and mid-career researchers (EMCRs) in the SPECTRUM and SPARK networks to develop their computing skills, and to make effective use of available tools1 and infrastructure2.

"},{"location":"#structure","title":"Structure","text":"

Start with the basics: Our orientation tutorials provide overviews of essential skills, tools, templates, and suggested workflows.

Learn more about best practices: Our topical guides explain a range of topics and provide exercises to test your understanding.

Come together as a community: Our Community of Practice is how we come together to share skills, knowledge, and experience.

"},{"location":"#motivation","title":"Motivation","text":"

Question

Why dedicate time and effort to learning these skills? There are many reasons!

The overall aim of these materials is to help you conduct code-driven research more efficiently and with greater confidence.

Hopefully some of the following reasons resonate with you.

Foundations of effective research

A piece of code is often useful beyond a single project or study.

By applying the above skills in your research, you will be able to easily reproduce past results, extend your code to address new questions and problems, and allow others to build on your code in their own research.

The benefits of good practices can continue to pay off long after the project is finished.

"},{"location":"#license","title":"License","text":"

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

  1. Such as version control and testing frameworks.\u00a0\u21a9

  2. Such as the ARDC Nectar Research Cloud and Spartan.\u00a0\u21a9

"},{"location":"contributors/","title":"Contributors","text":"

Here is a list of the contributors who have helped develop these materials:

"},{"location":"how-to-contribute/","title":"How to contribute","text":""},{"location":"how-to-contribute/#add-a-case-study","title":"Add a case study","text":"

If you've made use of Git in your research activities, please let us know! We're looking for case studies that highlight how EMCRs are using Git. See the instructions for suggesting new content (below).

"},{"location":"how-to-contribute/#provide-comments-and-feedback","title":"Provide comments and feedback","text":"

The easiest way to provide comments and feedback is to create an issue. Note that this requires a GitHub account. If you do not have a GitHub account, you can email any of the authors. Please include \"Git is my lab book\" in the subject line.

"},{"location":"how-to-contribute/#suggest-modifications-and-new-content","title":"Suggest modifications and new content","text":"

This book is written in Markdown and is published using Material for MkDocs. See the Material for MkDocs Reference for an overview of the supported features.

You can suggest modifications and new content by:

Info

You can also edit any page by clicking the "Edit this page" button in the top-right corner. This will start the process described above by forking the book repository.

Tip

When editing Markdown content, please start each sentence on a separate line. Also check that your text editor removes trailing whitespace.

This ensures that each commit will contain only the modified sentences, and makes it easier to inspect the repository history.

Tip

When you add a new page, you must also add the page to the nav block in mkdocs.yml.

"},{"location":"how-to-contribute/#adding-tabbed-code-blocks","title":"Adding tabbed code blocks","text":"

You can display content in multiple tabs by using ===. For example:

=== \"Python\"\n\n    ```py\n    print(\"Hello world\")\n    ```\n\n=== \"R\"\n\n    ```R\n    cat(\"Hello world\\n\")\n    ```\n\n=== \"C++\"\n\n    ```cpp\n    #include <iostream>\n\n    int main() {\n        std::cout << \"Hello World\";\n        return 0;\n    }\n    ```\n\n=== \"Shell\"\n\n    ```sh\n    echo \"Hello world\"\n    ```\n\n=== \"Rust\"\n\n    ```rust\n    fn main() {\n        println!(\"Hello World\");\n    }\n    ```\n

produces:

PythonRC++ShellRust
print(\"Hello world\")\n
cat(\"Hello world\\n\")\n
#include <iostream>\n\nint main() {\n    std::cout << \"Hello World\";\n    return 0;\n}\n
echo \"Hello world\"\n
fn main() {\n    println!(\"Hello World\");\n}\n
"},{"location":"how-to-contribute/#adding-terminal-session-recordings","title":"Adding terminal session recordings","text":"

You can use asciinema to record a terminal session, and display this recorded session with a small amount of HTML and JavaScript. For example, the following code is used to display the where-did-this-line-come-from.cast recording in a tab called \"Video demonstration\", as shown in Where did this line come from? chapter:

=== \"Video demonstration\"\n\n    <div id=\"demo\" data-cast-file=\"../where-did-this-line-come-from.cast\"></div>\n

You can also add links that jump to specific times in the video. Each link must have:

For example, the following code is used to display the video recording on the Choosing your Git Editor:

=== \"Git editor example\"\n\n    <div id=\"demo\" data-cast-file=\"../git-editor-example.cast\"></div>\n\n    Video timeline:\n\n    1. <a data-video=\"demo\" data-seek-to=\"4\" href=\"javascript:;\">Overview</a>\n    2. <a data-video=\"demo\" data-seek-to=\"17\" href=\"javascript:;\">Show how to use nano</a>\n    3. <a data-video=\"demo\" data-seek-to=\"71\" href=\"javascript:;\">Show how to use vim</a>\n

You can use the asciinema-scripted tool to generate scripted recordings.

"},{"location":"community/","title":"Community of Practice","text":"

Info

Communities of Practice are groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.

The community acts as a living curriculum and involves learning on the part of everyone.

The aim of a Community of Practice (CoP) is to come together as a community and engage in a process of collective learning in a shared domain. The three characteristics of a CoP are:

  1. Community: An environment for learning through interaction;

  2. Practice: Specific knowledge shared by community members; and

  3. Domain: A shared interest, problem, or concern.

We regularly meet as a community, report meeting summaries, and collect case studies that showcase good practices.

"},{"location":"community/#training-events","title":"Training events","text":"

To support skill development, we have the capacity to prepare and deliver bespoke training events, both as standalone sessions and as part of larger meetings and conferences. See our Training events page for further details.

"},{"location":"community/case-studies/","title":"Case studies","text":"

This section contains interesting and useful examples of incorporating Git into a research activity, as contributed by EMCRs in our network.

"},{"location":"community/case-studies/campbell-pen-and-paper-version-control/","title":"Pen and paper - a less user-friendly form of version control than Git","text":"

Author: Trish Campbell (patricia.campbell@unimelb.edu.au)

Project: Pertussis modelling

"},{"location":"community/case-studies/campbell-pen-and-paper-version-control/#the-problem","title":"The problem","text":"

In this project, I developed a compartmental model of pertussis to determine appropriate vaccination strategies. While plotting some single model simulations, I noticed anomalies in the modelled output for two experiments. The first experiment had an order of magnitude more people in the infectious compartments than in the second experiment, even though there seemed to be far fewer infections occurring. This scenario did not fit with the parameter values that were being used. In the differential equation file for my model, in addition to extracting the state of the model (i.e., the population in each compartment at each time step), for ease of analysis I also extracted the cumulative number of infections up to that time step. The calculation of this cumulative incidence was incorrect.

"},{"location":"community/case-studies/campbell-pen-and-paper-version-control/#the-solution","title":"The solution","text":"

The error occurred because susceptible people in my model were not all equally susceptible, and I failed to account for this when I calculated the cumulative number of infections at each time step. I identified that this was the problem by running some targeted test parameter sets and observing the changes in model output. The next step was to find out how long this bug had existed in the code and which analyses had been affected. While I was using version control, I tended to make large infrequent commits. I did, however, keep extensive hand-written notes in lab books, which played the role of a detailed history of commits. Searching through my historical lab books, I identified that I had introduced this bug into the code two years earlier. I was able to determine which parts of my results would have been affected by the bug and made the decision that all experiments needed to be re-run.

"},{"location":"community/case-studies/campbell-pen-and-paper-version-control/#how-version-control-helped","title":"How version control helped","text":"

Using a pen and paper form of version control enabled me to pinpoint the introduction of the error and identify the affected analyses, but it was a tedious process. While keeping an immaculate record of changes that I had made was invaluable, imagine how much simpler and faster the process would have been if I had been a regular user of an electronic version control system such as Git!

"},{"location":"community/case-studies/moss-incorrect-data-pre-print/","title":"Incorrect data in a pre-print figure","text":"

Author: Rob Moss (rgmoss@unimelb.edu.au)

Project: COVID-19 scenario modelling (public repository)

"},{"location":"community/case-studies/moss-incorrect-data-pre-print/#the-problem","title":"The problem","text":"

Our colleague James Trauer notified us that they suspected there was an error in Figure 2 of our COVID-19 scenario modelling pre-print article. This figure showed model predictions of the daily ICU admission demand in an unmitigated COVID-19 pandemic, and in a COVID-19 pandemic with case targeted public health measures. I inspected the script responsible for plotting this figure, and confirmed that I had mistakenly plotted the combined demand for ward and ICU beds, instead of the demand for ICU beds alone.

"},{"location":"community/case-studies/moss-incorrect-data-pre-print/#the-solution","title":"The solution","text":"

This mistake was simple to correct, but the obvious concern was whether any other outputs related to ICU bed demand were affected.

We conducted a detailed review of all data analysis scripts and outputs, and confirmed that this error only affected this single manuscript figure. It had no bearing on the impact of the interventions in each model scenario. Importantly, it did not affect any of the simulation outputs, summary tables, and/or figures that were included in our reports to government.

The corrected figure can be seen in the published article.

"},{"location":"community/case-studies/moss-incorrect-data-pre-print/#how-version-control-helped","title":"How version control helped","text":"

Because we used version control to record the development history of the model and all of the simulation analyses, we were able to easily inspect the repository state at the time of each prior analysis. This greatly simplified the review process, and ensured that we were inspecting the code exactly as it was when we produced each analysis.

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/","title":"Fixing a bug in pypfilt","text":"

Author: Rob Moss (rgmoss@unimelb.edu.au)

Project: pypfilt, a bootstrap particle filter for Python

Date: 27 October 2021

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/#overview","title":"Overview","text":"

I introduced a bug when I modified a function in my pypfilt package, and only detected the bug after I had created several more commits.

To resolve this bug, I had to:

  1. Notice the bug;

  2. Identify the cause of the bug;

  3. Write a test case to check whether the bug is present; and

  4. Fix the bug.

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/#notice-the-bug","title":"Notice the bug","text":"

I noticed that a regression test1 was failing: re-running a set of model simulations was no longer generating the same output. The results had changed, but none of my recent commits should have had this effect.
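A regression test of this kind can be sketched in Python; here `run_model` and the reference file are hypothetical stand-ins for the real simulations and their saved output:

```python
import json
import os
import random

def run_model(seed):
    # Hypothetical stand-in for a set of model simulations.
    rng = random.Random(seed)
    return [round(rng.gauss(0.0, 1.0), 6) for _ in range(10)]

def check_against_reference(path="expected_output.json"):
    """Compare fresh output against a reference saved from a known-good commit."""
    output = run_model(seed=42)
    if not os.path.exists(path):
        # First run: save the current output as the reference.
        with open(path, "w") as f:
            json.dump(output, f)
        return True
    with open(path) as f:
        return output == json.load(f)
```

If a later commit changes the model's output, `check_against_reference()` returns `False`, flagging the change for investigation.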

I should have noticed this when I created the commit that introduced this bug, but:

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/#identify-the-cause-of-the-bug","title":"Identify the cause of the bug","text":"

I knew that the bug had been introduced quite recently, and I knew that it affected a specific function: earlier_states(). Running git blame src/pypfilt/state.py indicated that the recent commit 408b5f1 was a likely culprit, because it changed many lines in this function.

In particular, I suspected the bug was occurring in the following loop, which steps backwards in time and handles the case where model simulations are reordered:

# Start with the parent indices for the current particles, which allow us\n# to look back one time-step.\nparent_ixs = np.copy(hist['prev_ix'][ix])\n\n# Continue looking back one time-step, and only update the parent indices\n# at time-step T if the particles were resampled at time-step T+1.\nfor i in range(1, steps):\n    step_ix = ix - i\n    if hist['resampled'][step_ix + 1, 0]:\n        parent_ixs = hist['prev_ix'][step_ix, parent_ixs]\n

In stepping through this code, I identified that the following line was incorrect:

    if hist['resampled'][step_ix + 1, 0]:\n

and that changing step_ix + 1 to step_ix should fix the bug.

Note: I could have used git bisect to identify the commit that introduced this bug, but running all of the test cases for each commit is relatively time-consuming; since I knew that the bug had been introduced quite recently, I chose to use git blame.

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/#write-a-test-case","title":"Write a test case","text":"

I wrote a test case test_earlier_state() that called this earlier_states() function a number of times, and checked that each set of model simulations was returned in the correct order.

This test case checks that:

  1. If the model simulations were not reordered, the original ordering is always returned;

  2. If the model simulations were reordered at some time t_0, the original ordering is returned for times t < t_0; and

  3. If the model simulations were reordered at some time t_0, the new ordering is returned for times t >= t_0.

This test case failed when I reran the testing pipeline, which indicated that it identified the bug.

"},{"location":"community/case-studies/moss-pypfilt-earlier-states/#fix-the-bug","title":"Fix the bug","text":"

With the test case now written, I was able to verify that changing step_ix + 1 to step_ix did fix the bug.

I added the test case and the bug fix in commit 9dcf621.

In the commit message I indicated:

  1. Where the bug was located: the earlier_states() function;

  2. When the bug was introduced: commit 408b5f1; and

  3. Why the bug was not detected when I created commit 408b5f1.

  1. A regression test checks that a commit hasn't changed an existing behaviour or functionality.\u00a0\u21a9

"},{"location":"community/meetings/","title":"Meetings","text":"

This section contains summaries of each Community of Practice meeting.

"},{"location":"community/meetings/2023-04-17/","title":"17 April 2023","text":"

This is our initial meeting. The goal is to welcome people to the community and outline how we envision running these Community of Practice meetings.

"},{"location":"community/meetings/2023-04-17/#theme-reproducible-research","title":"Theme: Reproducible Research","text":"

Outline the theme and scope for this community.

This is open to all researchers who share an interest in reproducible research and/or related topics and practices; no prior knowledge is required.

For example, consider these questions:

Tip

The biggest challenge can often be remembering what you did and how you did it.

Making small changes to your practices can greatly improve reproducibility!

"},{"location":"community/meetings/2023-04-17/#how-will-these-meetings-run","title":"How will these meetings run?","text":""},{"location":"community/meetings/2023-04-17/#preferred-communication-channels","title":"Preferred communication channels?","text":"

Info

To function effectively as a community, we need to support asynchronous discussions in addition to scheduled meetings.

One option is a dedicated mailing list. Other options were suggested:

Using a GitHub issue tracker might also serve as a gentle introduction to GitHub?

"},{"location":"community/meetings/2023-04-17/#supporting-activities-and-resources","title":"Supporting activities and resources?","text":"

Are there other activities that we could organise to help support the community?

"},{"location":"community/meetings/2023-04-17/#topics-for-future-meetings","title":"Topics for future meetings?","text":"

We asked each participant to suggest topics that they would like to see covered in future meetings and/or activities. A number of common themes emerged.

"},{"location":"community/meetings/2023-04-17/#version-control-from-theory-to-practice","title":"Version control: from theory to practice","text":"

A number of people mentioned not being sure how to get started, or starting with good intentions but ending up with a mess.

"},{"location":"community/meetings/2023-04-17/#working-with-less-technically-experienced-collaborators","title":"Working with less technically-experienced collaborators","text":"

How can we make best use of existing tools and practices, while working with collaborators who have less technical expertise/experience?

"},{"location":"community/meetings/2023-04-17/#reproducibility-best-practices-and-limitations","title":"Reproducibility: best practices and limitations","text":"

How far can/should we go in validating and demonstrating that our models and analyses are reproducible? How can we automate this? How is this impacted when we cannot share the input data, or when our models are extremely large and complex?

"},{"location":"community/meetings/2023-04-17/#testing-and-documentation","title":"Testing and documentation","text":"

How can we develop confidence in our own code, and in other people's code?

"},{"location":"community/meetings/2023-04-17/#code-reuse","title":"Code reuse","text":""},{"location":"community/meetings/2023-04-17/#using-chat-gpt-to-writecheck-code","title":"Using Chat GPT to write/check code","text":""},{"location":"community/meetings/2023-06-13/","title":"13 June 2023","text":"

In this meeting we asked participants to share their experiences exploring the version control, reproducibility, and testing exercises in our example repository.

This repository serves as an introduction to testing models and ensuring that their outputs are reproducible. It contains a simple stochastic model that draws samples from a normal distribution, and some tests that check whether the model outputs are consistent with our expectations.
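As a rough sketch of what such a model and its test might look like in Python (the names here are illustrative, not the repository's actual code):

```python
import random
import statistics

def sample_model(n, mean=0.0, sd=1.0, seed=None):
    # Draw n samples from a normal distribution.
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

def test_sample_mean_is_plausible():
    samples = sample_model(100_000, mean=5.0, sd=2.0, seed=1)
    # With this many samples, the sample mean should lie very close to 5.0.
    assert abs(statistics.fmean(samples) - 5.0) < 0.05
```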

"},{"location":"community/meetings/2023-06-13/#what-is-a-reproducible-environment","title":"What is a reproducible environment?","text":"

The exercise description was deliberately very open, but it may have been too vague:

Define a reproducible environment in which the model can run.

We avoided listing possible details for people to consider, such as software and package versions. Perhaps a better approach would have been to ask:

If this code was provided as supporting materials for a paper, what other information would you need in order to run it and be confident of obtaining the same results as the original authors?

The purpose of a reproducible environment is to define all of these details, so that you never have to say to someone \"well, it runs fine on my machine\".

"},{"location":"community/meetings/2023-06-13/#reproducibility-and-stochasticity","title":"Reproducibility and stochasticity","text":"

Many participants observed that the model was not reproducible unless we used a random number generator (RNG) with a known seed, which would ensure that the model produces the same output each time we run it.

But what if you're using a package or library that internally uses its own RNG and/or seed? This may not be something you can fix, but you should be able to detect it by running the model multiple times with the same seed and checking whether you get identical results each time.

Another important question was raised: do you, or should you, include the RNG seed in your published code? This is probably a good idea, and suggested solutions included setting the seed at the very start of your code (so that it's immediately visible) or including it as a required model parameter.
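A minimal Python sketch of the "set the seed at the very start" suggestion (the seed value itself is illustrative):

```python
import random

# The seed is set at the very top of the script, so it is immediately
# visible to anyone reading or rerunning the code.
SEED = 20230613  # illustrative value; record whatever seed you actually used

def run_simulation(seed):
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(5)]

# Two runs with the same seed should be identical; if they are not, some
# library is probably drawing from an uncontrolled internal RNG.
assert run_simulation(SEED) == run_simulation(SEED)
```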

"},{"location":"community/meetings/2023-06-13/#writing-test-cases","title":"Writing test cases","text":"

Tip

Write a test case every time you find a bug: ensure that the test case finds the bug, then fix the bug, then ensure that the test case passes.

A test case is a piece of code that checks that something behaves as expected. This can range from checking that a mathematical function returns an expected value, to running many model simulations and verifying that a summary statistic falls within an expected range.

Rather than trying to write a single test that checks many different properties of a piece of code, it can be much simpler and quicker to write many separate tests that each check a single property. This can provide more detailed feedback when one or more test cases fail.
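For example, here are three single-property tests in Python (the `growth_rate` function is hypothetical):

```python
import math

def growth_rate(initial, final, periods):
    # Hypothetical example function: average per-period growth rate.
    if periods <= 0:
        raise ValueError("periods must be positive")
    return (final / initial) ** (1 / periods) - 1

# Each test checks one property, so a failure points directly at what broke.
def test_no_growth_gives_zero_rate():
    assert growth_rate(100, 100, 5) == 0

def test_doubling_in_one_period():
    assert math.isclose(growth_rate(100, 200, 1), 1.0)

def test_invalid_periods_raises():
    try:
        growth_rate(100, 200, 0)
    except ValueError:
        return
    raise AssertionError("expected a ValueError")
```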

Note

This approach is similar to how we rely on multiple public health interventions to protect against disease outbreaks! Consider each test case as a slice of Swiss cheese \u2014 many imperfect tests can provide a high degree of confidence in our code.

"},{"location":"community/meetings/2023-06-13/#writing-test-cases-for-conditions-that-may-fail","title":"Writing test cases for conditions that may fail","text":"

If you are testing a stochastic model, you may find certain test cases are difficult to write.

For example, consider a stochastic SIR model where you want to test that an intervention reduces the number of cases in an outbreak. You may, however, observe that in a small proportion of simulations the intervention has no effect (or it may even increase the number of cases).

One approach is to run many pairs of simulations and only check that the intervention reduced the number of cases at least X% of the time. You need to decide how many simulations to run, and what is an appropriate value for X%, but that's okay! Remember the Swiss cheese analogy, mentioned above.
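This "check it works most of the time" approach might be sketched in Python with a toy branching-process model (all names and thresholds here are illustrative, not a real SIR implementation):

```python
import random

def final_size(rng, transmissibility):
    # Toy outbreak model: each active case makes two contacts per step,
    # infecting each with probability `transmissibility`.
    cases, active = 1, 1
    while active and cases < 5_000:
        new = sum(rng.random() < transmissibility for _ in range(2 * active))
        cases += new
        active = new
    return cases

def test_intervention_usually_reduces_cases(n_pairs=200, min_fraction=0.6):
    wins = 0
    for i in range(n_pairs):
        baseline = final_size(random.Random(2 * i), transmissibility=0.8)
        mitigated = final_size(random.Random(2 * i + 1), transmissibility=0.5)
        wins += mitigated < baseline
    # The intervention need not help in every pair, only in most of them.
    assert wins / n_pairs >= min_fraction
```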

"},{"location":"community/meetings/2023-06-13/#testing-frameworks","title":"Testing frameworks","text":"

If you have more than 2 or 3 test cases, it's a good idea to use a testing framework to automatically find your test cases, run each test, record whether it passed or failed, and report the results. These frameworks are usually specific to a single programming language.

Some commonly-used frameworks include:

"},{"location":"community/meetings/2023-06-13/#github-actions","title":"GitHub Actions","text":"

Multiple participants reported some difficulties in setting up GitHub Actions and knowing how to adapt available templates to their needs. See the following examples:

We will aim to provide a GitHub action workflow for each model, and add comments to explain how to adapt these templates.

Warning

One downside of using GitHub Actions is the limited computation time of 2,000 minutes per month. This may not be suitable for large agent-based models and other long-running tasks.

"},{"location":"community/meetings/2023-06-13/#pull-requests","title":"Pull requests","text":"

At the time of writing, three participants have contributed pull requests:

Tip

If you make your own copy (\"fork\") of the example repository, you can create as many commits as you want. GitHub will display a message that says:

This branch is N commits ahead of rrcop:master.

Click on the \"N commits ahead\" link to see a summary of your new commits. You can then click the big green button \"Create pull request\".

This will not modify the example repository. Instead, it will create an overview of the changes between your code and the example repository. We can then review these changes and make suggestions, and you can add new commits, before deciding whether to add these changes to the example repository.

"},{"location":"community/meetings/2023-08-15/","title":"15 August 2023","text":"

Info

See the Resources section for links to useful resources that were mentioned in this meeting.

"},{"location":"community/meetings/2023-08-15/#changes-to-practices","title":"Changes to practices","text":"

In this meeting we asked everyone what changes (if any) they have made to their research and reproducibility practices since our last meeting.

A common theme was improving how we note and record our past actions. For example:

This ensures that stakeholders who want to use these models to run their own scenarios can reproduce the baseline scenarios without being modelling experts themselves.

The model is available as an online app.

"},{"location":"community/meetings/2023-08-15/#how-do-you-structure-a-project","title":"How do you structure a project?","text":"

Gizem asked the group \"How do you choose an appropriate project structure, especially if the project changes over time?\"

Phrutsamon: the TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data.

Rob: there may not be a single perfect solution that addresses everyone's needs. But look back at past projects, and try to imagine how the current project might change in the future. And if you're using version control, don't be afraid to experiment with different project structures \u2014 you can always revert back to an earlier commit.

"},{"location":"community/meetings/2023-08-15/#reviewing-code-as-part-of-manuscript-peer-review","title":"Reviewing code as part of (manuscript) peer review","text":"

Rob asked the group \"Has anyone reviewed supporting code when reviewing a manuscript?\"

Info

Pan mentioned a fantastic exercise for research students.

Pick a modelling paper that is relevant to their research project, and ask the student to:

  1. read it;
  2. understand it; and
  3. reproduce the figures.

This teaches the students that reproducibility is very important, and shows them what they need to do when they publish their own results.

It's important to pick a relatively simple paper, so that this task isn't too complicated for the student. And if the paper is written by a colleague or collaborator, you can contact them to ask for extra details, etc.

"},{"location":"community/meetings/2023-08-15/#using-shiny-to-make-models-availablereproducible","title":"Using Shiny to make models available/reproducible","text":"

Pan asked the group "What do you think about (the extra work involved in) turning R code into Shiny applications, to show that the model is reproducible, and to do so in a way that lets others easily make use of it?"

An objective of the COVID-19 International Modelling Consortium (CoMo) is to make models available and usable for non-modellers \u2014 turning models into something that anyone with minimal knowledge can explore.

The model is available as a Shiny app, and is continually being updated and refined. It is currently at version 19! Pan's group is trying to ensure that existing users update to the most recent version, because it can be very challenging and time-consuming to create scenario templates for older model versions. Templates are a good way to help the user define their scenario-specific settings, but it's a nightmare when you change the model version \u2014 it's like working with a new model.

Hadley Wickham has written a very good book about developing R Shiny applications. Gizem read a chapter of this book each morning, but found it necessary to practice in order to really understand how to use Shiny.

Info

Learning by doing (experiential learning) is a highly-effective way of convincing people to change their practices. It can be greatly enhanced by engaging as a community.

"},{"location":"community/meetings/2023-08-15/#resources","title":"Resources","text":""},{"location":"community/meetings/2023-08-15/#teaching-reproducibility-and-responsible-workflows","title":"Teaching reproducibility and responsible workflows","text":"

The Journal of Statistics and Data Science Education published a special issue: Teaching Reproducibility in November 2022. The accompanying editorial article highlights:

Integrating reproducibility into our practice and our teaching can seem intimidating initially. One way forward is to start small. Make one small change to add an element of exposing students to reproducibility in one class, then make another the next semester. Our students can get much of the benefit of reproducible and responsible workflows even if we just make a few small changes in our teaching. These efforts will help them to make more trustworthy insights from data. If it leads, by way of some virtuous cycle, to us improving our own practice, then even better! Improving our teaching through providing curricular guidance about reproducible science will take time and effort that should pay off in the long term.

This journal issue was followed by an invited paper session with the following presentations:

"},{"location":"community/meetings/2023-08-15/#project-templates","title":"Project templates","text":"

Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.

"},{"location":"community/meetings/2023-08-15/#using-shiny","title":"Using Shiny","text":""},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-r","title":"Continuous integration examples for R","text":""},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-python","title":"Continuous integration examples for Python","text":""},{"location":"community/meetings/2023-08-15/#other-continuous-integration-examples","title":"Other continuous integration examples","text":"

See the GitHub Actions for Git is my lab book, available here. For example, the build action performs the following steps:

  1. Check out the repository, using actions/checkout;

  2. Install mdBook and other required tools, using make; and

  3. Build an HTML version of the book, using mdBook.
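A workflow with these steps might look like the following sketch (the step names, action version, and make target are assumptions for illustration, not the repository's actual workflow file):

```yaml
name: build
on: [push]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # 1. Check out the repository
      - uses: actions/checkout@v4
      # 2. Install mdBook and other required tools (assumed make target)
      - run: make install-deps
      # 3. Build an HTML version of the book
      - run: mdbook build
```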

"},{"location":"community/meetings/2023-10-18/","title":"18 October 2023","text":"

In this meeting we asked participants to share their experiences about good (and bad) ways to structure a project.

Info

We are currently drafting Project structure and Writing code guidelines.

See the pull request for further details. Please contribute suggestions!

We had six in-person and eight online attendees. Everyone predominantly uses one or more of the following languages:

"},{"location":"community/meetings/2023-10-18/#naming-files","title":"Naming files","text":"

The tidyverse style guide includes recommendations for naming files. One interesting recommendation in this guide is:

"},{"location":"community/meetings/2023-10-18/#choosing-a-directory-structure","title":"Choosing a directory structure","text":"

A common starting point is often one or more scripts in the root directory. But we can usually divide a project into several distinct steps or stages, and store the files necessary for each stage in a separate sub-directory.

Tip

Your project structure may change as the project develops. That's okay!

You might, e.g., realise that some files should be moved to a new, or different, sub-directory.

Packaging: Python and R allow you to bundle multiple code files into a \"package\". This makes it easier to use code that is split up into multiple files. It also makes it simpler to test and verify whether your code can be run on a different computer. To create a package, you need to provide some metadata, including a list of dependencies (packages or libraries that your code needs in order to run). When you install a Python or R package, the necessary dependencies are automatically installed too. You can test this out on, e.g., a virtual machine to verify that you've correctly listed all of the necessary dependencies.
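For Python, for example, this metadata lives in a pyproject.toml file; a minimal, hypothetical sketch (the package name and dependency list are illustrative):

```toml
[project]
name = 'my-analysis'        # hypothetical package name
version = '0.1.0'
# These dependencies are installed automatically when the package is installed.
dependencies = [
    'numpy>=1.24',          # illustrative dependency list
    'pandas>=2.0',
]
```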

Version control: the history may be extremely useful for you, but may contain things you don't want to make publicly available. One solution would be to know from the very start what files you will want to make available and what files you do not (e.g., sensitive data files), but this is not always possible. Another, more realistic, solution is to create a new repository, copy over all of the files that you want to make available, and record these files in a single commit. The public repository will not share the history of your project repository, and that's okay \u2014 the public repository's primary purpose is to serve as a snapshot, rather than a complete and unedited history.

"},{"location":"community/meetings/2023-10-18/#locating-files","title":"Locating files","text":"

A common concern is how to locate files in different sub-directories (e.g., loading code, reading data files, writing output files) without relying on absolute paths. For loading code, Python and Matlab allow the user to add directories to the search path (e.g., by modifying sys.path in Python, or calling addpath() in Matlab). But these are not ideal solutions.

Absolute paths may not exist on other people's computers.

library(here)\ndata_file <- here(\"input-data/file-1.csv\")\n

Tip

A general solution for any programming language is to break your code into functions, each of which accepts input and/or output file names as arguments (when required). This means that most of your code is entirely independent of your chosen project structure. You can then store/generate all of the file paths in a single file, or in each of your top-level scripts.
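A minimal Python sketch of this pattern, assuming a hypothetical project layout (the file names and directory names are illustrative):

```python
from pathlib import Path

# All paths are defined in one place, relative to the project root,
# so only this top-level code depends on the project structure.
PROJECT_ROOT = Path.cwd()
data_file = PROJECT_ROOT / 'input-data' / 'file-1.csv'
output_file = PROJECT_ROOT / 'outputs' / 'results.txt'

def summarise(input_file: Path, output_file: Path) -> None:
    """Accept input/output paths as arguments, so this function is
    entirely independent of the chosen project structure."""
    contents = input_file.read_text()
    output_file.parent.mkdir(parents=True, exist_ok=True)
    output_file.write_text(f'{len(contents.splitlines())} rows\n')
```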

"},{"location":"community/meetings/2023-10-18/#peer-review-get-feedback-on-project-structure","title":"Peer review: get feedback on project structure","text":"

It can be helpful to get feedback from someone who isn't directly involved in the project. They may view the work from a fresh perspective, and be able to identify aspects that are confusing or unclear.

When inviting someone to review your work, you should identify specific questions or tasks that you would like the reviewer to address.

With respect to project structure, you may want to ask the reviewer to address questions such as:

You could also ask the reviewer to look at a specific script or code file, and ask questions such as:

Info

For further ideas about useful peer review activities, and how to incorporate them into your workflow, see the following paper:

Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology, Ivimey-Cook et al., Journal of Evolutionary Biology 36(10):1347\u20131356, 2023.

"},{"location":"community/meetings/2023-10-18/#styling-and-formatting","title":"Styling and formatting","text":"

We also discussed opinions about how to name functions, variables, files, etc.

For example, R allows you to use periods (.) in function and variable names, but the tidyverse style guide recommends only using lowercase letters, numbers, and underscores (_).

If you review other people's code, and have other people review your code, you might be surprised by the different styles and conventions that people use. These differences can be distracting when reviewing code.

"},{"location":"community/meetings/2023-10-18/#ai-tools-for-writing-and-reviewing-code","title":"AI tools for writing and reviewing code","text":"

There are AI tools that you can use to write, format, and review code, but you will need to check whether the code is correct. For example, GitHub Copilot is a (commercial) tool that accepts natural-language descriptions and generates computer code.

Tip

Feel free to use AI tools as a way to get started, but don't simply copy-and-paste the code they give you without reviewing it.

"},{"location":"community/meetings/2024-02-19/","title":"19 February 2024","text":"

In this meeting we asked participants to suggest goals and activities to achieve in 2024.

Note

If you were unable to attend the meeting, you can send us suggestions via email.

We have identified the following goals for 2024:

See the sections below for further details.

"},{"location":"community/meetings/2024-02-19/#orientation-materials","title":"Orientation materials","text":"

The first suggestion was to develop orientation materials for new students, postdocs, people coming from wet-lab backgrounds, etc. Suggested topics included:

Info

Some of these topics are already covered in Git is my lab book; see the links above.

There was broad interest in having a checklist, and example workflows for people to follow \u2014 particularly for projects that involve some form of code \"hand over\", to ensure that the recipients experience few problems in running the code themselves.

We aim to address these topics in Git is my lab book, with assistance and feedback from the community. See the How to contribute page for details.

"},{"location":"community/meetings/2024-02-19/#example-projects-and-model-templates","title":"Example projects and model templates","text":"

Building on the idea of orientation materials, a number of participants suggested providing example projects and different types of models.

The most commonly used languages in our community are:

As an example, we could demonstrate how to write an age-stratified SEIR ODE model in R and Python, and how to write agent-based models in vector and object-oriented forms.

Info

GitHub allows you to create template repositories, which might be a useful way of sharing such examples. We could host these template repositories in our Community of Practice GitHub organisation.

"},{"location":"community/meetings/2024-02-19/#how-and-why-to-begin-testing-your-code","title":"How (and why!) to begin testing your code","text":"

We asked everyone whether they'd ever found bugs in their code, and were relieved to see that yes, all of us have made mistakes! Writing test cases is one way to check that your code is behaving in the way that you expect.

As an example, Cam mentioned that he had written a stochastic model of a hospital ward, in which there was a queue of tasks. At the end of a shift, some tasks may not have been done, and these are put back on the queue for the next shift. Cam discovered there was a bug in the way this was done, and fixed it. However, later on he reintroduced the same bug. This is precisely the situation where regression tests are useful. In brief:

Continuous Integration (CI) is one way to run tests automatically, whenever you push a commit to a platform such as GitHub or GitLab. See the list of resources shared in our previous meeting for some examples of using CI.
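A regression test of the kind described above can be sketched in Python; the task-queue logic here is a hypothetical stand-in, not Cam's actual model:

```python
# Hypothetical stand-in for the shift task-queue logic described above.
def carry_over(tasks: list[str], completed: set[str]) -> list[str]:
    """Tasks not completed during a shift go back on the queue."""
    return [t for t in tasks if t not in completed]

def test_unfinished_tasks_are_carried_over():
    # Regression test: once a bug in the carry-over logic is fixed,
    # this test ensures the same bug is never silently reintroduced.
    queue = ['obs', 'meds', 'wound care']
    remaining = carry_over(queue, completed={'meds'})
    assert remaining == ['obs', 'wound care']

test_unfinished_tasks_are_carried_over()
```

A test runner such as pytest can discover and run tests like this automatically, which is what CI platforms invoke on every push.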

In our community we have a number of people with familiarity and expertise in testing infectious disease models and related code. We need to share this knowledge and help others in the community to learn how to test their code.

"},{"location":"community/meetings/2024-02-19/#peer-code-review","title":"Peer code review","text":"

We talked about how we improve our writing skills by receiving feedback on drafts from colleagues, supervisors, etc, and how a similar approach can be extremely useful for improving our coding skills.

Info

A goal for this year is to review each other's code! Note that we have developed some guidelines for peer code review.

Suggestions included:

We can coordinate peer code review through our Community of Practice GitHub organisation.

"},{"location":"community/meetings/2024-02-19/#sharing-code-but-not-the-original-data","title":"Sharing code but not the original data","text":"

Samik mentioned that in a recent paper, The impact of Covid-19 vaccination in Aotearoa New Zealand: A modelling study, the code is provided in a public GitHub repository, but that they do not have permission to share the original health data.

Info

We can frequently encounter this issue when working with public health and clinical data sets.

What are the best practices in this case?

"},{"location":"community/meetings/2024-04-11/","title":"11 April 2024","text":"

In this meeting we asked participants to suggest specific tools, templates, packages, etc, that we should include in our Orientation guide. We used the high-level topics proposed in our previous meeting as a starting point.

Attendance: 7 in person, 9 online.

Git is my lab book updates

We have switched to Material for MkDocs, which greatly improves the user experience.

For example, use the search bar above (press F) to interactively search the entire website.

"},{"location":"community/meetings/2024-04-11/#purpose-of-the-guide","title":"Purpose of the guide","text":"

We are aiming to keep the orientation guide short and simple, to avoid overwhelming newcomers.

James Ong: If we can agree on a structure, we can then get people to contribute to specific sections.

TK Le: schedule a one-hour meeting where everyone works on writing content for 30 minutes, and then reviews each other's content for 30 minutes?

"},{"location":"community/meetings/2024-04-11/#project-organisation","title":"Project organisation","text":"

Key message

A project's structure may need to change over time, and that's fine. What matters is that the structure is explained.

A common theme raised by a number of participants was deciding how to organise your files, dividing projects into cohesive parts, and explaining the relationships between these parts (i.e., how they interact or come together).

Useful tools mentioned in this conversation included:

Info

We are drafting topical guides about these topics. See the online previews for the following guides:

If you have any suggestions or feedback, please let us know in the pull request!

"},{"location":"community/meetings/2024-04-11/#working-collaboratively","title":"Working collaboratively","text":"

Key message

Plan to work collaboratively from the outset. It is highly likely that someone else will end up using your code.

Nick Tierney: you are always collaborating with your future self!

One concern raised was how best to prepare your code for handover.

Pan: You need to think about it from the beginning. There will be more and more people trying to use existing models. I am writing a guideline about vaccination modelling, and referring to readers as the \"model user\" (developers, modellers, end users). If we plan for others to use our model, we need to develop the model in a way that aims to make it easier for people to use.

Reminder

We have developed a topical guide on how to use git for collaborative research.

"},{"location":"community/meetings/2024-04-11/#reviewing-code-and-project-structure","title":"Reviewing code and project structure","text":"

Key message

Feedback from other people can be extremely useful to identify sources of confusion.

The earlier that someone can review your code and project structure, the easier it should be to act on their feedback.

Saras Windecker mentioned that the Infectious Disease Ecology and Modelling (IDEM) team organised code review sessions that the team found really informative, but also reminded everyone how hard it is to have guidelines that are consistent and broadly useful.

Question: was the purpose of these sessions to review code, or to review the project structure?

They were intended to review code, but team members found they had to review the project structure before they could begin to understand and improve the code.

Question: What materials, inputs, resources, etc, can we provide people who are dealing with messy code?

Rob Moss reflected on his experience of picking up a within-host malaria RBC spleen model and how difficult it was to identify which parts of the model were complete and which needed further work. He gradually divided the code into separate files, and regularly ran model simulations to verify that the outputs were consistent with the original code.

Info

Rob is happy to share the original model code, discuss how it was difficult to understand, and to walk through how he restructured and documented it. If you're interested, send him an email.

"},{"location":"community/meetings/2024-04-11/#how-to-structure-your-data","title":"How to structure your data","text":"

Key message

If data are shared, they often lack the documentation to make them easy to reuse.

Nick Tierney asked if anyone had thoughts on how to structure their data. Consistent with our earlier discussion, he suggested that one of the most important things is to have a README that explains the project structure. He then shared one of his recent papers Common-sense approaches to sharing tabular data alongside publication.

Question: do you advocate for data to be tidied (\"long\"), etc?

"},{"location":"community/meetings/2024-04-11/#managing-confidential-data","title":"Managing confidential data","text":"

Key message

There are various ways to manage confidential data, each with pros and cons.

Michael Plank asked for ideas about how to manage confidential data when working with collaborators, to ensure that everyone is using the most recent version of the data. Obviously you don't want to commit the data files in your project repository, so the data files should be listed in your .gitignore file.
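For example, a .gitignore entry along these lines keeps confidential files out of the repository (the directory name and file pattern are illustrative):

```
# Confidential data: never commit these files (illustrative paths)
data/confidential/
*.xlsx
```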

The most suitable solution probably depends on a combination of factors, including:

"},{"location":"community/meetings/2024-04-11/#debugging","title":"Debugging","text":"

Key message

Debugging is an important skill, but good coding practices are important for making your code easier to test and debug.

A number of people suggested that the orientation guide should provide some information about how to debug your code.

Nick Tierney: I could go on a long rant about debugging, and why we should be teaching how to divide code into small functions that are easier to test!

We also discussed that there are various ways to debug your code, from printing debugging messages, to using an interactive debugger, to writing test cases that explain how the code should work.

Rob Moss: I've used regression tests and continuous integration (CI) to detect when I accidentally change the outputs of my code/model. For example, my SMC package for Python includes test cases that record simulation outputs, and after running the tests I check that the output files remain unchanged.
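A check like this can be sketched by hashing the recorded output files; the file name and digest below are illustrative placeholders, not values from the SMC package:

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

# Record known-good digests once; after running the tests, verify
# that each output file remains byte-for-byte unchanged.
expected_digests = {
    'outputs/simulation.csv': 'd2a8...',  # illustrative placeholder
}

def outputs_unchanged(expected: dict[str, str]) -> bool:
    return all(file_sha256(Path(p)) == d for p, d in expected.items())
```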

"},{"location":"community/meetings/2024-04-11/#guidelines-for-using-ai","title":"Guidelines for using AI","text":"

Key message

Practices such as code review and testing are even more important for code that is created using generative AI.

Pan: speaking for junior students, a lot of students are now using ChatGPT for their coding, either to create the initial structure or to transform code from one programming language to another language.

Question: Can we provide useful guidelines for those students?

James: this probably isn't something we will cover in the orientation guide. But perhaps we need some guidelines for generative AI use in the topical guides.

Testing your code and ensuring it is reproducible is even more important when using ChatGPT. We know it doesn't always give you correct code, so how can you decide whether what it's given you is useful? It would be great to have an example of code generated by ChatGPT that is incorrect or unnecessarily difficult to understand, and to show how we can improve that code.

A question for the community

Does anyone have any examples of code produced by ChatGPT that didn't function as intended?

"},{"location":"community/meetings/2024-04-11/#useful-resources","title":"Useful resources","text":"

The following resources were mentioned in this meeting:

"},{"location":"community/meetings/2024-05-09/","title":"9 May 2024","text":""},{"location":"community/meetings/2024-05-09/#presentation-by-tk-le","title":"Presentation by TK Le","text":"

In this meeting TK Le gave a presentation about a series of COVID-19 modelling projects and how their experiences were impacted by the choice of programming languages, model structure, editors, tools, etc.

Attendance: 4 in person, 4 online.

Info

We welcome presentations about research projects and experiences that relate in any way to reproducibility and good computational research practices. Presentations can be short, informal, and free-form.

Please contact us if you have anything you might like to present!

"},{"location":"community/meetings/2024-05-09/#three-projects-four-models","title":"Three projects, four models","text":"

This work began in 2022, and was based on code that was originally written by Eamon and Camelia. TK adapted the code to run new scenarios across three separate modelling projects:

  1. Modelling the impact of hybrid immunity on future COVID-19 waves;
  2. Cost-effective boosting allocations in the post-Omicron era of COVID-19 management; and
  3. Confidential scenario modelling for the Australian Department of Health & Aged Care.

The workflow was divided into a series of distinct models:

  1. An immunological model written in R and greta;

  2. An agent-based model of population transmission written in C++ (post-processing written in R);

  3. A clinical pathways model implemented in MATLAB; and

  4. A cost-effectiveness model implemented in R.

TK's primary activities involved implementing different vaccine schedules in the transmission model, and data visualisation of outputs from the transmission model and the clinical pathways, all of which they implemented in Python.

"},{"location":"community/meetings/2024-05-09/#the-multi-model-structure","title":"The multi-model structure","text":"

Key message

There isn't necessarily a single best way to structure a large project.

Question: Was it a benefit to have separate model implementations in different languages, with clearly defined data flows from one model to the next?

Conceptually yes, but this structure also involved a lot of trade-offs \u2014 even the sheer volume of data that had to be saved by one model and read into another. It was difficult to pick up and use the code as a whole. And there were related issues, such as being able to successfully install greta.

Nick Tierney: I know that greta can be difficult to install, and I can help people with this.

TK also noted that they found some minor inconsistencies between the models, such as whether a date was used to identify the start or end of its 24-hour interval.

"},{"location":"community/meetings/2024-05-09/#tools-and-platforms","title":"Tools and platforms","text":"

Key message

Personal preferences can play a major role in deciding which tools are best for a project.

The various models were hosted in repositories on Bitbucket, GitHub, and the University of Melbourne's GitLab instance. TK noted that the only discernible differences between these platforms were how they handled authorisation and authentication.

TK also explored several different editors, some of which were language-specific:

TK noted they had previous experience with Eclipse (an IDE for Java) and Visual Studio (which felt very \"heavy\").

Question: what were your favourite things in VS Code, and what made RStudio the worst?

It was easiest to open a project in VS Code; RStudio would always open up an entire project or previous workspace, rather than just opening a single file. RStudio also kept asking to update itself.

TK also strongly disliked the RStudio font, which was another source of friction. They tried installing an RStudio extension for VS Code, but weren't sure how well it worked.

Nick Tierney: R history is really annoying, but you can turn it off. I'm not sure why it's enabled by default, all of the RStudio recommendations involve turning off these kinds of things.

Rahmat Sagara: I'm potentially interested in using VS Code instead of RStudio.

Eamon Conway: the worst thing about VS Code is that debugging is very hard to set up.

"},{"location":"community/meetings/2024-05-09/#task-management","title":"Task management","text":"

Key message

Task management is hard, and switching to a new system during a project is extremely difficult.

TK reported trying to use GitLab issues to plan out what to do and how to do it, but found they weren't a good fit with their workflow. They then trialled Trello boards for different versions, but stopped using them due to a lack of time to manage the boards. In review:

Rob Moss: we know that behaviour changes are hard to achieve, so it's not surprising that a large change was challenging to maintain \u2014 ideally we would make small, incremental changes, but this isn't always possible or useful.

Eamon Conway: I like the general idea of using task-tracking software, but I've settled on only using paper. It's always with me, it's always at home, and it's physically under my keyboard!

Ruarai Tobin: I use Notion with a single large Markdown file, you can paste screenshots.

"},{"location":"community/meetings/2024-05-09/#repository-structure","title":"Repository structure","text":"

Key message

There are many factors and competing concerns that influence the repository structure.

The repository structure changed a lot across these three projects.

In the beginning, the main challenge was separating out the different parts. While this was achieved, it wasn't immediately obvious where a user was supposed to start \u2014 the file structure did not make it clear. The README.md file did, however, include an explanation.

By the final project, the repository was divided into a number of directories, each of which was given a numeric prefix (e.g., 0_data and 4_post_processing). However, this was also a little misleading:

Question: is there an automated pipeline?

TK replied that the user had to run the code in each of the numbered folders in the correct (ascending) order, and that they wanted to automate the dependent jobs on Spartan.

Eamon Conway: if you do ever automate it, we should share it with people (e.g., this community) because people may be able to learn from it when they want to use Spartan. I know how to use slurm for job management and can help you automate it.

"},{"location":"community/meetings/2024-05-09/#data-visualisation","title":"Data visualisation","text":"

Key message

Producing good figures takes a lot of time, thought, and experimentation, and also a lot of coding.

It was extremely hard to decide what to show, and how to show it, in order to highlight key messages.

It was very easy to overwhelm the viewer with complicated plots and massive legends. For example, the scenarios involved three epidemic waves; how can you show the relationships between each pair of waves? It is relatively simple to build a 3D plot that shows these relationships, but the viewer can't really interpret the results.

"},{"location":"community/meetings/2024-05-09/#other-activities","title":"Other activities","text":"

Key message

Following better practices would have required maybe 50% more time, but there wasn't funding \u2014 who will pay for this?

Dedicating time to other activities was not feasible \u2014 no one had time, these projects had fixed deadlines and it was challenging to complete the work within these deadlines.

As explained above, data visualisation took longer than expected. And sometimes the code simply wouldn't run on high-performance computing platforms. For example, sometimes Matlab just wouldn't load, there were intermittent failures for no apparent reason and with no useful error messages.

Activities that would have been nice to do, but were not undertaken, included:

Rob Moss: we're very unlikely to get funding to explicitly cover these activities. If possible, we need to allocate sufficient time in our budgets, as best as possible. Practising these skills on smaller scales can also help us to use them with less overhead in larger projects.

"},{"location":"community/meetings/2024-05-09/#version-control-and-publishing","title":"Version control and publishing","text":"

Key message

This can be challenging even with all of the tools and infrastructure in place.

Question: were all of the projects wrapped up into one central thing?

No, they're all separate. The first project was provided as a zip file attached to the paper. The second project is in a public git repository. The final project is ongoing and remains confidential, it is stored in a private repository on the University of Melbourne's GitLab instance.

Question: did the latest project build on the previous ones?

Yes, and this led to managing current and older versions of the code. For example, TK found a bug that caused a minor discrepancy between times reported in two different models (see The multi-model structure) but it wasn't possible to update the older code and regenerate the associated outputs.

Question: should we use git (and GitHub) only for publication, or during the work itself?

Eamon Conway: Use it from the beginning to track your work, and maybe have different privacy settings (confidential code and/or data).

Rob Moss: you can use a git repository for your own purposes during the work, and upload a snapshot to Figshare or Zenodo to obtain a DOI that you can cite in your paper.

"},{"location":"community/meetings/2024-05-09/#broader-conclusions","title":"Broader conclusions","text":"

Changing our behaviour and work habits is hard, and so is finding time to develop these skills. We need to practise these skills on small problems first, rather than on large projects (and definitely not when there are tight deadlines).

A question for the community

Should we organise an event to practise and develop these skills on small-scale problems?

"},{"location":"community/meetings/2024-06-13/","title":"13 June 2024","text":""},{"location":"community/meetings/2024-06-13/#cam-zachreson-a-comparison-of-three-abms","title":"Cam Zachreson: A comparison of three ABMs","text":"

In this meeting Cam gave a presentation about the relative merits and trade-offs of three different approaches for agent-based models (ABMs).

Attendance: 7 in person, 13 online.

"},{"location":"community/meetings/2024-06-13/#theoretical-frameworks","title":"Theoretical frameworks","text":"

Key message

Each framework is built upon different assumptions about space, contacts, and transmission.

Cam introduced three theoretical frameworks for disease transmission, which he used in constructing infectious disease models for three different projects. Note that all three models use the same within-host model for individual dynamics.

  1. Border quarantine for COVID-19: international arrivals, quarantine workers, and the local community are divided into mixing groups within which there is close contact. There is also weaker contact between these mixing groups.

  2. Social isolation in residential aged care facilities: the transmission network is a multigraph that explicitly simulates contact between individuals. The graph is dynamic: workers and worker-room assignments can change every day. Workers share N edges when they service N rooms in common.

  3. A single hospital ward (work in progress): a shared space model represents spatial structure as a network of separate spaces (i.e., nodes). Nurses and patients are associated with spaces according to schedules. Each space has its own viral concentration, driven by shedding from infectious people and ventilation (the air changes around 6 times per hour). Residence in a space results in a net viral dose, which confers a probability of infection (using the Wells-Riley model).
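The dose-to-infection step in the shared space model uses the standard Wells-Riley form, which can be sketched as follows (the dose value is illustrative):

```python
import math

def infection_probability(quanta_dose: float) -> float:
    """Wells-Riley: P(infection) = 1 - exp(-n), where n is the
    net inhaled dose in quanta."""
    return 1.0 - math.exp(-quanta_dose)

# A net inhaled dose of one quantum gives a ~63% infection probability.
p = infection_probability(1.0)
```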

Question

Are many short interactions equivalent to one long interaction?

"},{"location":"community/meetings/2024-06-13/#pros-and-cons-of-model-structures","title":"Pros and cons of model structures","text":"

Key message

Each framework has unique strengths and weaknesses.

As shown in the slide below, Cam identified a range of pros and cons for each modelling framework. Some of the key trade-offs between these frameworks are:

Pros and cons of the three approaches."},{"location":"community/meetings/2024-06-13/#constructing-complex-models","title":"Constructing complex models","text":"

Key message

Complex models typically have complex data requirements.

Data requirements can also present a challenge when constructing complex models. For example, behaviour models are good for highly-structured environments such as hospital wards, where nurses have scheduled tasks that are performed in specific spaces. However, the data required to construct the behaviour model can be very hard to collect, access, and use. Even if nurses wear sensors, the sensor data are never sufficiently clean or complete to use without substantial cleaning and processing.

Airflow between spaces in a highly-structured environment is also complex to model. Air exchange can involve diffusion between adjacent spaces, but also airflow between non-adjacent spaces through ventilation systems. These flows can be difficult to identify, and are computationally expensive to simulate (computational fluid dynamics).

Cam concluded by observing that existing hospitals wards tend to have a design flaw for infection control:

There are many shared spaces in which infection can spread among individuals via air transmission.

"},{"location":"community/meetings/2024-06-13/#reproducibility-in-stochastic-models","title":"Reproducibility in stochastic models","text":"

Key message

These models rely on random number generators (RNGs), whose outputs can be controlled by defining their initial seed. Using a separate RNG for each process in the model provides further advantages (see below).

In contrast to agent-based models of much larger populations, these models are small enough that they can be run as single-threaded code, and multiple simulations can be run in parallel. The bulk of the computational cost usually comes from sweeping over many combinations of parameter values.

The aged care (multigraph) and hospital ward (shared space) models decouple the population RNG from the transmission dynamics RNG. An advantage of using multiple RNGs is that we can independently control and modify these processes. For example, by using separate RNGs for infections and testing, we can adjust testing practices without affecting the infection process.

"},{"location":"community/meetings/2024-06-13/#topic-for-a-masters-project","title":"Topic for a Masters project","text":"

Question

Does anyone know a suitable Masters student?

Cam is looking for a Masters student to undertake a project that will look at individual-level counterfactual scenarios. The key idea is to identify sets of preconditions (e.g., salient details of the event history and/or current epidemic context) and ensure that the model will always generate the same outcome when given these preconditions. An open question is how far back in the event history is necessary/sufficient.

"},{"location":"community/meetings/2024-07-11/","title":"11 July 2024","text":""},{"location":"community/meetings/2024-07-11/#nefel-tellioglu-lessons-learned-from-pneumococcal-vaccine-modelling","title":"Nefel Tellioglu: Lessons learned from pneumococcal vaccine modelling","text":"

In this meeting Nefel gave a presentation about a pneumococcal vaccine (PCV) evaluation project for government, sharing her experiences in developing a model from scratch under tight deadlines.

Attendance: 6 in person, 6 online.

Info

We welcome presentations about research projects and experiences that relate in any way to reproducibility and good computational research practices. Presentations can be short, informal, and free-form.

Please contact us if you have anything you might like to present!

"},{"location":"community/meetings/2024-07-11/#computational-performance","title":"Computational performance","text":"

Key message

Optimisation is a skill that takes time to learn.

This project involved constructing an agent-based model (ABM) of pneumococcal disease, incorporating various vaccination assumptions and intervention strategies. Nefel was familiar with an existing ABM framework written in Python, but found that the project requirements (a large population size and long simulation time-frames) meant that a different approach was required.

Asking for help with a new skill: optimising the model for each vaccine type and for multiple strains

They ended up implementing a model from scratch, using the Polars data frame library to represent each individual as a separate row in a single population data frame. This library is designed for high performance, and Nefel was able to implement a model that ran quickly enough for the purposes of this project.

An introduction to Polars workshop?

Nefel asked whether other people would be interested in an \"Introduction to Polars\" workshop, and a number of participants indicated interest.

"},{"location":"community/meetings/2024-07-11/#workflows-and-deadlines","title":"Workflows and deadlines","text":"

Key message

Using version control makes it much easier to fix your code when it breaks.

Nefel made frequent use of a git repository (hosted on GitHub) in the early stages of the project. She found it very useful during the model prototyping phase, when adding new features frequently broke the code in some way. Having immediate access to previous versions of the code made it much easier to revert changes and fix the code.

However, she stopped using it when the project reached a series of tight deadlines.

"},{"location":"community/meetings/2024-07-11/#asking-for-extensions","title":"Asking for extensions","text":"

Key message

Being able to provide advance warning of potential delays, and to explain the reasons why they might occur, is extremely helpful for everyone. This allows project leaders and stakeholders to adjust their plans and expectations.

It's generally hard to estimate feasible timelines in advance. This is especially difficult when exploring a new problem, and when a range of future extensions are anticipated.

These kinds of conversations can feel extremely uncomfortable. Several participants reflected on their own experiences, and agreed that informing their supervisors about potential problems as early as possible was the best approach.

Things can take longer than expected due to the research nature of building a new infectious disease model. Where possible, avoid promising that a model will be completed by a certain time. Instead, give stakeholders regular updates about progress and challenges, so that they can appreciate how much effort is being applied to the problem.

Gizem: stakeholders may not know what they want or need from the model. It's really helpful to clarify this early in the project, which needs a good working relationship.

Eamon: writing your code in a modular way can help make it easier to implement those future extensions. Experience also helps in designing your code so that future extensions only modify small parts of your model. But avoid trying to make your code as abstract and extensible as possible.

Rob: if you know that the model will be applied to many different scenarios in the future, try to separate the code that defines the location of data files from the code that uses those data. That makes it easier to run your model using different sets of input data.
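Rob's suggestion can be sketched as follows (the file names and scenario directories are hypothetical): all input locations are defined in one place, and the modelling code only ever receives the resulting dictionary.

```python
from pathlib import Path


def data_paths(scenario_dir):
    # Every input location is defined here and nowhere else, so switching
    # datasets means changing a single argument, not editing model code.
    root = Path(scenario_dir)
    return {
        "contacts": root / "contacts.csv",
        "parameters": root / "parameters.json",
    }


baseline_paths = data_paths("data/baseline")
updated_paths = data_paths("data/updated-census")
```

The model functions then take these paths (or the loaded data) as arguments, rather than hard-coding file locations internally.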

"},{"location":"community/meetings/2024-07-11/#related-libraries-for-python-and-r","title":"Related libraries for Python and R","text":"

Key message

There are a number of high-performance data frame libraries.

Polars primarily supports Python, Rust, and JavaScript. There is also an R package that has several extensions, including:

Other high-performance data frame options for R:

DuckDB is another high-performance library for working with databases and tabular data, and is available for many languages including R, Python, and Julia. It also integrates with Polars, allowing you to query Polars data frames and to save outputs as Polars data frames.

"},{"location":"community/meetings/2024-07-11/#conclusions","title":"Conclusions","text":"

Key message

Once a project is completed, it's worth reflecting on what worked well, and on what you would do differently next time.

Nefel finished by reflecting on what she might do differently next time, and highlighting two key points:

"},{"location":"community/meetings/2024-07-11/#next-meeting","title":"Next meeting","text":"

At our next meeting \u2014 currently scheduled for Thursday 8 August \u2014 we will work on finalising our Orientation Guide checklist, collect supporting materials for each item on the checklist, and begin drafting content where no suitable supporting materials can be found.

"},{"location":"community/meetings/2024-08-08/","title":"8 August 2024","text":""},{"location":"community/meetings/2024-08-08/#orientation-guide","title":"Orientation guide","text":"

Key message

The aim for the Orientation Guide is to provide a short overview of important concepts, useful tools and resources, and links to relevant tutorials and examples.

In this meeting we discussed how the Orientation Guide could best address the needs of new students and staff. We began by asking participants what skills, tools, and knowledge they've found to be particularly useful, and wish they'd discovered earlier.

Attendance: 5 in person, 2 online.

"},{"location":"community/meetings/2024-08-08/#core-tools-and-recommended-packages","title":"Core tools and recommended packages","text":"

Key message

There was strong interest in having opinionated recommendations for helpful software packages and libraries, based on our own experiences.

When we start out, we typically don't know what tools are available and how to choose between them. So having guidance and recommendations from more experienced members of our community can be valuable.

This harks back to TK's presentation and their reflections on choosing the best tools for a specific project or task. For example:

Question

Which editor should a new student use for their project?

We strongly recommend choosing an editor that can automatically format and check your code.

Eamon suggested that in addition to linking to tutorials for installing common tools such as Python and R, the orientation guide should recommend helpful packages. For example:

Jacob: it would be nice to have a flowchart or diagram to help identify relevant tools and packages. For example, if you want to (a) analyse tabular data; and (b) use Python; then what package would you recommend? (Our answer was Polars).

"},{"location":"community/meetings/2024-08-08/#reproducible-environments","title":"Reproducible environments","text":"

Key message

Virtual environments allow you to install the packages that are required for a specific project, without affecting other projects.

This is useful for a number of reasons, including:

Python provides built-in tools for virtual environments, and the University of Melbourne's Python Digital Skills Training includes a workshop on Python Virtual Environments.
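For Python, the basic workflow looks like this (a sketch using the built-in venv module; here the environment is created in a temporary directory, but normally you would use your project root):

```shell
# Create a per-project virtual environment.
project=$(mktemp -d)
cd "$project"
python3 -m venv .venv

# Activate it (on Windows: .venv\Scripts\activate).
. .venv/bin/activate

# Packages now install into .venv only, without affecting other projects, e.g.:
#   python -m pip install polars

# The interpreter now lives inside the project environment.
python -c 'import sys; print(sys.prefix)'

# Return to the system environment when finished.
deactivate
```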

For R, the renv package provides similar capabilities, and the Introduction to renv article provides a good overview.

"},{"location":"community/meetings/2024-08-08/#reproducible-documents-and-notebooks","title":"Reproducible documents and notebooks","text":"

Key message

Reproducible document formats such as RMarkdown (for R) and Jupyter Notebooks (for Python) provide a way to combine code, data, text, and figures in self-contained and reproducible plain-text files.

For introductions and examples, see:

If you use VS Code to write Quarto documents, editing a Python code block will open it in Jupyter, which allows you to step through and debug the code to some degree.

"},{"location":"community/meetings/2024-08-08/#existing-training-courses-and-needs","title":"Existing training courses and needs","text":"

Key message

There will be an Introduction to Polars workshop at the SPECTRUM 2024 annual meeting (23-25 September), led by Nefel Tellioglu and Julian Carlin.

We asked participants if they had found any training materials that were particularly useful.

Mackrina said that she is using Python in her PhD project, but previously only had experience with Matlab.

Other participants chimed in with recommended resources and training needs:

"},{"location":"community/meetings/2024-08-08/#high-performance-computing-hpc","title":"High-performance computing (HPC)","text":"

Using GPGPUs for high-performance computing

Jiahao asked: Does anyone in our community have experience with using GPGPUs?

In response to Jiahao's question, Eamon replied that he has found it to be near-impossible, due to a combination of:

This initiated a broader discussion about improving the computational performance of our code and making use of high-performance computing (HPC) resources.

Computational performance was an issue that Nefel encountered when constructing an agent-based model of pneumococcal disease, and she found that code optimisation is a skill that takes time to learn.

We discussed several ways of using multiple CPU cores to make code run more quickly:

"},{"location":"community/meetings/2024-08-08/#debugging","title":"Debugging","text":"

Key message

There was strong interest in running a debugging workshop at the upcoming SPECTRUM 2024 annual meeting (23-25 September). As TK and Nefel have shown in their presentations, skills like debugging are extremely valuable for projects with tight deadlines, but these projects are also the worst time in which to develop and practice these skills.

Info

Attendees confirmed their willingness to evaluate and provide feedback on workshop draft materials.

Rob reflected that many people struggle to effectively debug their code, and can end up wasting a lot of time. Since we all make mistakes when writing code, this can be an extremely valuable skill. This is particularly true when working on, e.g., modelling contracts with government (see, e.g., the recent presentations from TK and Nefel).

We discussed some general guidelines, such as:

Eamon: by learning how to debug code, I substantially improved how I write and modularise my code. My functions became smaller, and this helped me to make fewer mistakes.

For example, David Price and I encountered a bug in some C++ code where the function was correct, but made assumptions about the input arguments. These assumptions were initially satisfied, but as other parts of the code were updated, these assumptions were no longer true.

To address this, I often write if statements at the top of a function to check these kinds of conditions, and stop if there are failures. You can see examples of this in real-world code from popular packages.
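A sketch of this pattern (`growth_rate()` is a hypothetical function for illustration, not code from any of the packages mentioned):

```python
def growth_rate(infections, window_days):
    # Guard clauses at the top of the function: fail early with a clear
    # message, rather than returning a silently wrong answer when an
    # assumption about the inputs is violated.
    if window_days <= 0:
        raise ValueError(f"window_days must be positive, got {window_days}")
    if len(infections) < window_days + 1:
        raise ValueError("need at least window_days + 1 observations")
    if any(count < 0 for count in infections):
        raise ValueError("infection counts cannot be negative")
    return (infections[-1] - infections[-1 - window_days]) / window_days
```

If a caller later violates one of these assumptions, the code stops immediately at the function boundary, instead of producing plausible-looking but incorrect output.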

James: I'm happy to provide an example of debugging within an R pipe. Learn Debugging might be a useful resource for our community.

Rob: failing early is good, rather than producing output and then having to discover that it's incorrect (which may not be obvious). Related skills include learning how to read a stack trace, and defensive programming (such as checking input arguments, as Eamon mentioned).

TK: it's really hard to change existing habits. And I'm not doing any coding in my projects right now. My most recent coding experiences were in COVID-19 projects (see TK's presentation) and the very tight deadlines didn't allow me the opportunity to develop and apply new skills.

Rob: everyone already debugs and tests their code to some degree, simply by writing and evaluating code line by line (e.g., in an interactive R or Python session) and by running functions with example arguments to check that they give sensible outputs. We just need to \"nudge\" these behaviours to make them more systematic and reproducible.

"},{"location":"community/training/","title":"Training events","text":"

We will be running an Introduction to Debugging workshop at the SPECTRUM Annual Meeting 2024 (23-25 September).

"},{"location":"community/training/debugging/","title":"Introduction to Debugging","text":"

This workshop was prepared for the SPECTRUM Annual Meeting 2024 (23-25 September).

Tip

We all make mistakes when writing code and introduce errors.

Having good debugging skills means that you can spend less time fixing your code.

See the discussion in our August 2024 meeting for further background.

"},{"location":"community/training/debugging/building-your-skills/","title":"Building your skills","text":"

Tip

Whenever you debug some code, consider it as an opportunity to learn, reflect, and build your debugging skills.

Pay attention to your experience \u2014 what worked well, and what would you do differently next time?

"},{"location":"community/training/debugging/building-your-skills/#identifying-errors","title":"Identifying errors","text":"

Write a failing test case; this allows you to verify that the bug can be reproduced.

"},{"location":"community/training/debugging/building-your-skills/#developing-a-plan","title":"Developing a plan","text":"

What information might help you decide how to begin?

Can you identify a recent \"known good\" version of the code that doesn't include the error?

If you're using version control, have a look at your recent commits and check whether any of them are likely to have introduced or exposed this error.

"},{"location":"community/training/debugging/building-your-skills/#searching-for-the-root-cause","title":"Searching for the root cause","text":"

We've shown how a debugger allows you to pause your code and see what it's actually doing. This is extremely helpful!

Tip

Other approaches may be useful, but avoid using trial-and-error.

To quickly confirm or rule out specific suspicions, you might consider using:

"},{"location":"community/training/debugging/building-your-skills/#fixing-the-root-cause","title":"Fixing the root cause","text":"

Is there an optimal solution?

This might be the solution that changes as little code as possible, or it might be a solution that involves modifying and/or restructuring other parts of your code.

"},{"location":"community/training/debugging/building-your-skills/#after-its-fixed","title":"After it's fixed","text":"

If you didn't write a test case to identify the error (see above), now is the time to write a test case to ensure you don't ever make the same error again.

Are there other parts of your code where you might make a similar mistake, for which you could also write test cases?

Are there coding practices that might make this kind of error easier to find next time? For example, this might involve dividing your code into smaller functions, or using version control to record commits early and often.

Have you considered defensive programming practices? For example, at the start of a function it can often be a good idea to check that all of the arguments have valid values.

Are there tools or approaches that you haven't used before, and which might be worth trying next time?

"},{"location":"community/training/debugging/example-square-numbers/","title":"Example: Square numbers","text":"

Square numbers are positive integers that are equal to the square of an integer. Here we have provided example Python and R scripts that print all of the square numbers between 1 and 100:

You can download these scripts to run on your own computer:

Each script contains three functions:

The diagram below shows how main() calls find_squares(), which in turn calls is_square() many times.

sequenceDiagram\n    participant M as main()\n    participant F as find_squares()\n    participant I as is_square()\n    activate M\n    M ->>+ F: lower_bound = 1, upper_bound = 100\n    Note over F: squares = [ ]\n    F ->>+ I: value = 1\n    I ->>- F: True/False\n    F ->>+ I: value = 2\n    I ->>- F: True/False\n    F -->>+ I: ...\n    I -->>- F: ...\n    F ->>+ I: value = 100\n    I ->>- F: True/False\n    F ->>- M: squares = [...]\n    Note over M: print(squares)\n    deactivate M
Source code PythonR square_numbers.py
#!/usr/bin/env python3\n\"\"\"\nPrint the square numbers between 1 and 100.\n\"\"\"\n\n\ndef main():\n    squares = find_squares(1, 100)\n    print(squares)\n\n\ndef find_squares(lower_bound, upper_bound):\n    squares = []\n    value = lower_bound\n    while value <= upper_bound:\n        if is_square(value):\n            squares.append(value)\n        value += 1\n    return squares\n\n\ndef is_square(value):\n    for i in range(1, value + 1):\n        if i * i == value:\n            return True\n    return False\n\n\nif __name__ == '__main__':\n    main()\n
square_numbers.R
#!/usr/bin/env -S Rscript --vanilla\n#\n# Print the square numbers between 1 and 100.\n#\n\n\nmain <- function() {\n  squares <- find_squares(1, 100)\n  print(squares)\n}\n\n\nfind_squares <- function(lower_bound, upper_bound) {\n  squares <- c()\n  value <- lower_bound\n  while (value <= upper_bound) {\n    if (is_square(value)) {\n      squares <- c(squares, value)\n    }\n    value <- value + 1\n  }\n  squares\n}\n\n\nis_square <- function(value) {\n  for (i in seq(value)) {\n    if (i * i == value) {\n      return(TRUE)\n    }\n  }\n  FALSE\n}\n\nif (! interactive()) {\n  main()\n}\n
"},{"location":"community/training/debugging/example-square-numbers/#stepping-through-the-code","title":"Stepping through the code","text":"

These recorded terminal sessions demonstrate how to use Python and R debuggers from the command line. They cover:

Manual breakpoints

You can also create breakpoints in your code by calling breakpoint() for Python, and browser() for R.

Interactive debugger sessions

If your editor supports running a debugger, use this feature! See these examples for RStudio, PyCharm, Spyder, and VS Code.

Python debuggerR debugger

Video timeline:

  1. Set a breakpoint
  2. Show current location
  3. Step into is_square()
  4. Return from is_square()
  5. Show updated squares list
  6. Add a conditional breakpoint
  7. Stop at the conditional breakpoint
  8. Continue until the script ends

Video timeline:

  1. Set a breakpoint
  2. Step into is_square()
  3. Return from is_square()
  4. Show updated squares list
  5. Add a conditional breakpoint
  6. Stop at the conditional breakpoint
  7. Continue until the script ends
"},{"location":"community/training/debugging/exercise-perfect-numbers/","title":"Exercise: Perfect numbers","text":"OverviewPythonR

Perfect numbers are positive integers that are equal to the sum of their divisors. Here we have provided example Python and R scripts that should print all of the perfect numbers up to 1,000.

You can download each script to debug on your own computer:

perfect_numbers.py
#!/usr/bin/env python3\n\"\"\"\nThis script prints perfect numbers.\n\nPerfect numbers are positive integers that are equal to the sum of their\ndivisors.\n\"\"\"\n\n\ndef main():\n    start = 2\n    end = 1_000\n    for value in range(start, end + 1):\n        if is_perfect(value):\n            print(value)\n\n\ndef is_perfect(value):\n    divisors = divisors_of(value)\n    return sum(divisors) == value\n\n\ndef divisors_of(value):\n    divisors = []\n    candidate = 2\n    while candidate < value:\n        if value % candidate == 0:\n            divisors.append(candidate)\n        candidate += 1\n    return divisors\n\n\nif __name__ == '__main__':\n    main()\n
perfect_numbers.R
#!/usr/bin/env -S Rscript --vanilla\n#\n# This script prints perfect numbers.\n#\n# Perfect numbers are positive integers that are equal to the sum of their\n# divisors.\n#\n\n\nmain <- function() {\n  start <- 2\n  end <- 1000\n  for (value in seq.int(start, end)) {\n    if (is_perfect(value)) {\n      print(value)\n    }\n  }\n}\n\n\nis_perfect <- function(value) {\n  divisors <- divisors_of(value)\n  sum(divisors) == value\n}\n\n\ndivisors_of <- function(value) {\n  divisors <- c()\n  candidate <- 2\n  while (candidate < value) {\n    if (value %% candidate == 0) {\n      divisors <- c(divisors, candidate)\n    }\n    candidate <- candidate + 1\n  }\n  divisors\n}\n\n\nmain()\n

But there's a problem ...

If we run these scripts, we see that they don't print anything:

How should we begin investigating?

Interactive debugger sessions

If your editor supports running a debugger, use this feature! See these examples for RStudio, PyCharm, Spyder, and VS Code.

Some initial thoughts ... "},{"location":"community/training/debugging/exercise-python-vs-r/","title":"Exercise: Python vs R","text":"OverviewPythonR

Here we have provided SIR ODE model implementations in Python and in R. Each script runs several scenarios and produces a plot of infection prevalence for each scenario.

You can download each script to debug on your computer:

sir_ode.py
#!/usr/bin/env python3\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nfrom scipy.integrate import solve_ivp\n\n\ndef sir_rhs(time, state, popn, beta, gamma):\n    \"\"\"\n    The right-hand side for the vanilla SIR compartmental model.\n    \"\"\"\n    s_to_i = beta * state[1] * state[0] / popn  # beta * I(t) * S(t) / N\n    i_to_r = gamma * state[1]  # gamma * I(t)\n    return [-s_to_i, s_to_i - i_to_r, i_to_r]\n\n\ndef run_model(settings):\n    \"\"\"\n    Return the SIR ODE solution for the given model settings.\n    \"\"\"\n    # Define the time span and evaluation times.\n    sim_days = settings['sim_days']\n    time_span = [0, sim_days]\n    times = np.linspace(0, sim_days, num=sim_days + 1)\n    # Define the initial state.\n    popn = settings['population']\n    exposures = settings['exposures']\n    initial_state = [popn - exposures, exposures, 0]\n    # Define the SIR parameters.\n    params = (popn, settings['beta'], settings['gamma'])\n    # Return the daily number of people in S, I, and R.\n    return solve_ivp(\n        sir_rhs, time_span, initial_state, t_eval=times, args=params\n    )\n\n\ndef default_settings():\n    \"\"\"\n    The default model settings.\n    \"\"\"\n    return {\n        'sim_days': 20,\n        'population': 100,\n        'exposures': 2,\n        'beta': 1.0,\n        'gamma': 0.5,\n    }\n\n\ndef run_model_scaled_beta(settings, scale):\n    \"\"\"\n    Adjust the value of ``beta`` before running the model.\n    \"\"\"\n    settings['beta'] = scale * settings['beta']\n    return run_model(settings)\n\n\ndef run_model_scaled_gamma(settings, scale):\n    \"\"\"\n    Adjust the value of ``gamma`` before running the model.\n    \"\"\"\n    settings['gamma'] = scale * settings['gamma']\n    return run_model(settings)\n\n\ndef plot_prevalence_time_series(solutions):\n    \"\"\"\n    Plot daily prevalence of infectious individuals for one or more scenarios.\n    \"\"\"\n    fig, axes = plt.subplots(\n        
constrained_layout=True,\n        nrows=len(solutions),\n        sharex=True,\n        sharey=True,\n    )\n    for ix, (scenario_name, solution) in enumerate(solutions.items()):\n        ax = axes[ix]\n        ax.title.set_text(scenario_name)\n        ax.plot(solution.y[1], label='I')\n        ax.set_xticks([0, 5, 10, 15, 20])\n    # Save the figure.\n    png_file = 'sir_ode_python.png'\n    fig.savefig(png_file, format='png', metadata={'Software': None})\n\n\ndef demonstration():\n    settings = default_settings()\n    default_scenario = run_model(settings)\n    scaled_beta_scenario = run_model_scaled_beta(settings, scale=1.5)\n    scaled_gamma_scenario = run_model_scaled_gamma(settings, scale=0.7)\n\n    plot_prevalence_time_series(\n        {\n            'Default': default_scenario,\n            'Scaled Beta': scaled_beta_scenario,\n            'Scaled Gamma': scaled_gamma_scenario,\n        }\n    )\n\n\nif __name__ == '__main__':\n    demonstration()\n
sir_ode.R
#!/usr/bin/env -S Rscript --vanilla\n\nlibrary(deSolve)\nsuppressPackageStartupMessages(library(dplyr))\nsuppressPackageStartupMessages(library(ggplot2))\n\n\n# The right-hand side for the vanilla SIR compartmental model.\nsir_rhs <- function(time, state, params) {\n  s_to_i <- params$beta * state[\"I\"] * state[\"S\"] / params$popn\n  i_to_r <- params$gamma * state[\"I\"]\n  list(c(-s_to_i, s_to_i - i_to_r, i_to_r))\n}\n\n\n# Return the SIR ODE solution for the given model settings.\nrun_model <- function(settings) {\n  # Define the time span and evaluation times.\n  times <- seq(0, settings$sim_days)\n  # Define the initial state.\n  popn <- settings$population\n  exposures <- settings$exposures\n  initial_state <- c(S = popn - exposures, I = exposures, R = 0)\n  # Define the SIR parameters.\n  params <- list(\n    popn = settings$population,\n    beta = settings$beta,\n    gamma = settings$gamma\n  )\n  # Return the daily number of people in S, I, and R.\n  ode(initial_state, times, sir_rhs, params)\n}\n\n\n# The default model settings.\ndefault_settings <- function() {\n  list(\n    sim_days = 20,\n    population = 100,\n    exposures = 2,\n    beta = 1.0,\n    gamma = 0.5\n  )\n}\n\n\n# Adjust the value of ``beta`` before running the model.\nrun_model_scaled_beta <- function(settings, scale) {\n  settings$beta <- scale  * settings$beta\n  run_model(settings)\n}\n\n\n# Adjust the value of ``gamma`` before running the model.\nrun_model_scaled_gamma <- function(settings, scale) {\n  settings$gamma <- scale * settings$gamma\n  run_model(settings)\n}\n\n\n# Plot daily prevalence of infectious individuals for one or more scenarios.\nplot_prevalence_time_series <- function(solutions) {\n  df <- lapply(\n    names(solutions),\n    function(name) {\n      solutions[[name]] |>\n        as.data.frame() |>\n        mutate(scenario = name)\n    }\n  ) |>\n    bind_rows() |>\n    mutate(\n      scenario = factor(scenario, levels = names(solutions), ordered = TRUE)\n    )\n  
fig <- ggplot() +\n    geom_line(aes(time, I), df) +\n    xlab(NULL) +\n    scale_y_continuous(\n      name = NULL,\n      limits = c(0, 40),\n      breaks = c(0, 20, 40)\n    ) +\n    facet_wrap(~ scenario, ncol = 1) +\n    theme_bw(base_size = 10) +\n    theme(\n      strip.background = element_blank(),\n      panel.grid = element_blank(),\n    )\n  png_file <- \"sir_ode_r.png\"\n  ggsave(png_file, fig, width = 640, height = 480, units = \"px\", dpi = 150)\n}\n\n\ndemonstration <- function() {\n  settings <- default_settings()\n  default_scenario <- run_model(settings)\n  scaled_beta_scenario <- run_model_scaled_beta(settings, scale=1.5)\n  scaled_gamma_scenario <- run_model_scaled_gamma(settings, scale=0.7)\n\n  plot_prevalence_time_series(\n    list(\n      Default = default_scenario,\n      `Scaled Beta` = scaled_beta_scenario,\n      `Scaled Gamma` = scaled_gamma_scenario\n    )\n  )\n}\n\ndemonstration()\n

The model outputs differ!

Here are prevalence time-series plots produced by each script:

Python plotR plot

Model outputs for the Python script.

Model outputs for the R script.

Interactive debugger sessions

If your editor supports running a debugger, use this feature! See these examples for RStudio, PyCharm, Spyder, and VS Code.

Some initial thoughts ... "},{"location":"community/training/debugging/learning-objectives/","title":"Learning objectives","text":"

In this workshop, we will introduce the concept of \"debugging\", and demonstrate techniques and tools that can help us efficiently identify and remove errors from our code.

After completing this workshop, participants will:

Info

By achieving these learning objectives, participants should be able to find and correct errors in their code more quickly and with greater confidence.

"},{"location":"community/training/debugging/manifesto/","title":"Debugging manifesto","text":"Julia Evans and Tanya Brassie: Debugging Manifesto Poster, 2024.

Info

See the Resources page for links to more of Julia Evans' articles, stories, and zines about debugging.

"},{"location":"community/training/debugging/resources/","title":"Resources","text":""},{"location":"community/training/debugging/solutions/","title":"Exercise solutions","text":"

Info

Please don't look at these solutions until you have attempted the exercises.

Perfect numbers

Perfect numbers are equal to the sum of their proper divisors \u2014 all divisors except the number itself.

For example, 6 is a perfect number. Its proper divisors are 1, 2, and 3, and 1 + 2 + 3 = 6.

The mistake here is that the divisors_of() function only returns divisors greater than one, and so the code fails to identify any of the true perfect numbers.

Interestingly, this mistake did not result in the code mistakenly identifying any other numbers as perfect numbers.
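A minimal sketch of the fix (Python shown here; the R script needs the equivalent change): start the candidate divisor at 1 instead of 2.

```python
def divisors_of(value):
    # Fixed: start at 1, so the proper divisors include 1.
    # (The buggy version started the candidate at 2.)
    divisors = []
    candidate = 1
    while candidate < value:
        if value % candidate == 0:
            divisors.append(candidate)
        candidate += 1
    return divisors


def is_perfect(value):
    return sum(divisors_of(value)) == value


perfect = [n for n in range(2, 1001) if is_perfect(n)]
# The perfect numbers up to 1,000 are 6, 28, and 496.
```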

Python vs R

If you're only familiar with one of these two languages, you may be surprised to discover that they have some fundamental differences. In this exercise we demonstrated one consequence of the ways that these languages handle function arguments.

The R Language Definition

The semantics of invoking a function in R argument are call-by-value. In general, supplied arguments behave as if they are local variables initialized with the value supplied and the name of the corresponding formal argument. Changing the value of a supplied argument within a function will not affect the value of the variable in the calling frame.

\u2014 Argument Evaluation

Python Programming FAQ

Remember that arguments are passed by assignment in Python. Since assignment just creates references to objects, there's no alias between an argument name in the caller and callee, and so no call-by-reference per se.

\u2014 How do I write a function with output parameters (call by reference)?

Answer

The value of \u03b2 is different in the third combination.
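A minimal sketch of the behaviour behind this exercise: in Python, a dict argument is a reference to the caller's object, so mutating it inside a function (as `run_model_scaled_beta()` does with `settings`) changes the caller's settings too, whereas R behaves as if the argument were copied.

```python
def scale_beta(settings, scale):
    # Mutates the caller's dict: Python passes a reference to the object.
    settings["beta"] = scale * settings["beta"]
    return settings["beta"]


settings = {"beta": 1.0, "gamma": 0.5}
scale_beta(settings, 1.5)
# The caller's dict has changed: settings["beta"] is now 1.5, not 1.0.

# Passing a copy preserves the caller's value, mimicking R's call-by-value
# semantics.
settings2 = {"beta": 1.0, "gamma": 0.5}
scale_beta(dict(settings2), 1.5)
# settings2["beta"] is still 1.0.
```

This is why the Python script's third scenario runs with a scaled beta as well as a scaled gamma, while the R script's does not.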

"},{"location":"community/training/debugging/understanding-error-messages/","title":"Understanding error messages","text":"

Tip

The visible error and its root cause may be located in very different parts of your code.

If there's an error in your code that causes the program to terminate, read the error message and see what it can tell you.

Most of the time, the error message should allow you to identify:

"},{"location":"community/training/debugging/understanding-error-messages/#stack-traces","title":"Stack traces","text":"

When an error occurs, one useful piece of information is knowing which functions were called in order to make the error occur.

Below we have example Python and R scripts that produce an error.

Question

Can you identify where the error occurred, just by looking at the error message?

OverviewPythonR

You can download each script and run them on your own computer:

"},{"location":"community/training/debugging/understanding-error-messages/#the-error-message","title":"The error message","text":"
Traceback (most recent call last):\n  File \"stacktrace.py\", line 46, in <module>\n    status = main()\n  File \"stacktrace.py\", line 7, in main\n    do_big_tasks()\n  File \"stacktrace.py\", line 17, in do_big_tasks\n    do_third_step(i, quiet=quiet)\n  File \"stacktrace.py\", line 38, in do_third_step\n    try_something()\n  File \"stacktrace.py\", line 42, in try_something\n    raise ValueError(\"Whoops, this failed\")\nValueError: Whoops, this failed\n
Source code stacktrace.py
#!/usr/bin/env python3\n\nimport sys\n\n\ndef main():\n    do_big_tasks()\n    return 0\n\n\ndef do_big_tasks(num_tasks=20, quiet=True):\n    for i in range(num_tasks):\n        prepare_things(i, quiet=quiet)\n        do_first_step(i, quiet=quiet)\n        do_second_step(i, quiet=quiet)\n        if i > 15:\n            do_third_step(i, quiet=quiet)\n\n\ndef prepare_things(task_num, quiet=True):\n    if not quiet:\n        print(f'Preparing for task #{task_num}')\n\n\ndef do_first_step(task_num, quiet=True):\n    if not quiet:\n        print(f'Task #{task_num}: doing step #1')\n\n\ndef do_second_step(task_num, quiet=True):\n    if not quiet:\n        print(f'Task #{task_num}: doing step #2')\n\n\ndef do_third_step(task_num, quiet=True):\n    if not quiet:\n        print(f'Task #{task_num}: doing step #3')\n    try_something()\n\n\ndef try_something():\n    raise ValueError(\"Whoops, this failed\")\n\n\nif __name__ == \"__main__\":\n    status = main()\n    sys.exit(status)\n
"},{"location":"community/training/debugging/understanding-error-messages/#the-error-message_1","title":"The error message","text":"
Error in try_something() : Whoops, this failed\nCalls: main -> do_big_tasks -> do_third_step -> try_something\nBacktrace:\n    \u2586\n 1. \u2514\u2500global main()\n 2.   \u2514\u2500global do_big_tasks()\n 3.     \u2514\u2500global do_third_step(i, quiet = quiet)\n 4.       \u2514\u2500global try_something()\nExecution halted\n
Source code stacktrace.R
#!/usr/bin/env -S Rscript --vanilla\n\noptions(error = rlang::entrace)\n\n\nmain <- function() {\n  do_big_tasks()\n  invisible(0)\n}\n\ndo_big_tasks <- function(num_tasks = 20, quiet = TRUE) {\n  for (i in seq_len(num_tasks)) {\n    prepare_things(i, quiet = quiet)\n    do_first_step(i, quiet = quiet)\n    do_second_step(i, quiet = quiet)\n    if (i > 15) {\n      do_third_step(i, quiet = quiet)\n    }\n  }\n}\n\nprepare_things <- function(task_num, quiet = TRUE) {\n  if (!quiet) {\n    cat(\"Preparing for task #\", task_num, \"\\n\", sep = \"\")\n  }\n}\n\ndo_first_step <- function(task_num, quiet = TRUE) {\n  if (!quiet) {\n    cat(\"Task #\", task_num, \": doing step #1\\n\", sep = \"\")\n  }\n}\n\ndo_second_step <- function(task_num, quiet = TRUE) {\n  if (!quiet) {\n    cat(\"Task #\", task_num, \": doing step #2\\n\", sep = \"\")\n  }\n}\n\ndo_third_step <- function(task_num, quiet = TRUE) {\n  if (!quiet) {\n    cat(\"Task #\", task_num, \": doing step #3\\n\", sep = \"\")\n  }\n  try_something()\n}\n\ntry_something <- function() {\n  stop(\"Whoops, this failed\")\n}\n\nif (! interactive()) {\n  status <- main()\n  quit(status = status)\n}\n
"},{"location":"community/training/debugging/using-a-debugger/","title":"Using a debugger","text":"

The main features of a debugger are:

Slightly more advanced features include:

For example, consider the following code example:

PythonR
def first_function():\n    total = 0\n    for x in range(1, 50):\n        y = second_function(x)\n        total = total + y\n\n    return total\n\n\ndef second_function(a):\n    result = 3 * a**2 + 5 * a\n    return result\n\n\nfirst_function()\n
first_function <- function() {\n  total <- 0\n  for (x in seq(49)) {\n    y <- second_function(x)\n    total <- total + y\n  }\n  total\n}\n\nsecond_function <- function(a) {\n  result <- 3 * a^2 + 5 * a\n  result\n}\n\nfirst_function()\n
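If you run the Python version under the standard-library debugger (for example, `python -m pdb script.py`), you can pause inside `second_function` and inspect `a` and `result` on each call. The sketch below marks where a breakpoint might go; the printed total is what you should see once stepping returns to the top level:

```python
def first_function():
    total = 0
    for x in range(1, 50):
        y = second_function(x)
        total = total + y
    return total


def second_function(a):
    # Placing breakpoint() here would drop you into pdb on
    # every call, letting you inspect "a" and "result".
    result = 3 * a**2 + 5 * a
    return result


print(first_function())  # 127400
```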
"},{"location":"community/training/debugging/what-is-debugging/","title":"What is debugging?","text":"

Info

Debugging is the process of identifying and removing errors from computer software.

You need to identify (and reproduce) the problem, and only then begin fixing it; ideally, write a failing test case first, to check that (a) you have correctly identified the problem, and (b) you will notice if you accidentally introduce the same, or a similar, mistake in the future.

"},{"location":"community/training/debugging/what-is-debugging/#action-1-identify-the-error","title":"Action 1: Identify the error","text":"

Tip

First make sure that you can reproduce the error.

What observations or outputs indicate the presence of this error?

Is the error reproducible, or does it come and go?

Write a failing test?

"},{"location":"community/training/debugging/what-is-debugging/#action-2-develop-a-plan","title":"Action 2: Develop a plan","text":"

Tip

The visible error and its root cause may be located in very different parts of your code.

Identify likely and unlikely suspects: what can we rule in/out? What parts of your code recently changed? When was the last time you might have noticed this error?

"},{"location":"community/training/debugging/what-is-debugging/#action-3-search-for-the-root-cause","title":"Action 3: Search for the root cause","text":"

Tip

As much as possible, the search should be guided by facts, not assumptions.

Our assumptions about the code can help us to develop a plan, but we need to verify whether our assumptions are actually true.

For example:

Simple errors can often hide\nhide in plain sight and be\nsurprisingly difficult to\ndiscover without assistance.\n

Thinking \"this looks right\" is not a reliable indicator of whether a piece of code contains an error.

Searching at random is like looking for a needle in a haystack. (Perry McKenna, Flickr, 2009; CC BY 2.0)

Better approaches involve confirming what the code is actually doing.
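For example, rather than assuming an input is well-formed, you can make the code confirm it. This hypothetical `normalise` helper (invented for this illustration) turns a silent assumption into an explicit, checked fact:

```python
def normalise(weights):
    total = sum(weights)
    # Confirm the assumption rather than trusting it: an empty
    # list or a zero total would otherwise surface later as a
    # confusing downstream error.
    assert total > 0, f"expected a positive total, got {total}"
    return [w / total for w in weights]


print(normalise([1, 1, 2]))  # [0.25, 0.25, 0.5]
```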

"},{"location":"community/training/debugging/what-is-debugging/#action-4-fix-the-root-cause","title":"Action 4: Fix the root cause","text":"

Tip

It's worth considering if the root cause is a result of deliberate decisions or unintentional mistakes.

Don't start modifying/adding/removing lines based on suspicions or on the off chance that it might work. Without identifying the root cause of the error, there is no guarantee that making the error seem to disappear will actually have fixed the root cause.

"},{"location":"community/training/debugging/what-is-debugging/#action-5-after-its-fixed","title":"Action 5: After it's fixed","text":"

Tip

This is the perfect time to reflect on your experience!

What can you learn from this experience? Can you avoid this mistake in the future? What parts of the process were the hardest or took the longest? Are there tools and techniques that might help you next time?

"},{"location":"community/training/debugging/why-are-debuggers-useful/","title":"Why are debuggers useful?","text":"

Tip

A debugger is a tool for examining the state of a running program.

Debuggers are useful because they show us what the code is actually doing.

Many of the errors that take a long time for us to find are relatively simple once we find them.

We usually have a hard time finding these errors because:

  1. We read what we expect to see, rather than what is actually written; and

  2. We rely on assumptions about where the mistake might be, and our intuition is often wrong.

Here are some common mistakes that can be difficult to identify when reading through your own code:
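As an invented illustration of how such a mistake can sit in plain sight, consider this function, which is intended to average the first `n` values but quietly divides by the wrong length:

```python
def mean_of_first_n(values, n):
    # Intended: the average of the first n values.
    # The subtle mistake: dividing by len(values) instead of n.
    return sum(values[:n]) / len(values)


data = [2, 4, 6, 8]
# Reading the code, "this looks right"; running it shows otherwise:
print(mean_of_first_n(data, 2))  # 1.5, not the expected 3.0
```

A debugger (or even a well-placed print statement) reveals the actual divisor immediately, where re-reading the code often does not.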

"},{"location":"guides/","title":"Topical guides","text":"

These materials are divided into the following sections:

  1. Understanding version control, which provides you with a complete and annotated history of your work, and with powerful ways to search and examine this history.

  2. Learning to use Git, the most widely used version control system, which is the foundation of popular code-hosting services such as GitHub, GitLab, and Bitbucket.

  3. Using Git to collaborate with colleagues in a precisely controlled and manageable way.

  4. Learn how to structure your project so that it is easier for yourself and others to navigate.

  5. Learn how to write code so that it clearly expresses your intent and ideas.

  6. Ensuring that your research is reproducible by others.

  7. Using testing frameworks to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code.

  8. Running your code on various computing platforms that allow you to obtain results efficiently and without relying on your own laptop/computer.

"},{"location":"guides/learning-objectives/","title":"Learning objectives","text":"

This page defines the learning objectives for individual sections. These are skills that the reader should be able to demonstrate after reading through the relevant section, and completing any exercises in that section.

"},{"location":"guides/learning-objectives/#version-control-concepts","title":"Version control concepts","text":"

After completing this section, you should be able to identify how to apply version control concepts to your existing work. This includes being able to:

"},{"location":"guides/learning-objectives/#effective-use-of-git","title":"Effective use of git","text":"

After completing this section, you should be able to:

"},{"location":"guides/learning-objectives/#collaborating","title":"Collaborating","text":"

After completing this section, you should be able to:

"},{"location":"guides/learning-objectives/#project-structure","title":"Project structure","text":"

After completing this section, you should be able to:

"},{"location":"guides/learning-objectives/#writing-code","title":"Writing code","text":"

After completing this section, you should be able to:

"},{"location":"guides/prerequisites/","title":"Prerequisites","text":"

These materials assume that the reader has a basic knowledge of the Bash command-line shell and using SSH to connect to remote computers. You should be comfortable with using the command-line to perform the following tasks:

Please refer to the following materials for further details:

Info

If you use Windows, you may want to use PowerShell instead of Bash, in which case please refer to this Introduction to the Windows Command Line with Powershell.

Some chapters also assume that the reader has an account on GitHub and has added an SSH key to their account.

"},{"location":"guides/resources/","title":"Useful resources","text":""},{"location":"guides/resources/#education-and-commentary-articles","title":"Education and commentary articles","text":""},{"location":"guides/resources/#how-to-structure-your-project","title":"How to structure your project?","text":""},{"location":"guides/resources/#using-git-and-other-software-tools","title":"Using Git and other software tools","text":""},{"location":"guides/resources/#examples-of-making-models-publicly-available","title":"Examples of making models publicly available","text":""},{"location":"guides/resources/#performing-peer-code-review","title":"Performing peer code review","text":""},{"location":"guides/resources/#continuous-integration-ci-examples","title":"Continuous Integration (CI) examples","text":""},{"location":"guides/resources/#high-performance-computing-platforms","title":"High-performance computing platforms","text":""},{"location":"guides/resources/#how-to-acknowledge-and-cite-research-software","title":"How to acknowledge and cite research software","text":""},{"location":"guides/resources/#software-licensing","title":"Software licensing","text":""},{"location":"guides/collaborating/","title":"Collaborating","text":"

This section demonstrates how to use Git for collaborative research, enabling multiple people to work on the same code or paper in parallel. This includes deciding how to structure your repository, how to use branches for each collaborator, and how to use tags to track your progress.

Info

We also show how these skills support peer code review, so that you can share knowledge with, and learn from, your colleagues as part of your regular activity.

"},{"location":"guides/collaborating/an-example-pull-request/","title":"An example pull request","text":"

The initial draft of each chapter in this section was proposed in a pull request.

When this pull request was created, the branch added four new commits:

85594bf Add some guidelines for collaboration workflows\n678499b Discuss coding style guides\n2b9ff70 Discuss merge/pull requests and peer code review\n6cc6f54 Discuss repository structure and licenses\n

and the author (Rob Moss) asked the reviewer (Eamon Conway) to address several details in particular.

Eamon made several suggestions in their initial response, including:

In response, Rob pushed two commits that addressed the first two points above:

e1d1dd9 Move collaboration guidelines to the start\n3f78ef8 Move the repository structure and license chapters\n

and then wrote this chapter to show how we used a pull request to draft this book section.

"},{"location":"guides/collaborating/coding-style-guides/","title":"Coding style guides","text":"

A style guide defines rules and guidelines for how to structure and format your code. This can make code easier to write, because you don't need to worry about how to format your code. It can also make code easier to read, because consistent styling allows you to focus on the content.

There are two types of tools that can help you use a style guide:

"},{"location":"guides/collaborating/coding-style-guides/#recommended-style-guides","title":"Recommended style guides","text":"

Because programming languages can be very different from each other, style guides are usually defined for a single programming language.

Here we list some of the most widely-used style guides for several common programming languages:

"},{"location":"guides/collaborating/collaborating-on-a-paper/","title":"Collaborating on a paper","text":"

Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write papers collaboratively as a team. This approach is particularly useful if you are writing a paper in LaTeX.

Here are some general guidelines that you may find useful:

"},{"location":"guides/collaborating/collaborating-on-code/","title":"Collaborating on code","text":"

Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write code collaboratively as a team.

The precise workflow will depend on the nature of your research and on the collaborators in your team, but there are some general guidelines that you may find helpful:

"},{"location":"guides/collaborating/continuous-integration/","title":"Continuous integration","text":"

Continuous Integration (CI) is an automated process where code changes are merged into a central repository in order to run automated tests and other processes. This can provide rapid feedback while you develop your code and collaborate with others, as long as commits are regularly pushed to the central repository.

Info

This book is an example of Continuous Integration: every time a commit is pushed to the central repository, the online book is automatically updated.

Because the central repository is hosted on GitHub, we use GitHub Actions. Note that this is a GitHub-specific CI system. You can view the update action for this book here.

We also use CI to publish each pull request, so that contributions can be previewed during the review process. We added this feature in this pull request.

"},{"location":"guides/collaborating/merge-pull-requests/","title":"Merge/Pull requests","text":"

Recall that incorporating the changes from one branch into another branch is referred to as a \"merge\". You can merge one branch into another branch by taking the following steps:

  1. Checking out the branch you want to merge the changes into:

    git checkout my-branch\n
  2. Merging the changes from the other branch:

    git merge other-branch\n

Tip

It's a good idea to review these changes before you merge them.

If possible, it's even better to have someone else review the changes.

You can use git diff to view differences between branches. However, platforms such as GitHub and GitLab offer an easier approach: \"pull requests\" (also called \"merge requests\").

"},{"location":"guides/collaborating/merge-pull-requests/#creating-a-pull-request-on-github","title":"Creating a pull request on GitHub","text":"

The steps required to create a pull request differ depending on which platform you are using. Here, we will describe how to create a pull request on GitHub. For further details, see the GitHub documentation.

Once the pull request has been created, the reviewer(s) can review your changes and discuss their feedback and suggestions with you.

"},{"location":"guides/collaborating/merge-pull-requests/#merging-a-pull-request-on-github","title":"Merging a pull request on GitHub","text":"

When the pull request has been reviewed to your satisfaction, you can merge these changes by clicking the \"Merge pull request\" button.

Info

If the pull request has merge conflicts (e.g., if the branch you're merging into contains new commits), you will need to resolve these conflicts.

For further details about merging pull requests on GitHub, see the GitHub documentation.

"},{"location":"guides/collaborating/peer-code-review/","title":"Peer code review","text":"

Once you're comfortable using merge/pull requests to review changes in a branch, you can use this approach for peer code review.

Info

Remember that code review is a discussion and critique of a person's work. The code author will naturally feel that they own the code, and the reviewer needs to respect this.

For further advice and suggestions on how to conduct peer code review, please see the Performing peer code review resources.

Tip

Mention people who have reviewed your code in the acknowledgements section of the paper.

"},{"location":"guides/collaborating/peer-code-review/#define-the-goals-of-a-peer-review","title":"Define the goals of a peer review","text":"

When creating a pull request and inviting someone to review your work, include the following details in the pull request description:

Tip

Make the reviewer's job easier by giving them small amounts of code to review.

"},{"location":"guides/collaborating/peer-code-review/#finding-a-reviewer","title":"Finding a reviewer","text":"

On GitHub we have started a peer-review team. We encourage you to post on the discussion board to find like-minded members to review your code.

"},{"location":"guides/collaborating/peer-code-review/#guidelines-for-reviewing-other-peoples-code","title":"Guidelines for reviewing other people's code","text":"

Peer code review is an opportunity for the author and the reviewer to learn from each other and improve a piece of code.

Tip

The most important guideline for the reviewer is to be kind.

Treat other people's code the way you would want them to treat your code.

"},{"location":"guides/collaborating/peer-code-review/#complete-the-review","title":"Complete the review","text":"

Once the peer code review is complete, and any corresponding updates to the code have been made, you can merge the branch.

"},{"location":"guides/collaborating/peer-code-review/#retain-a-record-of-the-review","title":"Retain a record of the review","text":"

By using merge/pull requests to review code, the discussion between the author and the reviewer is recorded. This can be a useful reference for future code reviews.

Tip

Try to record all of the discussion in the pull request comments, even if the author and reviewer meet in person, so that you have a complete record of the review.

"},{"location":"guides/collaborating/sharing-a-branch/","title":"Sharing a branch","text":"

You might want a collaborator to work on a specific branch of your repository, so that you can keep their changes separate from your own work. Remember that you can merge commits from their branch into your own branches at any time.

Info

You need to ensure that your collaborator has access to the remote repository.

  1. Create a new branch for the collaborator, and give it a descriptive name.

    git checkout -b collab/jamie\n

    In this example we created a branch called \"collab/jamie\", where \"collab\" is a prefix used to identify branches intended for collaborators, and the collaborator is called Jamie.

    Remember that you can choose your own naming conventions.

  2. Push this branch to your remote repository:

    git push -u origin collab/jamie\n
  3. Your collaborator can then make a local copy of this branch:

    git clone --single-branch --branch collab/jamie repository-url\n
  4. They can then create commits and push them to your remote repository with git push.

"},{"location":"guides/collaborating/sharing-a-repository/","title":"Sharing a repository","text":"

The easiest way to share a repository with collaborators is to have a single remote repository that all collaborators can access. This repository could be located on a platform such as GitHub, GitLab, or Bitbucket, or on a platform provided by your University or Institute.

These platforms allow you to create public repositories and private repositories.

Info

You should decide whether a public repository or a private repository suits you best.

"},{"location":"guides/collaborating/sharing-a-repository/#giving-collaborators-access-to-your-remote-repository","title":"Giving collaborators access to your remote repository","text":"

The steps required to do this differ depending on which platform you are using. Here, we will describe how to give collaborators access to a repository on GitHub. For further details, see the GitHub documentation.