Add Project structure and Writing code sections #59

Merged
merged 2 commits into from
Aug 7, 2024
10 changes: 7 additions & 3 deletions docs/guides/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,12 @@ These materials are divided into the following sections:

3. Using Git to [collaborate with colleagues](./collaborating/README.md) in a precisely controlled and manageable way.

4. Ensuring that your research is [reproducible by others](./reproducibility/README.md).
4. Learn how to [structure your project](./project-structure/README.md) so that it is easier for yourself and others to navigate.

5. Using [testing frameworks](./testing/README.md) to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code.
5. Learn how to [write code](./writing-code/README.md) so that it clearly expresses your intent and ideas.

6. Running your code on various [computing platforms]() that allow you to obtain results efficiently and without relying on your own laptop/computer.
6. Ensuring that your research is [reproducible by others](./reproducibility/README.md).

7. Using [testing frameworks](./testing/README.md) to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code.

8. Running your code on various [computing platforms]() that allow you to obtain results efficiently and without relying on your own laptop/computer.
22 changes: 22 additions & 0 deletions docs/guides/learning-objectives.md
Expand Up @@ -51,3 +51,25 @@ After completing [this section](collaborating/README.md), you should be able to:
- Use a pull request to **merge a collaborator's work** into your main branch; and

- **Conduct peer code review** in a respectful manner.

## Project structure

After completing [this section](project-structure/README.md), you should be able to:

- Understand how to structure a new project;

- Understand how to separate "what to do" from "how to do it"; and

- Structure your code to enable new experiments and analyses.

## Writing code

After completing [this section](writing-code/README.md), you should be able to:

- Divide your code into functions and modules;

- Ensure that your code is a clear expression of your ideas;

- Structure your code into reusable packages; and

- Take advantage of code formatters and code linters.
16 changes: 16 additions & 0 deletions docs/guides/project-structure/README.md
@@ -0,0 +1,16 @@
# Project structure

How we choose to structure a project can affect how readily someone else — or even yourself, after a period of absence — can understand, use, and extend the work.

!!! question

Have you ever looked at your old code and wondered how it worked or how to make it run?

!!! tip

A good project structure can serve as a table of contents and help the reader to navigate.

In an earlier section we provided some guidelines for [how to structure a repository](../using-git/how-to-structure-a-repository.md).
In this section we present further guidelines and examples to help you choose a sensible structure for your current project and future projects.

This includes high-level recommendations that should apply to any project, and more detailed recommendations that may be specific to a particular type of project or choice of programming language.
25 changes: 25 additions & 0 deletions docs/guides/project-structure/automating-tasks.md
@@ -0,0 +1,25 @@
# Automate common tasks

If you reach the point where you need to run a specific sequence of commands or actions to achieve something — e.g., running a model simulation, or producing an output figure — it is a **very good idea** to write a script that performs all of these actions correctly.

This is because while you may remember exactly what needs to be done **right now**, you may not remember next week, or next month, or next year.
We're all human, and we all make mistakes, but these kinds of mistakes are **easy to avoid**!

!!! info

Mistakes are a fact of life. It is the response to the error that counts.

— Nikki Giovanni

There are many tools that can help you to automate tasks, some of which are smart enough to do only the work that is necessary (e.g., skipping steps whose inputs have not changed).

There are popular tools aimed at specific programming languages, such as:

- **R:** [targets](https://books.ropensci.org/targets/);

- **Python**: [nox](https://nox.thea.codes/) and [tox](https://tox.wiki/); and

- **Julia:** [pipelines](https://juliapackages.com/p/pipelines).

There are many generic automation tools (see, e.g., Wikipedia's list of [build automation software](https://en.wikipedia.org/wiki/List_of_build_automation_software)), although these can be rather complex to learn.
We recommend using a language-specific automation tool where possible, and only using a generic automation tool as a last resort.
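
As a minimal sketch of such a script, the Python code below runs a step only when its output is missing or older than its inputs. The step name, the `clean_data` action, and the file names are hypothetical and chosen purely for illustration; the language-specific tools listed above handle this kind of dependency tracking far more robustly.

```python
from pathlib import Path


def is_up_to_date(output, inputs):
    """Return True if `output` exists and is at least as new as every input."""
    if not output.exists():
        return False
    out_time = output.stat().st_mtime
    return all(out_time >= path.stat().st_mtime for path in inputs)


def run_step(name, action, inputs, output):
    """Run `action(inputs, output)` unless `output` is already up to date.

    Returns True if the step was run, and False if it was skipped.
    """
    if is_up_to_date(output, inputs):
        print(f"Skipping {name}: {output} is up to date")
        return False
    print(f"Running {name}")
    action(inputs, output)
    return True


def clean_data(inputs, output):
    """Hypothetical cleaning step: drop blank lines from the raw data."""
    lines = inputs[0].read_text().splitlines()
    output.write_text("\n".join(line for line in lines if line.strip()) + "\n")
```

Calling `run_step("clean", clean_data, [raw_file], cleaned_file)` twice in a row should run the step the first time and skip it the second time, provided the raw file has not changed in between.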
15 changes: 15 additions & 0 deletions docs/guides/project-structure/exercise-a-good-readme.md
@@ -0,0 +1,15 @@
# Exercise: a good README

Remember that the README file (usually one of `README.md`, `README.rst`, or `README.txt`) is often **the very first thing** that a user will see when looking at your project.

- Have you seen any README files that were particularly helpful, or were not very helpful?

- What information do you find helpful in a README file?

Consider the `README.md` file in the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts).

- What content, if any, would you **add** to this file?

- What content, if any, would you **remove** from this file?

- Would you change its structure in any way?
9 changes: 9 additions & 0 deletions docs/guides/project-structure/exercise-what-works-for-you.md
@@ -0,0 +1,9 @@
# Exercise: what works for you?

Look back at your past projects and identify aspects of their structure that you have found helpful.

- What features or choices have worked well in past projects and might help you structure your future projects?

- What problems or issues have you experienced with the structure of your past projects, which you could avoid in your future projects?

- Can any of your colleagues and collaborators share similar insights?
46 changes: 46 additions & 0 deletions docs/guides/project-structure/explain-how-it-works.md
@@ -0,0 +1,46 @@
# Explain how it all works

Once you've chosen a project structure, you need to write down **how it all works** — regardless of how simple and clear your project structure is!

!!! tip

The best place to do this is in a `README.md` file (or equivalent) in the project root directory.

Begin with an overview of the project:

- What question(s) are you trying to address?

- What data, hypotheses, methods, etc., are you using?

- What outputs does this generate?

You can then provide further detail, such as:

- What software environment and/or packages must be available for your code to run?

- How can the user generate each of the outputs?

- What license [have you chosen](../using-git/choosing-a-license.md)?


## An example README.md

See the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts) for an example `README.md` file.

This repository was used to generate the results, tables, and figures presented in the paper "[Forecasting COVID-19 activity in Australia to support pandemic response: May to October 2020](https://doi.org/10.1038/s41598-023-35668-6)", *Scientific Reports* 13, 8763 (2023).

**Strengths:**

- It includes installation and usage instructions;

- It identifies the paper; and

- It identifies the license under which the code is distributed.

**Weaknesses:**

- It only explains some of the project structure.

- It doesn't provide an overview of the project; it only links to the paper.

- The root directory contains a number of scripts and input files that aren't described.
70 changes: 70 additions & 0 deletions docs/guides/project-structure/workflow.md
@@ -0,0 +1,70 @@
# Define your workflow

A good first step in deciding how to structure a project is to ask yourself:

- What are the different project phases?

- What are the major activities in each phase?

## An example of phases and activities

For example, a project might involve the following phases:

1. Clean an existing data set;

2. Build models with different hypotheses or features;

3. Fit each model to the data; and

4. Decide which model best explains the data.

The data-cleaning phase might involve the following activities:

- Obtain the raw data;

- Identify the quality checks that should be applied;

- Decide how to resolve data that fail each quality check; and

- Generate and record the cleaned data.

The model-building phase might involve the following activities:

- Perform a literature search to identify relevant modelling studies;

- Identify competing hypotheses or features that might explain the data;

- Design a model that implements each hypothesis; and

- Define the relationship between each model and the cleaned data.

## Reflect this workflow in your project structure

You can use the phases and activities to guide your choice of directory structure.
For this example project, one possible structure is:

- `project/`: the root directory of your project

    - `input/`: a sub-directory that contains input data;

        - `raw/`: the raw data **before** cleaning;

        - `cleaned/`: the cleaned data;

    - `code/`: a sub-directory that contains the project code;

        - `cleaning/`: the data cleaning code;

        - `model-first-hypothesis/`: the first model;

        - `model-second-hypothesis/`: the second model;

        - `fitting/`: the code that fits each model to the data;

        - `evaluation/`: the code that compares the model fits;

        - `plotting/`: the code that plots output figures;

    - `paper/`: a sub-directory for the project manuscript;

        - `figures/`: the output figures.
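
If you create this kind of structure often, a short script can build the skeleton for you. The sketch below assumes the nesting implied by the list above (for example, that `raw/` and `cleaned/` live inside `input/`); the `create_skeleton` function is a hypothetical helper, not part of any existing tool.

```python
from pathlib import Path

# Sub-directories of the example project, relative to the project root.
SUBDIRS = [
    "input/raw",
    "input/cleaned",
    "code/cleaning",
    "code/model-first-hypothesis",
    "code/model-second-hypothesis",
    "code/fitting",
    "code/evaluation",
    "code/plotting",
    "paper/figures",
]


def create_skeleton(root):
    """Create the example directory structure beneath `root`."""
    root = Path(root)
    for subdir in SUBDIRS:
        # Create parent directories as needed; leave existing ones untouched.
        (root / subdir).mkdir(parents=True, exist_ok=True)
    return root
```

Because existing directories are left untouched, it should be safe to run `create_skeleton("project")` more than once.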
13 changes: 13 additions & 0 deletions docs/guides/writing-code/README.md
@@ -0,0 +1,13 @@
# Writing code

For computational research, code is an important scientific artefact for the author, for colleagues and collaborators, and for the scientific community.
It is the **ultimate form** of expressing **what you did** and **how you did it**.
With good version control and documentation practices, it can also capture **when and why** you made important decisions.

!!! tip

[W]e want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for **expressing ideas about methodology**.
Thus, programs must be **written for people to read**, and only incidentally for machines to execute.

— [Structure and Interpretation of Computer Programs](https://mitpress.mit.edu/9780262510875/).
Abelson, Sussman, and Sussman, 1984.
42 changes: 42 additions & 0 deletions docs/guides/writing-code/behave-nicely.md
@@ -0,0 +1,42 @@
# Behave nicely

Would you feel comfortable running someone else's code if you thought it might affect your other files, applications, settings, or do something else that's unexpected?

!!! tip

Your code should be **encapsulated:** it should assume as little as possible about the computer on which it is running, and it shouldn't mess with the user's environment.

!!! tip

Your code should follow the **principle of least surprise:** behave in a way that most users will expect it to behave, and not astonish or surprise them.

## A cake analogy

Suppose you have two colleagues who regularly bake cakes, and you decide you'd like one of them to bake you a chocolate cake.

- **A nice colleague:**

    - That evening, they go home and bake a cake.
    - They bring the cake to work the next day.
    - The cake tastes of chocolate.

- **A messy colleague:**

    - They bring the ingredients and a portable oven into your office.
    - They make a huge mess, splattering your desk and computer.
    - The oven is noisy and makes the office uncomfortably warm.
    - The cake tastes of vanilla, not chocolate.

## Some specific tips

- Avoid modifying files outside of the project directory!

- Avoid using hard-coded absolute paths, such as `C:\Users\My Name\Some Project\...` or `/Users/My Name/Some other directory`.
These make it harder for other people to use the code, or to run the code on high-performance computing platforms.

- Prefer using paths that are relative to the root directory of your project, such as `input-data/case-data/cases-for-2023.csv`.
If you're using R, the [here](https://here.r-lib.org/) package is extremely helpful.

- Warn the user before running tasks that take a long time to complete.

- Notify the user before downloading large files.
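
For Python, one possible way to build paths relative to the project root is sketched below. It assumes that the root can be recognised by a marker such as a `.git` directory; `find_project_root` and `project_file` are hypothetical helper names, not functions from an existing package.

```python
from pathlib import Path


def find_project_root(start, marker=".git"):
    """Search upwards from `start` for a directory that contains `marker`."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        if (directory / marker).exists():
            return directory
    raise FileNotFoundError(f"No directory containing {marker!r} above {start}")


def project_file(relative_path, start=None):
    """Return the absolute path of a file given relative to the project root."""
    root = find_project_root(start if start is not None else Path.cwd())
    return root / relative_path
```

With these helpers, `project_file("input-data/case-data/cases-for-2023.csv")` should resolve to the same file regardless of which project sub-directory the code is run from.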
14 changes: 14 additions & 0 deletions docs/guides/writing-code/check-your-code.md
@@ -0,0 +1,14 @@
# Check your code

A "linter" is a tool that checks your code for syntax errors, possible mistakes, inconsistent formatting, and other potential issues.

We **strongly recommend** using an editor that displays linter warnings as you write your code.
Having instant feedback allows you to rapidly resolve many common issues and substantially improve your code.

We list here some of the most commonly used linters:

- **R:** [lintr](https://lintr.r-lib.org/)

- **Python:** [ruff](https://docs.astral.sh/ruff/)

- **Julia:** [Lint.jl](https://lintjl.readthedocs.org/en/stable/)
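
To illustrate the kind of feedback a linter gives, here is a small hypothetical Python example. The commented-out "before" version contains two issues that linters commonly flag; the exact messages depend on the linter and how it is configured.

```python
# Before: a linter would typically warn that
#   - the module `os` is imported but never used; and
#   - the local variable `total` is assigned but never used.
#
# import os
#
# def mean(values):
#     total = sum(values)
#     return sum(values) / len(values)


# After: the unused import is removed and `total` is actually used.
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = sum(values)
    return total / len(values)
```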
30 changes: 30 additions & 0 deletions docs/guides/writing-code/coding-advice.md
@@ -0,0 +1,30 @@
# Coding advice

- Think about how to cleanly structure your code.
Take a **similar approach to how we write papers and grants**.

- Break the overall problem into pieces, and then decide how to structure each piece in turn.

- Divide your code into functions that each do one "thing", and group related functions into separate files or modules.

- It can sometimes help to think about how you want the final code to look, and then design the functions and components that are needed.

- Avoid global variables; aim to pass everything as function arguments.
This makes the code more robust and easier to run, because each function then depends only on the inputs it is given.

- Avoid passing lots of individual parameters as separate arguments; this is prone to error, because you might not pass them in the correct order.
Instead, collect the parameters into a single structure (e.g., a Python dictionary, an R named list).

- Avoid making multiple copies of a model if you want to change some aspect of its behaviour.
Instead, add a new model parameter that enables/disables this new behaviour.
This allows you to use the same code to run the older and newer versions of the model.

- Try to collect common or related tasks into a single script, and allow the user to select which task(s) to run, rather than creating many scripts that perform very similar tasks.

- Write test cases to check key model properties.

- You want to identify problems and mistakes as soon as possible!

- Thinking about how to make your code testable can help you improve its structure!

- Well-written tests can also demonstrate **how to use your code**!
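
Several of these suggestions can be combined in a few lines of code. The sketch below uses a hypothetical population growth model (not taken from any real project): the parameters are collected into a single structure, a new behaviour is enabled by a parameter rather than by copying the model, and a pytest-style test checks a key model property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GrowthParams:
    """Hypothetical parameters for a simple population growth model."""

    initial_size: float
    growth_rate: float
    num_steps: int
    # New behaviour is switched on by a parameter, not by copying the model.
    carrying_capacity: Optional[float] = None


def simulate(params):
    """Return the population size at each time step, including the start."""
    sizes = [params.initial_size]
    for _ in range(params.num_steps):
        size = sizes[-1]
        growth = params.growth_rate * size
        if params.carrying_capacity is not None:
            # Logistic growth: growth slows as the capacity is approached.
            growth *= 1 - size / params.carrying_capacity
        sizes.append(size + growth)
    return sizes


def test_population_never_exceeds_capacity():
    """A key model property, written as a test case."""
    params = GrowthParams(
        initial_size=10, growth_rate=0.5, num_steps=20, carrying_capacity=100
    )
    assert all(size <= 100 for size in simulate(params))
```

Because the parameters are passed by name, they cannot be supplied in the wrong order, and the test doubles as a short demonstration of how to call `simulate`.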
58 changes: 58 additions & 0 deletions docs/guides/writing-code/cohesion-coupling.md
@@ -0,0 +1,58 @@
# Cohesion and coupling

**Divide your code** into modules, each of which does one thing ("high cohesion") and depends as little as possible on other pieces ("low coupling").

## Common project components

For example, an infectious diseases modelling project can often be divided into some of the following components:

- The model parameters — what are their values or prior distributions?

- The initial model state — how is this created from the model parameters?

- The model equations or update rules — how does the model evolve over time?

- Summary statistics — what do you want to record for each simulation?
This might be the entire state history, a subset of the history, some aggregate statistics, or any combination of these things.

- The input data (if any) — these may be case data, serological data, within-host specimen counts, etc.

- The relationship between data and the model state ("observation model").

- Simulated data generated from a model simulation.

As much as possible, each of these components (where relevant to your project) should be represented as **a separate piece of code**.
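
As a rough illustration, the sketch below gives some of these components their own function, using a discrete-time SIR model. The model, the parameter values, and all function names are assumptions made for this example only.

```python
def default_parameters():
    """Model parameters (hypothetical values, for illustration only)."""
    return {"beta": 0.3, "gamma": 0.1, "population": 1000, "initial_infected": 1}


def initial_state(params):
    """Create the initial model state from the model parameters."""
    infected = params["initial_infected"]
    return {"S": params["population"] - infected, "I": infected, "R": 0}


def update(state, params):
    """Update rule: advance the model state by one time step."""
    new_infections = params["beta"] * state["S"] * state["I"] / params["population"]
    new_recoveries = params["gamma"] * state["I"]
    return {
        "S": state["S"] - new_infections,
        "I": state["I"] + new_infections - new_recoveries,
        "R": state["R"] + new_recoveries,
    }


def simulate(params, num_steps):
    """Run the model and return the entire state history."""
    history = [initial_state(params)]
    for _ in range(num_steps):
        history.append(update(history[-1], params))
    return history


def peak_infected(history):
    """Summary statistic: the largest number of infected individuals."""
    return max(state["I"] for state in history)
```

Each function can be replaced or tested on its own; for example, a different summary statistic can be added without touching the update rule.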

## Separating the "what" from the "how"

Dividing your code into separate components is especially important if you want to use a model for multiple purposes, such as:

- Exploring different scenarios;
- Fitting to various data sets;
- Performing sensitivity and uncertainty analyses; and
- Forecasting future data.

!!! tip

In particular, keep the following aspects of your project separate:

- **What to do:** fitting to different data sets, exploring different scenarios, performing a sensitivity analysis, etc; and

- **How to do it:** the model implementation.

If you want to explore a range of model scenarios, for example, define the parameter values (or sampling distributions) for each scenario in a separate input file.
Then write a script that takes an input file name as an argument, reads the parameter values, and uses these values to run the model simulations.

This makes it extremely simple to define and run new scenarios without modifying your code.
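
A minimal sketch of such a script is shown below. It assumes that each scenario is stored as a JSON file; the `run_model` function and the parameter names `initial_size`, `growth_rate`, and `num_steps` are hypothetical stand-ins for a real model implementation.

```python
import argparse
import json


def run_model(params):
    """Stand-in for the model implementation (the "how")."""
    # Hypothetical model: exponential growth over a number of steps.
    size = params["initial_size"]
    for _ in range(params["num_steps"]):
        size *= 1 + params["growth_rate"]
    return size


def main(argv=None):
    """Read a scenario file (the "what") and run the model with its values."""
    parser = argparse.ArgumentParser(description="Run one model scenario.")
    parser.add_argument("scenario_file", help="JSON file of parameter values")
    args = parser.parse_args(argv)

    with open(args.scenario_file) as f:
        params = json.load(f)

    result = run_model(params)
    print(f"{args.scenario_file}: final size = {result}")
    return result
```

To use this from the command line, you would call `main()` from the usual `if __name__ == "__main__":` guard, and then define a new scenario simply by writing a new JSON file.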

## Interactions between components

Choosing how your components interact (e.g., by calling functions or passing data) is **just as important** as deciding how to divide your code into components.

Here are some key recommendations from [Object-Oriented Software Construction (2nd ed)](https://bertrandmeyer.com/OOSC2/):

- Small interfaces: if two modules communicate, they should exchange as little information as possible.

- Explicit interfaces: if two modules communicate, it should be obvious from the code in one or both of these modules.

- Self documentation: strive to make all information about a module part of the module itself.
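
These recommendations can be illustrated with a small sketch. The observation model below is hypothetical: it assumes that each infection is detected independently with a fixed probability, and the function name and arguments are chosen for this example only.

```python
import random


def observe_cases(num_infected, detection_probability, rng):
    """Return the number of infections that are detected.

    Small interface: the function receives a single number from the model,
    rather than the entire model state.
    Explicit interface: every input is an argument; no global variables are
    read, including the random number generator.
    Self documentation: this docstring lives with the code that it describes.
    """
    return sum(rng.random() < detection_probability for _ in range(num_infected))
```

Passing the random number generator explicitly, e.g. `observe_cases(100, 0.5, random.Random(2024))`, also makes the simulated data repeatable.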