Add Project structure and Writing code sections
I started sketching out content for a project structure section, but this quickly branched out to include a variety of topics that are better described as writing code. These sections include several exercises. One uses my Australian 2020 COVID-19 forecasts repository as an example and asks the reader to think about how the README.md file could be improved.
Showing 19 changed files with 609 additions and 6 deletions.
# Project structure

How we choose to structure a project can affect how readily someone else — or even yourself, after a period of absence — can understand, use, and extend the work.

!!! question

    Have you ever looked at your old code and wondered how it worked or how to make it run?

!!! tip

    A good project structure can serve as a table of contents and help the reader to navigate.

In an earlier section we provided some guidelines for [how to structure a repository](../using-git/how-to-structure-a-repository.md).
In this section we present further guidelines and examples to help you choose a sensible structure for your current project and future projects.

This includes high-level recommendations that should apply to any project, and more detailed recommendations that may be specific to a particular type of project or choice of programming language.
# Automate common tasks

If you reach the point where you need to run a specific sequence of commands or actions to achieve something — e.g., running a model simulation, or producing an output figure — it is a **very good idea** to write a script that performs all of these actions correctly.

This is because while you may remember exactly what needs to be done **right now**, you may not remember next week, or next month, or next year.
We're all human, and we all make mistakes, but these kinds of mistakes are **easy to avoid**!

!!! info

    Mistakes are a fact of life. It is the response to the error that counts.

    — Nikki Giovanni

There are many tools that can help you to automate tasks, and some of them are smart enough to do only as much work as necessary (e.g., skipping steps whose inputs have not changed).

There are popular tools aimed at specific programming languages, such as:

- **R:** [targets](https://books.ropensci.org/targets/);

- **Python:** [nox](https://nox.thea.codes/) and [tox](https://tox.wiki/); and

- **Julia:** [pipelines](https://juliapackages.com/p/pipelines).

There are many generic automation tools (see, e.g., Wikipedia's list of [build automation software](https://en.wikipedia.org/wiki/List_of_build_automation_software)), although these can be rather complex to learn.
We recommend using a language-specific automation tool where possible, and only using a generic automation tool as a last resort.
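In practice the tools listed above are the better choice, but the core idea behind "only re-run steps whose inputs have changed" is simple. The following minimal Python sketch uses the same file-timestamp rule as `make`: a step re-runs only if an output is missing or older than its newest input. The `is_stale` and `run_step` helpers are hypothetical and are not part of any of these tools.

```python
"""A minimal, hypothetical task runner that skips steps that are up to date.

A step is re-run only if one of its outputs is missing, or if any output is
older than its newest input (the rule that `make` uses).
"""
from pathlib import Path
from typing import Callable, Sequence


def is_stale(inputs: Sequence[Path], outputs: Sequence[Path]) -> bool:
    """Return True if the step that turns `inputs` into `outputs` must be re-run."""
    # Steps with no declared outputs, or with a missing output, always run.
    if not outputs or any(not out.exists() for out in outputs):
        return True
    # Steps with no inputs are up to date once their outputs exist.
    if not inputs:
        return False
    newest_input = max(path.stat().st_mtime for path in inputs)
    oldest_output = min(path.stat().st_mtime for path in outputs)
    return oldest_output < newest_input


def run_step(name: str, inputs: Sequence[Path], outputs: Sequence[Path],
             action: Callable[[], None]) -> bool:
    """Run `action` only if the step is stale; return True if it was run."""
    if is_stale(inputs, outputs):
        print(f"Running: {name}")
        action()
        return True
    print(f"Up to date: {name}")
    return False
```

For example, a data-cleaning step might call `run_step("clean data", [Path("input/raw/cases.csv")], [Path("input/cleaned/cases.csv")], clean_cases)`, where the paths and the `clean_cases` function are placeholders for your own files and code.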
# Exercise: a good README

Remember that the README file (usually one of `README.md`, `README.rst`, or `README.txt`) is often **the very first thing** that a user will see when looking at your project.

- Have you seen any README files that were particularly helpful, or were not very helpful?

- What information do you find helpful in a README file?

Consider the `README.md` file in the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts).

- What content, if any, would you **add** to this file?

- What content, if any, would you **remove** from this file?

- Would you change its structure in any way?
# Exercise: what works for you?

Look back at your past projects and identify aspects of their structure that you have found helpful.

- What features or choices have worked well in past projects and might help you structure your future projects?

- What problems or issues have you experienced with the structure of your past projects that you could avoid in your future projects?

- Can any of your colleagues and collaborators share similar insights?
# Explain how it all works

Once you've chosen a project structure, you need to write down **how it all works**, regardless of how simple and clear your project structure is!

!!! tip

    The best place to do this is in a `README.md` file (or equivalent) in the project root directory.

Begin with an overview of the project:

- What question(s) are you trying to address?

- What data, hypotheses, methods, etc., are you using?

- What outputs does this generate?

You can then provide further detail, such as:

- What software environment and/or packages must be available for your code to run?

- How can the user generate each of the outputs?

- What license [have you chosen](../using-git/choosing-a-license.md)?

## An example README.md

See the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts) for an example `README.md` file.

This repository was used to generate the results, tables, and figures presented in the paper "[Forecasting COVID-19 activity in Australia to support pandemic response: May to October 2020](https://doi.org/10.1038/s41598-023-35668-6)", *Scientific Reports* 13, 8763 (2023).

**Strengths:**

- It includes installation and usage instructions;

- It identifies the paper; and

- It identifies the license under which the code is distributed.

**Weaknesses:**

- It only explains some of the project structure.

- It doesn't provide an overview of the project; it only links to the paper.

- The root directory contains a number of scripts and input files that aren't described.
# Define your workflow

A good first step in deciding how to structure a project is to ask yourself:

- What are the different project phases?

- What are the major activities in each phase?

## An example of phases and activities

For example, a project might involve the following phases:

1. Clean an existing data set;

2. Build models with different hypotheses or features;

3. Fit each model to the data; and

4. Decide which model best explains the data.

The data-cleaning phase might involve the following activities:

- Obtain the raw data;

- Identify the quality checks that should be applied;

- Decide how to resolve data that fail each quality check; and

- Generate and record the cleaned data.

The model-building phase might involve the following activities:

- Perform a literature search to identify relevant modelling studies;

- Identify competing hypotheses or features that might explain the data;

- Design a model that implements each hypothesis; and

- Define the relationship between each model and the cleaned data.

## Reflect this workflow in your project structure

You can use the phases and activities to guide your choice of directory structure.
For this example project, one possible structure is:

- `project/`: the root directory of your project;

    - `input/`: a sub-directory that contains input data;

        - `raw/`: the raw data **before** cleaning;

        - `cleaned/`: the cleaned data;

    - `code/`: a sub-directory that contains the project code;

        - `cleaning/`: the data-cleaning code;

        - `model-first-hypothesis/`: the first model;

        - `model-second-hypothesis/`: the second model;

        - `fitting/`: the code that fits each model to the data;

        - `evaluation/`: the code that compares the model fits;

        - `plotting/`: the code that plots output figures;

    - `paper/`: a sub-directory for the project manuscript;

        - `figures/`: the output figures;
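If you set up projects like this often, a short script can create the skeleton for you. This is a minimal sketch, assuming Python 3.9 or later and that `figures/` sits inside `paper/`; `PROJECT_DIRS` and `create_skeleton` are hypothetical names, and the list covers only the directories named in the example layout, so adjust it to match your own workflow.

```python
"""Create the directory skeleton of the example project (a hypothetical helper)."""
from pathlib import Path

# Sub-directories of the example layout, relative to the project root.
PROJECT_DIRS = [
    "input/raw",
    "input/cleaned",
    "code/cleaning",
    "code/model-first-hypothesis",
    "code/model-second-hypothesis",
    "code/fitting",
    "code/evaluation",
    "code/plotting",
    "paper/figures",
]


def create_skeleton(root: Path) -> list[Path]:
    """Create each directory under `root`; existing directories are left untouched."""
    created = []
    for relative in PROJECT_DIRS:
        path = root / relative
        # parents=True also creates `input/`, `code/`, and `paper/` as needed.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Because `exist_ok=True` is used, running the script again on an existing project does nothing harmful.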
# Writing code

For computational research, code is an important scientific artefact for the author, for colleagues and collaborators, and for the scientific community.
It is the **ultimate form** of expressing **what you did** and **how you did it**.
With good version control and documentation practices, it can also capture **when and why** you made important decisions.

!!! tip

    [W]e want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for **expressing ideas about methodology**.
    Thus, programs must be **written for people to read**, and only incidentally for machines to execute.

    — [Structure and Interpretation of Computer Programs](https://mitpress.mit.edu/9780262510875/).
    Abelson, Sussman, and Sussman, 1984.
# Behave nicely

Would you feel comfortable running someone else's code if you thought it might affect your other files, applications, or settings, or do something else unexpected?

!!! tip

    Your code should be **encapsulated:** it should assume as little as possible about the computer on which it is running, and it shouldn't mess with the user's environment.

!!! tip

    Your code should follow the **principle of least surprise:** it should behave in the way that most users will expect, and not astonish or surprise them.

## A cake analogy

Suppose you have two colleagues who regularly bake cakes, and you decide you'd like one of them to bake you a lemon cake with chocolate icing.

- **A nice colleague:** you ask your colleague to bake a lemon cake with chocolate icing.

    - That evening, they go home and bake a cake.
    - They bring the cake to work the next day.
    - The cake tastes of lemon and is topped with chocolate icing.

- **A messy colleague:** you ask your colleague to bake a lemon cake with chocolate icing.

    - They reply that they will make a cake.
    - The next day, they come into your office with the ingredients and a portable oven.
    - They begin mixing ingredients, making a huge mess on your desk.
    - You have to wait until the batter is mixed before they ask you for your choice of flavour.
    - They don't have lemons, so they add some orange zest to the batter.
    - Once the cake is baked, they let it cool.
    - One hour later, they ask you what flavour icing you want.
    - They don't have chocolate or cocoa, so they use a different icing.
    - They give you the cake.
    - The cake tastes of orange and is topped with strawberry icing.
    - Your office is covered in flour, sugar, and cake batter.

## Some specific tips

- Avoid modifying files outside of the project directory!

- Avoid using hard-coded absolute paths, such as `C:\Users\My Name\Some Project\...` or `/Users/My Name/Some other directory`.
    These make it harder for other people to use the code, or to run the code on high-performance computing platforms.

- Prefer using paths that are relative to the root directory of your project, such as `input-data/case-data/cases-for-2023.csv`.
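One way to follow these tips is to build every path from the project root and check that the result stays inside the project. This is a minimal Python sketch, assuming Python 3.9 or later (for `Path.is_relative_to`); `project_path` is a hypothetical helper, not part of any library.

```python
"""Resolve project-relative paths and refuse paths outside the project."""
from pathlib import Path


def project_path(root: Path, relative: str) -> Path:
    """Return `root / relative`, raising ValueError if it escapes `root`."""
    root = root.resolve()
    # resolve() collapses ".." components; joining an absolute path
    # discards `root`, so both cases are caught by the check below.
    path = (root / relative).resolve()
    if not path.is_relative_to(root):
        raise ValueError(f"{relative!r} is outside the project directory")
    return path
```

In a script stored in the project root, you might call `project_path(Path(__file__).parent, "input-data/case-data/cases-for-2023.csv")`. The check also rejects paths such as `../other-project/data.csv` that would silently step outside the project directory.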
# Check your code

A "linter" is a tool that checks your code for syntax errors, possible mistakes, inconsistent formatting, and other potential issues.

We **strongly recommend** using an editor that displays linter warnings as you write your code.
Having instant feedback allows you to rapidly resolve many common issues and substantially improve your code.

We list here some of the most commonly used linters:

- **R:** [lintr](https://lintr.r-lib.org/)

- **Python:** [ruff](https://docs.astral.sh/ruff/)

- **Julia:** [Lint.jl](https://lintjl.readthedocs.org/en/stable/)
# Coding advice

- Think about how to cleanly structure your code.
    Take a **similar approach to how we write papers and grants**.

    - Break the overall problem into pieces, and then decide how to structure each piece in turn.

    - Divide your code into functions that each do one "thing", and group related functions into separate files or modules.

    - It can sometimes help to think about how you want the final code to look, and then design the functions and components that are needed.

- Avoid global variables; aim to pass everything as function arguments.
    This makes the code more robust and easier to run.

- Avoid passing lots of individual parameters as separate arguments; this is prone to error, because you might not pass them in the correct order.
    Instead, collect the parameters into a single structure (e.g., a Python dictionary, an R named list).

- Avoid making multiple copies of a model if you want to change some aspect of its behaviour.
    Instead, add a new model parameter that enables/disables this new behaviour.
    This allows you to use the same code to run the older and newer versions of the model.

- Try to collect common or related tasks into a single script, and allow the user to select which task(s) to run, rather than creating many scripts that perform very similar tasks.

- Write test cases to check key model properties.

    - You want to identify problems and mistakes as soon as possible!

    - Thinking about how to make your code testable can help you improve its structure!

    - Well-written tests can also demonstrate **how to use your code**!
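To make several of these points concrete, here is a minimal Python sketch of a hypothetical growth model; the `simulate` function and its parameter names are illustrative assumptions, not code from any particular project. It takes a single `params` dictionary instead of global variables or a long argument list, enables a newer behaviour (logistic saturation) through an optional parameter rather than a second copy of the model, and includes two small tests of key model properties.

```python
"""A hypothetical growth model illustrating the coding advice above."""


def simulate(params: dict) -> list[float]:
    """Simulate discrete-time population growth.

    Required keys: "initial_size", "growth_rate", "num_steps".
    Optional key: "carrying_capacity"; if present, growth is logistic,
    otherwise it is exponential (the original model behaviour).
    """
    size = params["initial_size"]
    rate = params["growth_rate"]
    capacity = params.get("carrying_capacity")  # None disables saturation.
    history = [size]
    for _ in range(params["num_steps"]):
        if capacity is None:
            size = size + rate * size
        else:
            size = size + rate * size * (1 - size / capacity)
        history.append(size)
    return history


def test_exponential_growth_matches_closed_form():
    params = {"initial_size": 10.0, "growth_rate": 0.1, "num_steps": 5}
    expected = 10.0 * 1.1 ** 5
    assert abs(simulate(params)[-1] - expected) < 1e-9


def test_logistic_growth_never_exceeds_capacity():
    params = {"initial_size": 10.0, "growth_rate": 0.5, "num_steps": 100,
              "carrying_capacity": 1000.0}
    assert all(size <= 1000.0 for size in simulate(params))
```

If this code is saved in a file named, e.g., `test_growth.py`, running `pytest` will collect and run both test functions; they also show a reader how `simulate` is meant to be called.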
# Cohesion and coupling

**Divide your code** into modules, each of which does one thing ("high cohesion") and depends as little as possible on other pieces ("low coupling").

## Common project components

For example, an infectious diseases modelling project could be divided into some of the following components:

- The model parameters — what are their values or prior distributions?

- The initial model state — how is this created from the model parameters?

- The model equations or update rules — how does the model evolve over time?

- Summary statistics — what do you want to record for each simulation?
    This might be the entire state history, a subset of the history, some aggregate statistics, or any combination of these things.

- The input data (if any) — these may be case data, serological data, within-host specimen counts, etc.

- The relationship between data and the model state ("observation model").

- Simulated data generated from a model simulation.

As much as possible, each of these components (where relevant to your project) should be represented as **a separate piece of code**.
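As an illustration, the following Python sketch splits a simple SIR (susceptible, infectious, recovered) model into several of these components. The model, its parameter values, and the function names are all hypothetical; the point is that each piece is a separate function that communicates only through its arguments and return values, so any one of them can be changed or tested in isolation.

```python
"""A hypothetical SIR model split into separate components."""


def model_parameters() -> dict:
    """Model parameters (illustrative values, not estimates)."""
    return {"population": 1000.0, "initial_infections": 1.0,
            "beta": 0.3, "gamma": 0.1, "reporting_prob": 0.5}


def initial_state(params: dict) -> dict:
    """Create the initial model state from the parameters."""
    infected = params["initial_infections"]
    return {"S": params["population"] - infected, "I": infected, "R": 0.0}


def update(state: dict, params: dict) -> tuple[dict, float]:
    """Advance the model by one day; return the new state and daily incidence."""
    n = state["S"] + state["I"] + state["R"]
    new_infections = params["beta"] * state["S"] * state["I"] / n
    new_recoveries = params["gamma"] * state["I"]
    new_state = {"S": state["S"] - new_infections,
                 "I": state["I"] + new_infections - new_recoveries,
                 "R": state["R"] + new_recoveries}
    return new_state, new_infections


def summarise(incidence: list[float]) -> dict:
    """Summary statistics recorded for each simulation."""
    peak = max(incidence)
    return {"total_infections": sum(incidence), "peak_incidence": peak,
            "peak_day": incidence.index(peak)}


def observation_model(incidence: list[float], params: dict) -> list[float]:
    """Expected number of reported cases for each day's incidence."""
    return [params["reporting_prob"] * x for x in incidence]


def simulate(params: dict, num_days: int) -> list[float]:
    """Run the model from its initial state and return the daily incidence."""
    state = initial_state(params)
    incidence = []
    for _ in range(num_days):
        state, new_infections = update(state, params)
        incidence.append(new_infections)
    return incidence
```

Fitting to real data would, for instance, compare `observation_model(...)` against observed case counts, without touching `update` or `initial_state`.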

## Separating the "what" from the "how"

Dividing your code into separate components is especially important if you want to use a model for multiple purposes, such as:

- Exploring different scenarios;
- Fitting to various data sets;
- Performing sensitivity and uncertainty analyses; and
- Forecasting future data.

!!! tip

    In particular, keep the following aspects of your project separate:

    - **What to do:** fitting to different data sets, exploring different scenarios, performing a sensitivity analysis, etc.; and

    - **How to do it:** the model implementation.

If you want to explore a range of model scenarios, for example, define the parameter values (or sampling distributions) for each scenario in a separate input file.
Then write a script that takes an input file name as an argument, reads the parameter values, and uses these values to run the model simulations.

This makes it extremely simple to define and run new scenarios without modifying your code.
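A minimal Python sketch of this approach, assuming scenarios are stored as JSON files: `run_model` is a hypothetical stand-in for your model (here, a simple discrete-time SIR model in population fractions), and the parameter names are illustrative. Adding a new scenario means adding a new JSON file, not editing the code.

```python
"""Run one model scenario whose parameters are defined in a JSON input file."""
import argparse
import json
from pathlib import Path
from typing import Optional, Sequence


def run_model(params: dict) -> float:
    """Hypothetical model: a discrete-time SIR model in population fractions.

    Returns the fraction of the population ever infected by the final day.
    """
    susceptible = 1.0 - params["initial_fraction"]
    infectious = params["initial_fraction"]
    for _ in range(params["num_days"]):
        new_infections = params["beta"] * susceptible * infectious
        recoveries = params["gamma"] * infectious
        susceptible -= new_infections
        infectious += new_infections - recoveries
    return 1.0 - susceptible


def main(argv: Optional[Sequence[str]] = None) -> float:
    """Parse the command line, load the scenario file, and run the model."""
    parser = argparse.ArgumentParser(description="Run one model scenario.")
    parser.add_argument("scenario_file", type=Path,
                        help="JSON file containing this scenario's parameter values")
    args = parser.parse_args(argv)
    params = json.loads(args.scenario_file.read_text())
    fraction_infected = run_model(params)
    print(f"{args.scenario_file.stem}: fraction infected = {fraction_infected:.3f}")
    return fraction_infected
```

To use it from the command line, call `main()` inside an `if __name__ == "__main__":` block and pass a scenario file, e.g., `python run_scenario.py scenarios/high-transmission.json` (the script and file names are placeholders).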

## Interactions between components

Choosing how your components interact (e.g., by calling functions or passing data) is **just as important** as deciding how to divide your code into components.

Here are some key recommendations from [Object-Oriented Software Construction (2nd ed.)](https://bertrandmeyer.com/OOSC2/):

- Small interfaces: if two modules communicate, they should exchange as little information as possible.

- Explicit interfaces: if two modules communicate, it should be obvious from the code in one or both of these modules.

- Self-documentation: strive to make all information about a module part of the module itself.