From 9b1a7207570fcd52c2e2cfb47cd40fbbb8361f91 Mon Sep 17 00:00:00 2001 From: Rob Moss Date: Thu, 11 Apr 2024 11:11:47 +1000 Subject: [PATCH 1/2] Add Project structure and Writing code sections I started sketching out content for a project structure section, but this quickly branched out to include a variety of topics that are better described as writing code. These sections include several exercises. One uses my Australian 2020 COVID-19 forecasts repository as an example and asks the reader to think about how the README.md file could be improved. --- docs/guides/README.md | 10 ++- docs/guides/learning-objectives.md | 22 ++++++ docs/guides/project-structure/README.md | 16 ++++ .../project-structure/automating-tasks.md | 25 +++++++ .../exercise-a-good-readme.md | 15 ++++ .../exercise-what-works-for-you.md | 9 +++ .../project-structure/explain-how-it-works.md | 46 ++++++++++++ docs/guides/project-structure/workflow.md | 70 ++++++++++++++++++ docs/guides/writing-code/README.md | 13 ++++ docs/guides/writing-code/behave-nicely.md | 44 +++++++++++ docs/guides/writing-code/check-your-code.md | 14 ++++ docs/guides/writing-code/coding-advice.md | 30 ++++++++ docs/guides/writing-code/cohesion-coupling.md | 58 +++++++++++++++ docs/guides/writing-code/create-packages.md | 54 ++++++++++++++ .../guides/writing-code/document-your-code.md | 59 +++++++++++++++ .../writing-code/exercise-seek-feedback.md | 8 ++ docs/guides/writing-code/format-your-code.md | 24 ++++++ .../how-we-learn-to-write-code.md | 74 +++++++++++++++++++ mkdocs.yml | 24 +++++- 19 files changed, 609 insertions(+), 6 deletions(-) create mode 100644 docs/guides/project-structure/README.md create mode 100644 docs/guides/project-structure/automating-tasks.md create mode 100644 docs/guides/project-structure/exercise-a-good-readme.md create mode 100644 docs/guides/project-structure/exercise-what-works-for-you.md create mode 100644 docs/guides/project-structure/explain-how-it-works.md create mode 100644 
docs/guides/project-structure/workflow.md create mode 100644 docs/guides/writing-code/README.md create mode 100644 docs/guides/writing-code/behave-nicely.md create mode 100644 docs/guides/writing-code/check-your-code.md create mode 100644 docs/guides/writing-code/coding-advice.md create mode 100644 docs/guides/writing-code/cohesion-coupling.md create mode 100644 docs/guides/writing-code/create-packages.md create mode 100644 docs/guides/writing-code/document-your-code.md create mode 100644 docs/guides/writing-code/exercise-seek-feedback.md create mode 100644 docs/guides/writing-code/format-your-code.md create mode 100644 docs/guides/writing-code/how-we-learn-to-write-code.md diff --git a/docs/guides/README.md b/docs/guides/README.md index 78f51ec6..ac0d5586 100644 --- a/docs/guides/README.md +++ b/docs/guides/README.md @@ -8,8 +8,12 @@ These materials are divided into the following sections: 3. Using Git to [collaborate with colleagues](./collaborating/README.md) in a precisely controlled and manageable way. -4. Ensuring that your research is [reproducible by others](./reproducibility/README.md). +4. Learn how to [structure your project](./project-structure/README.md) so that it is easier for yourself and others to navigate. -5. Using [testing frameworks](./testing/README.md) to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code. +5. Learn how to [write code](./writing-code/README.md) so that it clearly expresses your intent and ideas. -6. Running your code on various [computing platforms]() that allow you to obtain results efficiently and without relying on your own laptop/computer. +6. Ensuring that your research is [reproducible by others](./reproducibility/README.md). + +7. Using [testing frameworks](./testing/README.md) to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code. + +8. 
Running your code on various [computing platforms]() that allow you to obtain results efficiently and without relying on your own laptop/computer. diff --git a/docs/guides/learning-objectives.md b/docs/guides/learning-objectives.md index 7f36c920..0acfed6d 100644 --- a/docs/guides/learning-objectives.md +++ b/docs/guides/learning-objectives.md @@ -51,3 +51,25 @@ After completing [this section](collaborating/README.md), you should be able to: - Use a pull request to **merge a collaborator's work** into your main branch; and - **Conduct peer code review** in a respectful manner. + +## Project structure + +After completing [this section](project-structure/README.md), you should be able to: + +- Understand how to structure a new project; + +- Understand how to separate "what to do" from "how to do it"; and + +- Structure your code to enable new experiments and analyses. + +## Writing code + +After completing [this section](writing-code/README.md), you should be able to: + +- Divide your code into functions and modules; + +- Ensure that your code is a clear expression of your ideas; + +- Structure your code into reusable packages; and + +- Take advantage of code formatters and code linters. diff --git a/docs/guides/project-structure/README.md b/docs/guides/project-structure/README.md new file mode 100644 index 00000000..5e4df3a2 --- /dev/null +++ b/docs/guides/project-structure/README.md @@ -0,0 +1,16 @@ +# Project structure + +How we choose to structure a project can affect how readily someone else — or even yourself, after a period of absence — can understand, use, and extend the work. + +!!! question + + Have you ever looked at your old code and wondered how it worked or how to make it run? + +!!! tip + + A good project structure can serve as a table of contents and help the reader to navigate. + +In an earlier section we provided some guidelines for [how to structure a repository](../using-git/how-to-structure-a-repository.md). 
+In this section we present further guidelines and examples to help you choose a sensible structure for your current project and future projects. + +This includes high-level recommendations that should apply to any project, and more detailed recommendations that may be specific to a particular type of project or choice of programming language. diff --git a/docs/guides/project-structure/automating-tasks.md b/docs/guides/project-structure/automating-tasks.md new file mode 100644 index 00000000..4ea4873e --- /dev/null +++ b/docs/guides/project-structure/automating-tasks.md @@ -0,0 +1,25 @@ +# Automate common tasks + +If you reach the point where you need to run a specific sequence of commands or actions to achieve something — e.g., running a model simulation, or producing an output figure — it is a **very good idea** to write a script that performs all of these actions correctly. + +This is because while you may remember exactly what needs to be done **right now**, you may not remember next week, or next month, or next year. +We're all human, and we all make mistakes, but these kinds of mistakes are **easy to avoid**! + +!!! info + + Mistakes are a fact of life. It is the response to the error that counts. + + — Nikki Giovanni + +There are many tools that can help you to automate tasks, some of which are smart enough that they will only do as little as possible (e.g., avoid re-running steps if the inputs have not changed). + +There are popular tools aimed at specific programming languages, such as: + +- **R:** [targets](https://books.ropensci.org/targets/); + +- **Python**: [nox](https://nox.thea.codes/) and [tox](https://tox.wiki/); and + +- **Julia:** [pipelines](https://juliapackages.com/p/pipelines). + +There are many generic automation tools (see, e.g., Wikipedia's list of [build automation software](https://en.wikipedia.org/wiki/List_of_build_automation_software)), although these can be rather complex to learn. 
+We recommend using a language-specific automation tool where possible, and only using a generic automation tool as a last resort. diff --git a/docs/guides/project-structure/exercise-a-good-readme.md b/docs/guides/project-structure/exercise-a-good-readme.md new file mode 100644 index 00000000..5d891e7d --- /dev/null +++ b/docs/guides/project-structure/exercise-a-good-readme.md @@ -0,0 +1,15 @@ +# Exercise: a good README + +Remember that the README file (usually one of `README.md`, `README.rst`, or `README.txt`) is often **the very first thing** that a user will see when looking at your project. + +- Have you seen any README files that were particularly helpful, or were not very helpful? + +- What information do you find helpful in a README file? + +Consider the `README.md` file in the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts). + +- What content, if any, would you **add** to this file? + +- What content, if any, would you **remove** from this file? + +- Would you change its structure in any way? diff --git a/docs/guides/project-structure/exercise-what-works-for-you.md b/docs/guides/project-structure/exercise-what-works-for-you.md new file mode 100644 index 00000000..bf311b21 --- /dev/null +++ b/docs/guides/project-structure/exercise-what-works-for-you.md @@ -0,0 +1,9 @@ +# Exercise: what works for you? + +Look back at your past projects and identify aspects of their structure that you have found helpful. + +- What features or choices have worked well in past projects and might help you structure your future projects? + +- What problems or issues have you experienced with the structure of your past projects, which you could avoid in your future projects? + +- Can any of your colleagues and collaborators share similar insights? 
diff --git a/docs/guides/project-structure/explain-how-it-works.md b/docs/guides/project-structure/explain-how-it-works.md new file mode 100644 index 00000000..d3623698 --- /dev/null +++ b/docs/guides/project-structure/explain-how-it-works.md @@ -0,0 +1,46 @@ +# Explain how it all works + +Once you've chosen a project structure, you need to write down **how it all works** — regardless of how simple and clear your project structure is! + +!!! tip + + The best place to do this is in a `README.md` file (or equivalent) in the project root directory. + +Begin with an overview of the project: + +- What question(s) are you trying to address? + +- What data, hypotheses, methods, etc., are you using? + +- What outputs does this generate? + +You can then provide further detail, such as: + +- What software environment and/or packages must be available for your code to run? + +- How can the user generate each of the outputs? + +- What license [have you chosen](../using-git/choosing-a-license.md)? + + +## An example README.md + +See the [Australian 2020 COVID-19 forecasts repository](https://gitlab.unimelb.edu.au/rgmoss/aus-2020-covid-forecasts) for an example `README.md` file. + +This repository was used to generate the results, tables, and figures presented in the paper "[Forecasting COVID-19 activity in Australia to support pandemic response: May to October 2020](https://doi.org/10.1038/s41598-023-35668-6)", *Scientific Reports* 13, 8763 (2023). + +**Strengths:** + +- It includes installation and usage instructions; + +- It identifies the paper; and + +- It identifies the license under which the code is distributed. + +**Weaknesses:** + +- It only explains some of the project structure. + +- It doesn't provide an overview of the project; it only links to the paper. + +- The root directory contains a number of scripts and input files that aren't described.
diff --git a/docs/guides/project-structure/workflow.md b/docs/guides/project-structure/workflow.md new file mode 100644 index 00000000..eb0e8053 --- /dev/null +++ b/docs/guides/project-structure/workflow.md @@ -0,0 +1,70 @@ +# Define your workflow + +A good first step in deciding how to structure a project is to ask yourself: + +- What are the different project phases? + +- What are the major activities in each phase? + +## An example of phases and activities + +For example, a project might involve the following phases: + +1. Clean an existing data set; + +2. Build models with different hypotheses or features; + +3. Fit each model to the data; and + +4. Decide which model best explains the data. + +The data-cleaning phase might involve the following activities: + +- Obtain the raw data; + +- Identify the quality checks that should be applied; + +- Decide how to resolve data that fail each quality check; and + +- Generate and record the cleaned data. + +The model-building phase might involve the following activities: + +- Perform a literature search to identify relevant modelling studies; + +- Identify competing hypotheses or features that might explain the data; + +- Design a model that implements each hypothesis; and + +- Define the relationship between each model and the cleaned data. + +## Reflect this workflow in your project structure + +You can use the phases and activities to guide your choice of directory structure. 
+For this example project, one possible structure is: + +- `project/`: the root directory of your project + + - `input/`: a sub-directory that contains input data; + + - `raw/`: the raw data **before** cleaning; + + - `cleaned/`: the cleaned data; + + - `code/`: a sub-directory that contains the project code; + + - `cleaning/`: the data cleaning code; + + - `model-first-hypothesis/`: the first model; + + - `model-second-hypothesis/`: the second model; + + - `fitting/`: the code that fits each model to the data; + + - `evaluation/`: the code that compares the model fits; + + - `plotting/`: the code that plots output figures; + + - `paper/`: a sub-directory for the project manuscript; + + - `figures/`: the output figures; diff --git a/docs/guides/writing-code/README.md b/docs/guides/writing-code/README.md new file mode 100644 index 00000000..0a7cba8a --- /dev/null +++ b/docs/guides/writing-code/README.md @@ -0,0 +1,13 @@ +# Writing code + +For computational research, code is an important scientific artefact for the author, for colleagues and collaborators, and for the scientific community. +It is the **ultimate form** of expressing **what you did** and **how you did it**. +With good version control and documentation practices, it can also capture **when and why** you made important decisions. + +!!! tip + + [W]e want to establish the idea that a computer language is not just a way of getting a computer to perform operations but rather that it is a novel formal medium for **expressing ideas about methodology**. + Thus, programs must be **written for people to read**, and only incidentally for machines to execute. + + — [Structure and Interpretation of Computer Programs](https://mitpress.mit.edu/9780262510875/). + Abelson, Sussman, and Sussman, 1984.
diff --git a/docs/guides/writing-code/behave-nicely.md b/docs/guides/writing-code/behave-nicely.md new file mode 100644 index 00000000..42b9c3ca --- /dev/null +++ b/docs/guides/writing-code/behave-nicely.md @@ -0,0 +1,44 @@ +# Behave nicely + +Would you feel comfortable running someone else's code if you thought it might affect your other files, applications, settings, or do something else that's unexpected? + +!!! tip + + Your code should be **encapsulated:** it should assume as little as possible about the computer on which it is running, and it shouldn't mess with the user's environment. + +!!! tip + + Your code should follow the **principle of least surprise:** behave in a way that most users will expect it to behave, and not astonish or surprise them. + +## A cake analogy + +Suppose you have two colleagues who regularly bake cakes, and you decide you'd like one of them to bake you a lemon cake with chocolate icing. + +- **A nice colleague:** you ask your colleague to bake a lemon cake with chocolate icing. + + - That evening, they go home and bake a cake. + - They bring the cake to work the next day. + - The cake tastes of lemon and is topped with chocolate icing. + +- **A messy colleague:** you ask your colleague to bake a lemon cake with chocolate icing. + + - They reply that they will make a cake. + - The next day, they come into your office with the ingredients and a portable oven. + - They begin mixing ingredients, making a huge mess on your desk. + - You have to wait until the batter is mixed before they ask you for your choice of flavour. + - They don't have lemons, but add some orange zest to the batter. + - Once the cake is baked, they let it cool. + - One hour later they ask you what flavour icing you want. + - They don't have chocolate or cocoa, so they use a different icing. + - They give you the cake. + - The cake tastes of orange and is topped with strawberry icing. + - Your office is covered in flour, sugar, and cake batter.
+ +## Some specific tips + +- Avoid modifying files outside of the project directory! + +- Avoid using hard-coded absolute paths, such as `C:\Users\My Name\Some Project\...` or `/Users/My Name/Some other directory`. + These make it harder for other people to use the code, or to run the code on high-performance computing platforms. + +- Prefer using paths that are relative to the root directory of your project, such as `input-data/case-data/cases-for-2023.csv`. diff --git a/docs/guides/writing-code/check-your-code.md b/docs/guides/writing-code/check-your-code.md new file mode 100644 index 00000000..15cebc5f --- /dev/null +++ b/docs/guides/writing-code/check-your-code.md @@ -0,0 +1,14 @@ +# Check your code + +A "linter" is a tool that checks your code for syntax errors, possible mistakes, inconsistent formatting, and other potential issues. + +We **strongly recommend** using an editor that displays linter warnings as you write your code. +Having instant feedback allows you to rapidly resolve many common issues and substantially improve your code. + +We list here some of the most commonly used linters: + +- **R:** [lintr](https://lintr.r-lib.org/) + +- **Python:** [ruff](https://docs.astral.sh/ruff/) + +- **Julia:** [Lint.jl](https://lintjl.readthedocs.org/en/stable/) diff --git a/docs/guides/writing-code/coding-advice.md b/docs/guides/writing-code/coding-advice.md new file mode 100644 index 00000000..b0bd88a5 --- /dev/null +++ b/docs/guides/writing-code/coding-advice.md @@ -0,0 +1,30 @@ +# Coding advice + +- Think about how to cleanly structure your code. + Take a **similar approach to how we write papers and grants**. + +- Break the overall problem into pieces, and then decide how to structure each piece in turn. + +- Divide your code into functions that each do one "thing", and group related functions into separate files or modules. 
+ +- It can sometimes help to think about how you want the final code to look, and then design the functions and components that are needed. + +- Avoid global variables; aim to pass everything as function arguments. + This makes the code more robust and easier to run. + +- Avoid passing lots of individual parameters as separate arguments; this is prone to error — you might not pass them in the correct order. + Instead, collect the parameters into a single structure (e.g., a Python dictionary, an R named list). + +- Avoid making multiple copies of a model if you want to change some aspect of its behaviour. + Instead, add a new model parameter that enables/disables this new behaviour. + This allows you to use the same code to run the older and newer versions of the model. + +- Try to collect common or related tasks into a single script, and allow the user to select which task(s) to run, rather than creating many scripts that perform very similar tasks. + +- Write test cases to check key model properties. + + - You want to identify problems and mistakes as soon as possible! + + - Thinking about how to make your code testable can help you improve its structure! + + - Well-written tests can also demonstrate **how to use your code**! diff --git a/docs/guides/writing-code/cohesion-coupling.md b/docs/guides/writing-code/cohesion-coupling.md new file mode 100644 index 00000000..89b9b56b --- /dev/null +++ b/docs/guides/writing-code/cohesion-coupling.md @@ -0,0 +1,58 @@ +# Cohesion and coupling + +**Divide your code** into modules, each of which does one thing ("high cohesion") and depends as little as possible on other pieces ("low coupling"). + +## Common project components + +For example, an infectious diseases modelling project might often be divided into some of the following components: + +- The model parameters — what are their values or prior distributions? + +- The initial model state — how is this created from the model parameters?
+ +- The model equations or update rules — how does the model evolve over time? + +- Summary statistics — what do you want to record for each simulation? + This might be the entire state history, a subset of the history, some aggregate statistics, or any combination of these things. + +- The input data (if any) — these may be case data, serological data, within-host specimen counts, etc. + +- The relationship between data and the model state ("observation model"). + +- Simulated data generated from a model simulation. + +As much as possible, each of these components (where relevant to your project) should be represented as **a separate piece of code**. + +## Separating the "what" from the "how" + +Dividing your code into separate components is especially important if you want to use a model for multiple purposes, such as: + +- Exploring different scenarios; +- Fitting to various data sets; +- Performing sensitivity and uncertainty analyses; and +- Forecasting future data. + +!!! tip + + In particular, keep the following aspects of your project separate: + + - **What to do:** fitting to different data sets, exploring different scenarios, performing a sensitivity analysis, etc; and + + - **How to do it:** the model implementation. + + If you want to explore a range of model scenarios, for example, define the parameter values (or sampling distributions) for each scenario in a separate input file. + Then write a script that takes an input file name as an argument, reads the parameter values, and uses these values to run the model simulations. + + This makes it extremely simple to define and run new scenarios without modifying your code. + +## Interactions between components + +Choosing how your components interact (e.g., by calling functions or passing data) is **just as important** as deciding how to divide your code into components. 
+ +Here are some key recommendations from [Object-Oriented Software Construction (2nd ed)](https://bertrandmeyer.com/OOSC2/): + +- Small interfaces: if two modules communicate, they should exchange as little information as possible. + +- Explicit interfaces: if two modules communicate, it should be obvious from the code in one or both of these modules. + +- Self documentation: strive to make all information about a module part of the module itself. diff --git a/docs/guides/writing-code/create-packages.md b/docs/guides/writing-code/create-packages.md new file mode 100644 index 00000000..f63f7d79 --- /dev/null +++ b/docs/guides/writing-code/create-packages.md @@ -0,0 +1,54 @@ +# Create packages + +For languages such as R, Python, and Julia, it is generally a good idea to **write your code as a package/library**. +This can make it easier to install and run your code on a new computer, on a high-performance computing platform, and for others to use on their own computers. + +!!! info + + This is **a simple process** and entirely separate from **publishing** your package or making it publicly available. + + It also means you can avoid using `source()` in R, or adding directories to `sys.path` in Python. + +To create a package you need to provide some necessary information, such as a package name, and the list of the packages that your code depends on ("dependencies"). +You can then use packaging tools to **verify** that you've correctly identified these dependencies and that your package can be successfully installed and used! + +This is an important step towards **ensuring your work is reproducible**. + +There are some great online resources that can help you get started. +We list here some widely-recommended resources for specific languages. + +## Writing R packages + +For [R](https://www.r-project.org/), see [R Packages (2nd ed)](https://r-pkgs.org/) and the [devtools package](https://devtools.r-lib.org/). 
+ +Other useful references include: + +- [rOpenSci Packages: Development, Maintenance, and Peer Review](https://devguide.ropensci.org/); +- [Writing an R package from scratch](https://hilaryparker.com/2014/04/29/writing-an-r-package-from-scratch/) by Hilary Parker; +- [How to develop good R packages](https://masalmon.eu/2017/12/11/goodrpackages/) by Maëlle Salmon; +- [Making your first R package](https://tinyheero.github.io/jekyll/update/2015/07/26/making-your-first-R-package.html) by Fong Chun Chan; and +- [Writing an R package from scratch](https://r-mageddon.netlify.app/post/writing-an-r-package-from-scratch/) by Tomas Westlake. + + +!!! info + + rOpenSci offers [peer review of statistical software](https://stats-devguide.ropensci.org/). + +## Writing Python packages + +The [Python Packaging User Guide](https://packaging.python.org/en/latest/) provides a tutorial on [Packaging Python Projects](https://packaging.python.org/en/latest/tutorials/packaging-projects/). + +Other useful references include: + +- The [pyOpenSci project](https://www.pyopensci.org/) also provides a [Python Packaging Guide](https://www.pyopensci.org/python-package-guide/). + This includes information about [code style, formatting, and linters](https://www.pyopensci.org/python-package-guide/package-structure-code/code-style-linting-format.html). + +- This [example Python project](https://gitlab.unimelb.edu.au/rgmoss/example-python-project/) demonstrates one way of structuring a Python project as a package. + +!!! 
info + + pyOpenSci offers [peer review of scientific software](https://www.pyopensci.org/software-peer-review/). + +## Writing Julia Packages + +Julia's [package manager documentation](https://pkgdocs.julialang.org/dev/) provides a guide to [Creating Packages](https://pkgdocs.julialang.org/dev/creating-packages/). diff --git a/docs/guides/writing-code/document-your-code.md b/docs/guides/writing-code/document-your-code.md new file mode 100644 index 00000000..6c346401 --- /dev/null +++ b/docs/guides/writing-code/document-your-code.md @@ -0,0 +1,59 @@ +# Document your code + +Writing clear, well-structured code can make it easier for someone to understand what your code does. +You might think that this means your code is so clear and obvious that it needs no further explanation. + +But this is not true! +There is **always** a role for writing comments and documentation. +By itself, your code cannot always explain: + +- **What** goal you are trying to achieve; + +- **How** you are achieving this goal; and + +- **Why** you've chosen this approach. + +!!! question + + What can you do to make your code more easily understandable? + +## Naming + +Use good names for functions, parameters, and variables. +This can be **deceptively hard**. + +!!! quote + + There are only two hard things in Computer Science: cache invalidation and naming things. + + — Phil Karlton + +## Explaining + +Have you explained the **intention** of your code? + +!!! tip + + Good comments don't say **what the code does**; instead, they **explain why** the code does what it does. + +For each function, write a comment that explains what the function does, describes the purpose of each parameter, and describes what values the function returns (if any). + +## Documenting + +Many programming languages support "docstrings".
+These are usually comments with additional structure and formatting, and can be used to automatically generate documentation: + +- **R:** [roxygen2](https://roxygen2.r-lib.org/) + +- **Python:** there are [several formats](http://web.archive.org/web/20230128071653/http://daouzli.com/blog/docstring.html) + +- **Julia:** [Writing Documentation](https://docs.julialang.org/en/v1/manual/documentation/) + +See the CodeRefinery [In-code documentation](https://coderefinery.github.io/documentation/in-code-documentation/#what-are-docstrings-and-how-can-they-be-useful) lesson for some good examples of docstrings. + +## Commenting out code + +Avoid commenting out code. +If it's no longer useful, delete it and save this as a commit! +Make sure you write a helpful commit message. +You can always recover the deleted code if you need it later. diff --git a/docs/guides/writing-code/exercise-seek-feedback.md b/docs/guides/writing-code/exercise-seek-feedback.md new file mode 100644 index 00000000..69fec256 --- /dev/null +++ b/docs/guides/writing-code/exercise-seek-feedback.md @@ -0,0 +1,8 @@ +# Exercise: seek feedback + +!!! question + + One goal to keep in mind is to ensure your work is **conceptually accessible**: how readily could someone else (or even yourself, after a period of absence) understand your code? + +- Seek feedback on some code that you are **currently writing** — before the code is "finished". + You may want to refer to our [peer code review guidelines](../collaborating/peer-code-review.md) and consider [what you want to learn](how-we-learn-to-write-code.md). diff --git a/docs/guides/writing-code/format-your-code.md b/docs/guides/writing-code/format-your-code.md new file mode 100644 index 00000000..5ece13de --- /dev/null +++ b/docs/guides/writing-code/format-your-code.md @@ -0,0 +1,24 @@ +# Format your code + +!!! question + + Have you ever looked at someone else's code and found it hard to read because they formatted it differently to your code? 
+ +Using a consistent code style can help make your code more legible and accessible to others, in much the same way that standard use of punctuation and spacing makes written text easier to read. + +!!! tip + + Good coding style is like using correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. + + — Hadley Wickham, [the tidyverse style guide](https://style.tidyverse.org/) + +We **strongly recommend** using an editor that can automatically format your code whenever you save. +This allows you to completely forget about formatting and focus on the content. + +We list here some of the most commonly used style guides and code formatters: + +| Language | Style guide(s) | Formatter | +|----------|-----------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------| +| R | [tidyverse](https://style.tidyverse.org/) | [styler](https://styler.r-lib.org/) | +| Python | [PEP 8](https://peps.python.org/pep-0008/) and [The Hitchhiker's Style Guide](https://docs.python-guide.org/writing/style/) | [black](https://black.readthedocs.io/en/stable/) | +| Julia | [style guide](https://docs.julialang.org/en/v1/manual/style-guide/) | [Lint.jl](https://lintjl.readthedocs.org/en/stable/) | diff --git a/docs/guides/writing-code/how-we-learn-to-write-code.md b/docs/guides/writing-code/how-we-learn-to-write-code.md new file mode 100644 index 00000000..ad38929f --- /dev/null +++ b/docs/guides/writing-code/how-we-learn-to-write-code.md @@ -0,0 +1,74 @@ +# How we learn to write code + +!!! question + + How have you learned to write code? + Were you given any formal training? + +Unless you studied Software Engineering, you may never have had any formal training. +And that's okay! +**Nobody writes perfect code**. + +There are various resources available (including this book) that can help you to improve your coding skills. 
+But the most effective way to improve is to write code **and get feedback**. + +!!! tip + + You can practice shooting eight hours a day, but if your technique is wrong, then all you become is very good at shooting the wrong way. + + — Michael Jordan + +## How we learn to write papers + +Throughout our research careers, we are continually learning and developing our ability to write scientific papers. +One of the main ways that we develop this ability is to seek **feedback early and often**, by circulating drafts to co-authors, supervisors, and trusted colleagues. + +This feedback not only helps us improve the paper that we're currently working on, but also improves our ability to write papers in the future. + +We gradually learn how to express ourselves clearly at multiple levels: + +- Writing individual sentences that clearly convey a single thought or observation; + +- Constructing paragraphs that span a single topic or idea; + +- Structuring an entire paper so that the reader can easily navigate it. + +## How we ***currently*** learn to code + +Many of us learn to write code as a by-product of our chosen research area, and may not have any formal computer programming training. +However, while we may make our finished code available as a support material for our published papers, we don't typically show our code to our co-authors. + +!!! info + + While there are many reasons why we are reluctant to share our code, perhaps the biggest factor is a sense of shame. + We may feel that our code is "bad" — too bad to share with others! — and that if we've ever made a mistake in our code, we're the only person who has ever done so. + + **This is simply untrue!** + +## How we ***should*** learn to code + +We should treat writing code the same way that we treat writing papers, grant applications, and fellowship applications: **seek feedback early, and seek feedback often**. + +!!! 
question + + Wouldn't you prefer that the first person who looks at your code is a trusted colleague, rather than a random person who has read your paper and now wants to see how the code works? + +[Peer code review](../collaborating/peer-code-review.md) offers a structured way to: + +- Discuss and critique a person's work in a kind and supportive manner; + +- Praise good work; + +- Identify where code is well-structured and clear, and where it could be improved; and + +- Share relevant knowledge and expertise. + +Similar to writing papers, we should **seek feedback at multiple levels**: + +- Are individual lines of code clear and correct? + +- Are strongly-related lines of code grouped into functions that each do a single thing? + +- Are functions grouped into modules that focus on specific aspects or features? + +- Can the reader easily navigate the code? diff --git a/mkdocs.yml b/mkdocs.yml index 5dbef90f..93a46609 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -58,12 +58,30 @@ nav: - "Peer code review": guides/collaborating/peer-code-review.md - "Coding style guides": guides/collaborating/coding-style-guides.md - "Continuous integration": guides/collaborating/continuous-integration.md - - "4. Reproducibility": + - "4. Project structure": + - "guides/project-structure/README.md" + - "Define your workflow": "guides/project-structure/workflow.md" + - "Automate common tasks": "guides/project-structure/automating-tasks.md" + - "Explain how it works": "guides/project-structure/explain-how-it-works.md" + - "Exercise: what works for you?": "guides/project-structure/exercise-what-works-for-you.md" + - "Exercise: a good README": "guides/project-structure/exercise-a-good-readme.md" + - "5. 
Writing code": + - "guides/writing-code/README.md" + - "How we learn to write code": "guides/writing-code/how-we-learn-to-write-code.md" + - "Cohesion and coupling": "guides/writing-code/cohesion-coupling.md" + - "Behave nicely": "guides/writing-code/behave-nicely.md" + - "Coding advice": "guides/writing-code/coding-advice.md" + - "Create packages": "guides/writing-code/create-packages.md" + - "Check your code": "guides/writing-code/check-your-code.md" + - "Format your code": "guides/writing-code/format-your-code.md" + - "Document your code": "guides/writing-code/document-your-code.md" + - "Exercise: seek feedback": "guides/writing-code/exercise-seek-feedback.md" + - "6. Reproducibility": - guides/reproducibility/README.md - "What is reproducible research?": "guides/reproducibility/what-is-reproducible-research.md" - - "5. Testing": + - "7. Testing": - guides/testing/README.md - - "6. Cloud and HPC platforms": + - "8. Cloud and HPC platforms": - guides/high-performance-computing/README.md - "Useful resources": guides/resources.md - "Community": From 6a630669c97a33603feb8fc53ec5c8217d5f4291 Mon Sep 17 00:00:00 2001 From: Rob Moss Date: Wed, 3 Jul 2024 10:17:33 +1000 Subject: [PATCH 2/2] Simplify the cake analogy The previous version was far too long and detailed; the intent is to make a simple point, rather than tell a long story. --- docs/guides/writing-code/behave-nicely.md | 32 +++++++++++------------ 1 file changed, 15 insertions(+), 17 deletions(-) diff --git a/docs/guides/writing-code/behave-nicely.md b/docs/guides/writing-code/behave-nicely.md index 42b9c3ca..ea9a1d1b 100644 --- a/docs/guides/writing-code/behave-nicely.md +++ b/docs/guides/writing-code/behave-nicely.md @@ -12,27 +12,20 @@ Would you feel comfortable running someone else's code if you thought it might a ## A cake analogy -Suppose you have two colleagues who regularly bake cakes, and you decide you'd like one of them to bake you a lemon cake with chocolate icing. 
+Suppose you have two colleagues who regularly bake cakes, and you decide you'd like one of them to bake you a chocolate cake. -- **A nice colleague:** you ask your colleague to bake a lemon cake with chocolate icing. +- **A nice colleague:** - That evening, they go home and bake a cake. - They bring the cake to work the next day. - - The cake tastes of lemon and is topped with chocolate icing. - -- **A messy colleague:** you ask your colleague to bake a lemon cake with chocolate icing. - - - They reply that they will make a cake - - The next day, they come into your office with the ingredients and a portable oven. - - They begin mixing ingredients, making a huge mess on your desk. - - You have to wait until the batter is mixed before they ask you for your choice of flavour. - - They don't have lemons, but add some orange zest to the batter. - - Once the cake is baked, they let it cool. - - One hour later they ask you what flavour icing you want. - - They don't have chocolate or cocoa, so they a different icing. - - They give you the cake. - - The cake tastes of orange and is topped with strawberry icing. - - Your office is covered in flour, sugar, and cake batter. + - The cake tastes of chocolate. + +- **A messy colleague:** + + - They bring the ingredients and a portable oven into your office. + - They make a huge mess, splattering your desk and computer. + - The oven is noisy and makes the office uncomfortably warm. + - The cake tastes of vanilla, not chocolate. ## Some specific tips @@ -42,3 +35,8 @@ Suppose you have two colleagues who regularly bake cakes, and you decide you'd l These make it harder for other people to use the code, or to run the code on high-performance computing platforms. - Prefer using paths that are relative to the root directory of your project, such as `input-data/case-data/cases-for-2023.csv`. + If you're using R, the [here](https://here.r-lib.org/) package is extremely helpful. 
+ +- Warn the user before running tasks that take a long time to complete. + +- Notify the user before downloading large files.
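The tips in `behave-nicely.md` about project-relative paths and warning the user could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: it assumes the project root can be recognised by a `.git` directory, and the helper names `find_project_root`, `project_path`, and `confirm` are hypothetical, not part of the guide or of any library.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = ".git") -> Path:
    """Walk upwards from `start` until a directory containing `marker` is found.

    Assumption: the project root is the nearest directory that contains `marker`.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No '{marker}' found in or above {start}")


def project_path(root: Path, relative: str) -> Path:
    """Build a path relative to the project root, rejecting absolute paths."""
    rel = Path(relative)
    if rel.is_absolute():
        raise ValueError(f"Expected a path relative to the project root: {relative}")
    return root / rel


def confirm(message: str, ask=input) -> bool:
    """Tell the user what is about to happen; proceed only on an explicit yes."""
    reply = ask(f"{message} Continue? [y/N] ")
    return reply.strip().lower() in ("y", "yes")
```

A script might then call `project_path(find_project_root(Path.cwd()), "input-data/case-data/cases-for-2023.csv")` instead of hard-coding an absolute path, and wrap a slow step or a large download in `if confirm("This will download a large file."):` so that nothing surprising happens without the user's agreement. The `ask` parameter exists only so the prompt can be replaced in tests or non-interactive runs.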