From aaaf910ebf632ce7fd21195431dce74a0f9642ed Mon Sep 17 00:00:00 2001
From: robmoss
Date: Mon, 26 Feb 2024 07:00:35 +0000
Subject: [PATCH] Deploy preview for PR 65 🛫
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../moss-pypfilt-earlier-states/index.html | 14 +++++++-------
 pr-preview/pr-65/search/search_index.json | 2 +-
 pr-preview/pr-65/sitemap.xml.gz | Bin 850 -> 850 bytes
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/pr-preview/pr-65/case-studies/moss-pypfilt-earlier-states/index.html b/pr-preview/pr-65/case-studies/moss-pypfilt-earlier-states/index.html
index a80771b2..e3a6cd6c 100644
--- a/pr-preview/pr-65/case-studies/moss-pypfilt-earlier-states/index.html
+++ b/pr-preview/pr-65/case-studies/moss-pypfilt-earlier-states/index.html
@@ -2071,8 +2071,8 @@

Notice the bug
Identify the cause of the bug

-I knew that the bug had been introduced quite recently, and I knew that it affected a specific function: earlier_states().
-Running git blame src/pypfilt/state.py indicated that the recent commit 408b5f1 was a likely culprit, because it changed many lines in this function.
+I knew that the bug had been introduced quite recently, and I knew that it affected a specific function: earlier_states().
+Running git blame src/pypfilt/state.py indicated that the recent commit 408b5f1 was a likely culprit, because it changed many lines in this function.

In particular, I suspected the bug was occurring in the following loop, which steps backwards in time and handles the case where model simulations are reordered:

# Start with the parent indices for the current particles, which allow us
 # to look back one time-step.
@@ -2091,7 +2091,7 @@ 

Identify the cause of the bug
Write a test case

-I wrote a test case test_earlier_state() that called this earlier_states() function a number of times, and checked that each set of model simulations were returned in the correct order.
+I wrote a test case test_earlier_state() that called this earlier_states() function a number of times, and checked that each set of model simulations were returned in the correct order.

This test case checks that:

  1.
@@ -2107,17 +2107,17 @@

    Write a test case
    Fix the bug

    With the test case now written, I was able to verify that changing step_ix + 1 to step_ix did fix the bug.

-I added the test case and the bug fix in commit 9dcf621.
+I added the test case and the bug fix in commit 9dcf621.

    In the commit message I indicated:

    1.
-Where the bug was located: the earlier_states() function;
+Where the bug was located: the earlier_states() function;

    2.
-When the bug was introduced: commit 408b5f1; and
+When the bug was introduced: commit 408b5f1; and

    3.
-Why the bug was not detected when I created commit 408b5f1.
+Why the bug was not detected when I created commit 408b5f1.

diff --git a/pr-preview/pr-65/search/search_index.json b/pr-preview/pr-65/search/search_index.json
index 4307fa07..c4f30d5d 100644
--- a/pr-preview/pr-65/search/search_index.json
+++ b/pr-preview/pr-65/search/search_index.json
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":"

    These materials aim to support early- and mid-career researchers (EMCRs) in the SPECTRUM and SPARK networks to develop their computing skills, and to make effective use of available tools1 and infrastructure2.

    "},{"location":"#motivation","title":"Motivation","text":"

    Question

    Why dedicate time and effort to learning these skills? There are many reasons!

    The overall aim of these materials is to help you conduct code-driven research more efficiently and with greater confidence.

    Hopefully some of the following reasons resonate with you.

    • Fearlessly modify your code, knowing that your past work is never lost, by using version control.

    • Verify that your code behaves as expected, and get notified when it doesn't, by writing tests.

    • Ensure that your results won't change when running on a different computer by "baking in" reproducibility.

    • Improve your coding skills, and those of your colleagues, by working collaboratively and making use of peer code review.

    • Run your code quickly, and without relying on your own laptop or computer, by using high-performance computing.

    Foundations of effective research

    A piece of code is often useful beyond a single project or study.

    By applying the above skills in your research, you will be able to easily reproduce past results, extend your code to address new questions and problems, and allow others to build on your code in their own research.

    The benefits of good practices can continue to pay off long after the project is finished.

    "},{"location":"#structure","title":"Structure","text":"

    These materials are divided into the following sections:

    1. Understanding version control, which provides you with a complete and annotated history of your work, and with powerful ways to search and examine this history.

    2. Learning to use Git, the most widely used version control system, which is the foundation of popular code-hosting services such as GitHub, GitLab, and Bitbucket.

    3. Using Git to collaborate with colleagues in a precisely controlled and manageable way.

    4. Ensuring that your research is reproducible by others.

    5. Using testing frameworks to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code.

    6. Running your code on various computing platforms that allow you to obtain results efficiently and without relying on your own laptop/computer.

    7. Case studies where EMCRs showcase how their research activities are enabled and/or supported by these tools.

    8. We are organising a Community of Practice that will act as a living curriculum, and will use this section to record the findings and outputs of our community activities.

    "},{"location":"#how-to-contribute","title":"How to contribute","text":"

    If you have any comments, feedback, or suggestions, please see the How to contribute page.

    "},{"location":"#license","title":"License","text":"

    This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

    1. Such as version control and testing frameworks.

    2. Such as the ARDC Nectar Research Cloud and Spartan.

    "},{"location":"contributors/","title":"Contributors","text":"

    Here is a list of the contributors who have helped develop these materials:

    • Rob Moss (robmoss)
    • Eamon Conway (EamonConway)
    • James Ong (jomonman537)
    • Trish Campbell (TrishC)
    • Isobel Abell (iabell)
    "},{"location":"how-to-contribute/","title":"How to contribute","text":""},{"location":"how-to-contribute/#add-a-case-study","title":"Add a case study","text":"

    If you've made use of Git in your research activities, please let us know! We're looking for case studies that highlight how EMCRs are using Git. See the instructions for suggesting new content (below).

    "},{"location":"how-to-contribute/#provide-comments-and-feedback","title":"Provide comments and feedback","text":"

    The easiest way to provide comments and feedback is to create an issue. Note that this requires a GitHub account. If you do not have a GitHub account, you can email any of the authors. Please include "Git is my lab book" in the subject line.

    "},{"location":"how-to-contribute/#suggest-modifications-and-new-content","title":"Suggest modifications and new content","text":"

    This book is written in Markdown and is published using Material for MkDocs. See the Material for MkDocs Reference for an overview of the supported features.

    You can suggest modifications and new content by:

    • Forking the book repository;

    • Adding, deleting, and/or modifying book chapters in the docs/ directory;

    • Recording your changes in one or more git commits; and

    • Creating a pull request, so that we can review your suggestions.

    Info

    You can also edit any page by clicking the "Edit this page" button in the top-right corner. This will start the process described above by forking the book repository.

    Tip

    When editing Markdown content, please start each sentence on a separate line. Also check that your text editor removes trailing whitespace.

    This ensures that each commit will contain only the modified sentences, and makes it easier to inspect the repository history.
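    If you want to check these two conventions before committing, a small script can do it. The sketch below is a hypothetical helper (check_markdown is not part of this book's tooling). It flags trailing whitespace, and uses a simple heuristic (a full stop, question mark, or exclamation mark followed by whitespace and a capital letter) to spot lines that contain more than one sentence, so abbreviations such as "e.g. Git" will produce false positives.

```python
import re

# Heuristic: sentence-ending punctuation, then whitespace, then a capital letter.
SENTENCE_BREAK = re.compile(r"[.!?]\s+[A-Z]")


def check_markdown(text):
    """Return (line number, problem) pairs for lines that break these conventions."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        if SENTENCE_BREAK.search(line):
            problems.append((lineno, "more than one sentence"))
    return problems


example = "First sentence. Second sentence.\nOne sentence only.  \n"
print(check_markdown(example))
# [(1, 'more than one sentence'), (2, 'trailing whitespace')]
```

    A heuristic like this is best treated as a prompt to look at a line, not as a strict rule.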

    Tip

    When you add a new page, you must also add the page to the nav block in mkdocs.yml.

    "},{"location":"how-to-contribute/#adding-tabbed-code-blocks","title":"Adding tabbed code blocks","text":"

    You can display content in multiple tabs by using ===. For example:

    === "Python"

        ```py
        print("Hello world")
        ```

    === "R"

        ```R
        cat("Hello world\n")
        ```

    === "C++"

        ```cpp
        #include <iostream>

        int main() {
            std::cout << "Hello World";
            return 0;
        }
        ```

    === "Shell"

        ```sh
        echo "Hello world"
        ```

    === "Rust"

        ```rust
        fn main() {
            println!("Hello World");
        }
        ```

    produces:

    Python | R | C++ | Shell | Rust

    print("Hello world")

    cat("Hello world\n")

    #include <iostream>

    int main() {
        std::cout << "Hello World";
        return 0;
    }

    echo "Hello world"

    fn main() {
        println!("Hello World");
    }
    "},{"location":"how-to-contribute/#adding-terminal-session-recordings","title":"Adding terminal session recordings","text":"

    You can use asciinema to record a terminal session, and display this recorded session with a small amount of HTML and JavaScript. For example, the following code is used to display the where-did-this-line-come-from.cast recording in a tab called "Video demonstration", as shown in the Where did this line come from? chapter:

    === "Video demonstration"

        <div id="demo" data-cast-file="../where-did-this-line-come-from.cast"></div>

    You can also add links that jump to specific times in the video. Each link must have:

    • A data-video attribute that identifies the video (in the example above, this is "demo");
    • A data-seek-to attribute that identifies the time (in seconds) to jump to; and
    • A href attribute that is set to "javascript:;" (so that the link doesn't scroll the page).

    For example, the following code is used to display the video recording on the Choosing your Git Editor page:

    === "Git editor example"

        <div id="demo" data-cast-file="../git-editor-example.cast"></div>

        Video timeline:

        1. <a data-video="demo" data-seek-to="4" href="javascript:;">Overview</a>
        2. <a data-video="demo" data-seek-to="17" href="javascript:;">Show how to use nano</a>
        3. <a data-video="demo" data-seek-to="71" href="javascript:;">Show how to use vim</a>
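    Writing these links by hand becomes error-prone when a recording has many sections. A minimal sketch, assuming a hypothetical Python helper timeline_links that takes the video identifier and a list of (seconds, label) pairs, could generate the numbered list in the same form as the example:

```python
from html import escape


def timeline_links(video_id, entries):
    """Build a numbered list of seek links for an asciinema recording.

    `entries` is a list of (seconds, label) pairs. Each link carries the
    data-video, data-seek-to, and href attributes that the page expects.
    """
    lines = []
    for number, (seconds, label) in enumerate(entries, start=1):
        lines.append(
            f'{number}. <a data-video="{escape(video_id)}" '
            f'data-seek-to="{int(seconds)}" href="javascript:;">'
            f"{escape(label)}</a>"
        )
    return "\n".join(lines)


print(timeline_links("demo", [(4, "Overview"), (17, "Show how to use nano")]))
```

    The generated lines would still need to be indented so that they sit inside the === tab, and labels are HTML-escaped so that characters such as < cannot break the markup.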
    "},{"location":"learning-objectives/","title":"Learning objectives","text":"

    This page defines the learning objectives for individual sections. These are skills that the reader should be able to demonstrate after reading through the relevant section, and completing any exercises in that section.

    "},{"location":"learning-objectives/#version-control-concepts","title":"Version control concepts","text":"

    After completing this section, you should be able to identify how to apply version control concepts to your existing work. This includes being able to:

    • Identify projects and tasks for which version control would be suitable;

    • Categorise recent work activities into one or more commits;

    • Write commit messages that describe what changes you made and why you made them; and

    • Identify pieces of work that could be carried out in separate branches of a repository.

    "},{"location":"learning-objectives/#effective-use-of-git","title":"Effective use of git","text":"

    After completing this section, you should be able to:

    • Create a local repository;

    • Create commits in your local repository;

    • Search your commit history to identify commits that made a specific change;

    • Create a remote repository;

    • Push commits from your local repository to a remote repository;

    • Pull commits from a remote repository to your local repository;

    • Use tags to identify important milestones;

    • Work in a separate branch and then merge your changes into your main branch; and

    • Resolve merge conflicts.

    "},{"location":"learning-objectives/#collaborating","title":"Collaborating","text":"

    After completing this section, you should be able to:

    • Share a repository with one or more collaborators;

    • Create a pull request;

    • Use a pull request to review a collaborator's work;

    • Use a pull request to merge a collaborator's work into your main branch; and

    • Conduct peer code review in a respectful manner.

    "},{"location":"prerequisites/","title":"Prerequisites","text":"

    These materials assume that the reader has a basic knowledge of the Bash command-line shell and of using SSH to connect to remote computers. You should be comfortable with using the command line to perform the following tasks:

    • Navigate your files and directories;
    • Create, copy, move, and delete files and directories; and
    • Work remotely using SSH.

    Please refer to the following materials for further details:

    • The Unix Shell: an introduction to using Bash.
    • Extra Unix Shell Material: additional shell lessons, including SSH.

    Info

    If you use Windows, you may want to use PowerShell instead of Bash, in which case please refer to this Introduction to the Windows Command Line with Powershell.

    Some chapters also assume that the reader has an account on GitHub and has added an SSH key to their account.

    "},{"location":"references/","title":"References","text":""},{"location":"references/#education-and-commentary-articles","title":"Education and commentary articles","text":"
    • A Beginner's Guide to Conducting Reproducible Research describes key requirements for producing reproducible research outputs.

    • Point of View: How open science helps researchers succeed presents evidence that open research practices bring significant benefits to researchers.

    • A Quick Guide to Organizing Computational Biology Projects suggests an approach for structuring a computational research repository.

    "},{"location":"references/#using-git-and-other-software-tools","title":"Using Git and other software tools","text":"
    • NDP Software have created an interactive Git cheat-sheet that shows how git commands interact with the local and upstream repositories, and provides brief documentation for many common examples.

    • The Pro Git book is available online. It starts with an overview of Git basics and then covers every aspect of Git in great detail.

    • The Software Carpentry Foundation publishes many lessons, including Version Control with Git.

    • A Quick Introduction to Version Control with Git and GitHub provides a short guide to using Git and GitHub. It presents an example of analysing publicly available ChIP-seq data with Python. The repository for the article is also publicly available.

    "},{"location":"references/#performing-peer-code-review","title":"Performing peer code review","text":"
    • The Art of Giving and Receiving Code Reviews (Gracefully)

    • Code Review in the Lab

    • Scientific Code Review

    • The 5 Golden Rules of Code Review

    "},{"location":"references/#computational-research-practices","title":"Computational research practices","text":"
    • A simple kit to use computational notebooks for more openness, reproducibility, and productivity in research provides some good recommendations for organising a project repository and setting up a reproducible workflow using computational notebooks.

    • Why code rusts collects together some of the reasons why the behaviour of code changes over time.

    "},{"location":"references/#high-performance-computing-platforms","title":"High-performance computing platforms","text":"
    • How to access the ARDC Nectar Research Cloud

    • Melbourne Research Cloud

    • High Performance Computing at University of Melbourne

    "},{"location":"references/#how-to-acknowledge-and-cite-research-software","title":"How to acknowledge and cite research software","text":"
    • The ARDC Guide to making software citable explains how to cite your code and assign it a DOI.

    • Recognizing the value of software: a software citation guide provides further examples and guidance for ensuring your work receives proper attribution and credit.

    "},{"location":"references/#software-licensing","title":"Software licensing","text":"
    • Choose an open source license provides advice for selecting an appropriate license that meets your needs.

    • A Quick Guide to Software Licensing for the Scientist-Programmer explains the various types of available licenses and provides advice for selecting a suitable license.

    "},{"location":"case-studies/","title":"Case studies","text":"

    This section contains interesting and useful examples of incorporating Git into a research activity, as contributed by EMCRs in our network.

    "},{"location":"case-studies/campbell-pen-and-paper-version-control/","title":"Pen and paper - a less user-friendly form of version control than Git","text":"

    Author: Trish Campbell (patricia.campbell@unimelb.edu.au)

    Project: Pertussis modelling

    "},{"location":"case-studies/campbell-pen-and-paper-version-control/#the-problem","title":"The problem","text":"

    In this project, I developed a compartmental model of pertussis to determine appropriate vaccination strategies. While plotting some single model simulations, I noticed anomalies in the modelled output for two experiments. The first experiment had an order of magnitude more people in the infectious compartments than in the second experiment, even though there seemed to be far fewer infections occurring. This scenario did not fit with the parameter values that were being used. In the differential equation file for my model, in addition to extracting the state of the model (i.e. the population in each compartment at each time step), for ease of analysis I also extracted the cumulative number of infections up to that time step. The calculation for this extraction of cumulative incidence was incorrect.

    "},{"location":"case-studies/campbell-pen-and-paper-version-control/#the-solution","title":"The solution","text":"

    The error occurred because susceptible people in my model were not all equally susceptible, and I failed to account for this when I calculated the cumulative number of infections at each time step. I identified that this was the problem by running some targeted test parameter sets and observing the changes in model output. The next step was to find out how long this bug had existed in the code and which analyses had been affected. While I was using version control, I tended to make large infrequent commits. I did, however, keep extensive hand-written notes in lab books, which played the role of a detailed history of commits. Searching through my historical lab books, I identified that I had introduced this bug into the code two years earlier. I was able to determine which parts of my results would have been affected by the bug and made the decision that all experiments needed to be re-run.
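    The kind of mistake described here can be illustrated with a toy model. The sketch below is not the pertussis model from this project: it is a hypothetical SIR model with two susceptible groups that have different relative susceptibilities (rel_susc), integrated with simple Euler steps. The buggy version accumulates infections as if every susceptible person were fully susceptible. A simple consistency check (in a closed population, cumulative infections should equal the total decrease in susceptibles) exposes the error.

```python
import math


def simulate(beta, gamma, rel_susc, S0, I0, n_steps, dt, buggy=False):
    """Toy SIR model with several susceptible groups (hypothetical example).

    Returns the final susceptible counts and the cumulative number of
    infections recorded along the way.
    """
    S = list(S0)
    I = I0
    N = sum(S0) + I0  # closed population: no births or deaths
    cum_inf = 0.0
    for _ in range(n_steps):
        foi = beta * I / N  # force of infection on a fully susceptible person
        # Each group is infected in proportion to its relative susceptibility.
        new_inf = [r * foi * s * dt for r, s in zip(rel_susc, S)]
        if buggy:
            # Bug: ignores rel_susc, so infections are over-counted.
            cum_inf += foi * sum(S) * dt
        else:
            cum_inf += sum(new_inf)
        S = [s - x for s, x in zip(S, new_inf)]
        I += sum(new_inf) - gamma * I * dt
    return S, cum_inf


params = dict(beta=0.5, gamma=0.2, rel_susc=[1.0, 0.5], S0=[900.0, 90.0],
              I0=10.0, n_steps=500, dt=0.1)
for buggy in (False, True):
    S, cum_inf = simulate(**params, buggy=buggy)
    consistent = math.isclose(cum_inf, sum(params["S0"]) - sum(S), rel_tol=1e-9)
    print(f"buggy={buggy}: consistent={consistent}")
```

    Note that the bug does not change the model trajectory itself, only the extracted cumulative incidence, which is why a check on the derived output is needed to catch it.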

    "},{"location":"case-studies/campbell-pen-and-paper-version-control/#how-version-control-helped","title":"How version control helped","text":"

    Using a pen and paper form of version control enabled me to pinpoint the introduction of the error and identify the affected analyses, but it was a tedious process. While keeping an immaculate record of changes that I had made was invaluable, imagine how much simpler and faster the process would have been if I had been a regular user of an electronic version control system such as Git!

    "},{"location":"case-studies/moss-incorrect-data-pre-print/","title":"Incorrect data in a pre-print figure","text":"

    Author: Rob Moss (rgmoss@unimelb.edu.au)

    Project: COVID-19 scenario modelling (public repository)

    "},{"location":"case-studies/moss-incorrect-data-pre-print/#the-problem","title":"The problem","text":"

    Our colleague James Trauer notified us that they suspected there was an error in Figure 2 of our COVID-19 scenario modelling pre-print article. This figure showed model predictions of the daily ICU admission demand in an unmitigated COVID-19 pandemic, and in a COVID-19 pandemic with case-targeted public health measures. I inspected the script responsible for plotting this figure, and confirmed that I had mistakenly plotted the combined demand for ward and ICU beds, instead of the demand for ICU beds alone.

    "},{"location":"case-studies/moss-incorrect-data-pre-print/#the-solution","title":"The solution","text":"

    This mistake was simple to correct, but the obvious concern was whether any other outputs related to ICU bed demand were affected.

    We conducted a detailed review of all data analysis scripts and outputs, and confirmed that this error only affected this single manuscript figure. It had no bearing on the impact of the interventions in each model scenario. Importantly, it did not affect any of the simulation outputs, summary tables, and/or figures that were included in our reports to government.

    The corrected figure can be seen in the published article.

    "},{"location":"case-studies/moss-incorrect-data-pre-print/#how-version-control-helped","title":"How version control helped","text":"

    Because we used version control to record the development history of the model and all of the simulation analyses, we were able to easily inspect the repository state at the time of each prior analysis. This greatly simplified the review process, and ensured that we were inspecting the code exactly as it was when we produced each analysis.

    "},{"location":"case-studies/moss-pypfilt-earlier-states/","title":"Fixing a bug in pypfilt","text":"

    Author: Rob Moss (rgmoss@unimelb.edu.au)

    Project: pypfilt, a bootstrap particle filter for Python

    Date: 27 October 2021

    "},{"location":"case-studies/moss-pypfilt-earlier-states/#overview","title":"Overview","text":"

    I introduced a bug when I modified a function in my pypfilt package, and only detected the bug after I had created several more commits.

    To resolve this bug, I had to:

    1. Notice the bug;

    2. Identify the cause of the bug;

    3. Write a test case to check whether the bug is present; and

    4. Fix the bug.

    "},{"location":"case-studies/moss-pypfilt-earlier-states/#notice-the-bug","title":"Notice the bug","text":"

    I noticed that a regression test1 was failing: re-running a set of model simulations was no longer generating the same output. The results had changed, but none of my recent commits should have had this effect.

    I should have noticed this when I created the commit that introduced this bug, but:

    • I had not pushed the most recent commits to the upstream repository, where all of the test cases are run automatically every time a new commit is pushed; and

    • I had not run the test cases on my laptop after making each of the recent commits, because this takes a few minutes and I was lazy.
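    A regression test like the one described above can be quite simple: run the simulations with a fixed seed, and compare the output with a reference copy that was saved when the results were known to be correct. The sketch below is illustrative only; simulate() and regression_check() are hypothetical stand-ins, not part of pypfilt's test suite.

```python
import json
import random
import tempfile
from pathlib import Path


def simulate(seed, n=5):
    """Stand-in for a set of model simulations (hypothetical)."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    return [rng.random() for _ in range(n)]


def regression_check(reference_file, seed=2021):
    """Compare fresh output with the saved reference output.

    If no reference exists yet, save the current output and return True.
    """
    output = simulate(seed)
    path = Path(reference_file)
    if not path.exists():
        path.write_text(json.dumps(output))
        return True
    # json round-trips Python floats exactly, so equality is a strict check.
    return json.loads(path.read_text()) == output


with tempfile.TemporaryDirectory() as tmp:
    ref = Path(tmp) / "reference.json"
    print(regression_check(ref))  # creates the reference file
    print(regression_check(ref))  # compares against it
```

    Running a check like this after every commit, either locally or automatically whenever commits are pushed, gives early warning that a commit has changed the results.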

    "},{"location":"case-studies/moss-pypfilt-earlier-states/#identify-the-cause-of-the-bug","title":"Identify the cause of the bug","text":"

    I knew that the bug had been introduced quite recently, and I knew that it affected a specific function: earlier_states(). Running git blame src/pypfilt/state.py indicated that the recent commit 408b5f1 was a likely culprit, because it changed many lines in this function.

    In particular, I suspected the bug was occurring in the following loop, which steps backwards in time and handles the case where model simulations are reordered:

    # Start with the parent indices for the current particles, which allow us
    # to look back one time-step.
    parent_ixs = np.copy(hist['prev_ix'][ix])

    # Continue looking back one time-step, and only update the parent indices
    # at time-step T if the particles were resampled at time-step T+1.
    for i in range(1, steps):
        step_ix = ix - i
        if hist['resampled'][step_ix + 1, 0]:
            parent_ixs = hist['prev_ix'][step_ix, parent_ixs]

    In stepping through this code, I identified that the following line was incorrect:

        if hist['resampled'][step_ix + 1, 0]:

    and that changing step_ix + 1 to step_ix should fix the bug.
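    To see why the flag index matters, here is a self-contained, pure-Python sketch of the same loop (lists rather than NumPy arrays). It is a simplified illustration, not pypfilt's code. It assumes that prev_ix[t][j] is the index (at time-step t - 1) of the parent of particle j at time-step t, and that resampled[t] is True when the particles were resampled at time-step t, so prev_ix[t] only differs from the identity when resampled[t] is set. Under these assumptions the flag must be read at step_ix, the same time-step whose prev_ix row is applied.

```python
def ancestor_indices(prev_ix, resampled, ix, steps, fixed=True):
    """Indices, at time-step ix - steps, of the ancestors of the particles at ix.

    Hypothetical conventions: prev_ix[t][j] is the parent (at t - 1) of
    particle j at t, and resampled[t] is True if resampling occurred at t.
    Requires steps >= 1.
    """
    # Look back one time-step.
    parent_ixs = list(prev_ix[ix])
    for i in range(1, steps):
        step_ix = ix - i
        # The buggy version read the flag for step_ix + 1 instead.
        flag_ix = step_ix if fixed else step_ix + 1
        if resampled[flag_ix]:
            parent_ixs = [prev_ix[step_ix][p] for p in parent_ixs]
    return parent_ixs


# Four particles, four time-steps; the only resampling occurs at t = 2.
identity = [0, 1, 2, 3]
prev_ix = [identity, identity, [3, 3, 0, 1], identity]
resampled = [False, False, True, False]

print(ancestor_indices(prev_ix, resampled, ix=3, steps=2))               # [3, 3, 0, 1]
print(ancestor_indices(prev_ix, resampled, ix=3, steps=2, fixed=False))  # [0, 1, 2, 3]
```

    The buggy lookup misses the resampling at t = 2 because it reads resampled[3] instead, so it silently returns the particles in the wrong order.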

    Note: I could have used git bisect to identify the commit that introduced this bug, but running all of the test cases for each commit is relatively time-consuming; since I knew that the bug had been introduced quite recently, I chose to use git blame.
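    The trade-off in this note comes from how git bisect works: it performs a binary search over the commit history, so searching n commits takes roughly log2(n) test runs, and here each run would have meant running the full, relatively slow test suite. The sketch below shows the idea in plain Python for a linear history; the commit labels and the is_bad predicate are hypothetical stand-ins for real commits and test runs, and the real git bisect also handles merge commits.

```python
def first_bad_commit(commits, is_bad):
    """Binary search for the first bad commit, in the spirit of git bisect.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    Returns the first bad commit and the number of commits tested.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tests_run = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests_run += 1  # in practice: check out commits[mid] and run the tests
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tests_run


commits = [f"c{i}" for i in range(10)]  # hypothetical history; c6 adds the bug
print(first_bad_commit(commits, lambda c: int(c[1:]) >= 6))  # ('c6', 3)
```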

    "},{"location":"case-studies/moss-pypfilt-earlier-states/#write-a-test-case","title":"Write a test case","text":"

    I wrote a test case test_earlier_state() that called this earlier_states() function a number of times, and checked that each set of model simulations was returned in the correct order.

    This test case checks that:

    1. If the model simulations were not reordered, the original ordering is always returned;

    2. If the model simulations were reordered at some time t_0, the original ordering is returned for times t < t_0; and

    3. If the model simulations were reordered at some time t_0, the new ordering is returned for times t >= t_0.

    This test case failed when I reran the testing pipeline, which indicated that it identified the bug.
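    The same kinds of checks can be written as ordinary test functions. The sketch below is not the author's test_earlier_state(): it uses a hypothetical, pure-Python ancestor_indices() function (its conventions are stated in the docstring) and a toy history in which particles are reordered only at time-step t0. It checks that without reordering the current order is always returned, and that with reordering at t0 the reordering is visible when looking back to t < t0 but not to t >= t0. The functions follow pytest's test_ naming convention, but can also be called directly.

```python
def ancestor_indices(prev_ix, resampled, ix, t, offset=0):
    """Indices, at time-step t < ix, of the ancestors of the particles at ix.

    Hypothetical conventions: prev_ix[s][j] is the parent (at s - 1) of
    particle j at s, and resampled[s] is True if resampling occurred at s.
    offset=1 reproduces the off-by-one flag lookup described in this case study.
    """
    parent_ixs = list(prev_ix[ix])  # look back one time-step
    for step_ix in range(ix - 1, t, -1):
        if resampled[step_ix + offset]:
            parent_ixs = [prev_ix[step_ix][p] for p in parent_ixs]
    return parent_ixs


def make_history(n_steps, n_particles, t0=None, perm=None):
    """Identity parents at every time-step, except for a reordering at t0."""
    prev_ix = [list(range(n_particles)) for _ in range(n_steps)]
    resampled = [False] * n_steps
    if t0 is not None:
        prev_ix[t0] = list(perm)
        resampled[t0] = True
    return prev_ix, resampled


def test_no_reordering():
    prev_ix, resampled = make_history(6, 4)
    for t in range(5):
        assert ancestor_indices(prev_ix, resampled, 5, t) == [0, 1, 2, 3]


def test_reordering():
    perm, t0 = [2, 0, 3, 1], 3
    prev_ix, resampled = make_history(6, 4, t0=t0, perm=perm)
    for t in range(5):
        # Before t0 the reordering is visible; from t0 onwards it is not.
        expected = perm if t < t0 else [0, 1, 2, 3]
        assert ancestor_indices(prev_ix, resampled, 5, t) == expected


test_no_reordering()
test_reordering()
print("all checks passed")
```

    With offset=1, the lookup returns [0, 1, 2, 3] for every t < t0, so test_reordering() would fail, mirroring how the real test case exposed the bug.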

    "},{"location":"case-studies/moss-pypfilt-earlier-states/#fix-the-bug","title":"Fix the bug","text":"

    With the test case now written, I was able to verify that changing step_ix + 1 to step_ix did fix the bug.

    I added the test case and the bug fix in commit 9dcf621.

    In the commit message I indicated:

    1. Where the bug was located: the earlier_states() function;

    2. When the bug was introduced: commit 408b5f1; and

    3. Why the bug was not detected when I created commit 408b5f1.

    1. A regression test checks that a commit hasn't changed an existing behaviour or functionality.

    "},{"location":"collaborating/","title":"Collaborating","text":"

    This section demonstrates how to use Git for collaborative research, enabling multiple people to work on the same code or paper in parallel. This includes deciding how to structure your repository, how to use branches for each collaborator, and how to use tags to track your progress.

    Info

    We also show how these skills support peer code review, so that you can share knowledge with, and learn from, your colleagues as part of your regular activity.

    "},{"location":"collaborating/an-example-pull-request/","title":"An example pull request","text":"

    The initial draft of each chapter in this section was proposed in a pull request.

    When this pull request was created, the branch added four new commits:

    85594bf Add some guidelines for collaboration workflows
    678499b Discuss coding style guides
    2b9ff70 Discuss merge/pull requests and peer code review
    6cc6f54 Discuss repository structure and licenses

    and the author (Rob Moss) asked the reviewer (Eamon Conway) to address several details in particular.

    Eamon made several suggestions in their initial response, including:

    • Moving the How to structure a repository and Choosing a license chapters to the Effective use of git section;

    • Starting this section with the Collaborating on code chapter; and

    • Agreeing that we should use this pull request as an example in this book.

    In response, Rob pushed two commits that addressed the first two points above:

    e1d1dd9 Move collaboration guidelines to the start
    3f78ef8 Move the repository structure and license chapters

    and then wrote this chapter to show how we used a pull request to draft this book section.

    "},{"location":"collaborating/coding-style-guides/","title":"Coding style guides","text":"

    A style guide defines rules and guidelines for how to structure and format your code. This can make code easier to write, because you don't need to worry about how to format your code. It can also make code easier to read, because consistent styling allows you to focus on the content.

    There are two types of tools that can help you use a style guide:

    • A formatter formats your code to make it consistent with a specific style; and

    • A linter checks whether your code is consistent with a specific style.

    "},{"location":"collaborating/coding-style-guides/#recommended-style-guides","title":"Recommended style guides","text":"

    Because programming languages can be very different from each other, style guides are usually defined for a single programming language.

    Here we list some of the most widely-used style guides for several common programming languages:

    • For R there is a tidyverse style guide.
      • You can apply this style to your code with styler.
      • You can check that your code conforms to this style with lintr.

    • For Python there is Black, which defines a coding style and applies this style to your code.

    • For C++ there is a Google C++ style guide.

    "},{"location":"collaborating/collaborating-on-a-paper/","title":"Collaborating on a paper","text":"

    Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write papers collaboratively as a team. This approach is particularly useful if you are writing a paper in LaTeX.

    Here are some general guidelines that you may find useful:

    • Divide the paper into separate LaTeX files for each section.

    • Use tags to identify milestones such as draft versions and revisions.

    • Consider creating a separate branch for each collaborator.

    • Merge these branches when completing a major draft or revision.

    • Use latexdiff to show tracked changes between the current version and a previous commit/tag:

    latexdiff-git --flatten -r tag-name paper.tex
    • Collaborators who will provide feedback, rather than contributing directly to the writing process, can do this by:

      • Annotating PDF versions of the paper; or

      • Providing comments in response to a merge/pull request.
    "},{"location":"collaborating/collaborating-on-code/","title":"Collaborating on code","text":"

    Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write code collaboratively as a team.

    The precise workflow will depend on the nature of your research and on the collaborators in your team, but there are some general guidelines that you may find helpful:

    • Agree on a style guide.

    • Work on separate features in separate branches.

    • Use peer code review before merging changes from these branches.

    • Consider using continuous integration to:

      • Run test cases and detect bugs as early as possible; and

      • Verify that code meets your chosen style guide.
    "},{"location":"collaborating/continuous-integration/","title":"Continuous integration","text":"

    Continuous Integration (CI) is an automated process where code changes are merged into a central repository in order to run automated tests and other processes. This can provide rapid feedback while you develop your code and collaborate with others, as long as commits are regularly pushed to the central repository.

    Info

    This book is an example of Continuous Integration: every time a commit is pushed to the central repository, the online book is automatically updated.

    Because the central repository is hosted on GitHub, we use GitHub Actions. Note that this is a GitHub-specific CI system. You can view the update action for this book here.

    We also use CI to publish each pull request, so that contributions can be previewed during the review process. We added this feature in this pull request.

    "},{"location":"collaborating/merge-pull-requests/","title":"Merge/Pull requests","text":"

    Recall that incorporating the changes from one branch into another branch is referred to as a \"merge\". You can merge one branch into another branch by taking the following steps:

    1. Check out the branch you want to merge the changes into:

      git checkout my-branch\n
    2. Merge the changes from the other branch:

      git merge other-branch\n

    Tip

    It's a good idea to review these changes before you merge them.

    If possible, it's even better to have someone else review the changes.

    You can use git diff to view differences between branches. However, platforms such as GitHub and GitLab offer an easier approach: \"pull requests\" (also called \"merge requests\").

    "},{"location":"collaborating/merge-pull-requests/#creating-a-pull-request-on-github","title":"Creating a pull request on GitHub","text":"

    The steps required to create a pull request differ depending on which platform you are using. Here, we will describe how to create a pull request on GitHub. For further details, see the GitHub documentation.

    • Open the main page of your GitHub repository.

    • In the \"Branch\" menu, select the branch that contains the changes you want to merge.

    • Open the \"Contribute\" menu. This should be located on the right-hand side, above the list of files.

    • Click the \"Open pull request\" button.

    • In the \"base\" menu, select the branch you want to merge the changes into.

    • Enter a descriptive title for the pull request.

    • In the message editor, write a summary of the changes in this branch, and identify specific questions or objectives that you want the reviewer to address.

    • Select potential reviewers by clicking on the \"Reviewers\" link in the right-hand sidebar.

    • Click the \"Create pull request\" button.

    Once the pull request has been created, the reviewer(s) can review your changes and discuss their feedback and suggestions with you.

    "},{"location":"collaborating/merge-pull-requests/#merging-a-pull-request-on-github","title":"Merging a pull request on GitHub","text":"

    When the pull request has been reviewed to your satisfaction, you can merge these changes by clicking the \"Merge pull request\" button.

    Info

    If the pull request has merge conflicts (e.g., if the branch you're merging into contains new commits that change the same lines), you will need to resolve these conflicts before you can merge it.

    For further details about merging pull requests on GitHub, see the GitHub documentation.

    "},{"location":"collaborating/peer-code-review/","title":"Peer code review","text":"

    Once you're comfortable using merge/pull requests to review changes in a branch, you can use this approach for peer code review.

    Info

    Remember that code review is a discussion and critique of a person's work. The code author will naturally feel that they own the code, and the reviewer needs to respect this.

    For further advice and suggestions on how to conduct peer code review, please see the Performing peer code review references.

    Tip

    Mention people who have reviewed your code in the acknowledgements section of the paper.

    "},{"location":"collaborating/peer-code-review/#define-the-goals-of-a-peer-review","title":"Define the goals of a peer review","text":"

    When you create a pull request and invite someone to review your work, the pull request description should include the following details:

    • An overview of the work included in the pull request: what have you done, why have you done it?

    • You may also want to explain how this work fits into the broader context of your research project.

    • Identify specific questions or tasks that you would like the reviewer to address. For example, you might ask the reviewer to address one or more of the following questions:

    • Can the reviewer run your code and reproduce the outputs?

    • Is the code easy to understand?

    • If you have a style guide, is the code formatted appropriately?

    • Do the model equations or data analysis steps seem sensible?

    • If you have written documentation, is it easy to understand?

    • Can the reviewer suggest how to improve or rewrite a specific piece of code?

    Tip

    Make the reviewer's job easier by giving them small amounts of code to review.

    "},{"location":"collaborating/peer-code-review/#finding-a-reviewer","title":"Finding a reviewer","text":"

    On GitHub we have started a peer-review team. We encourage you to post on the discussion board to find like-minded members to review your code.

    "},{"location":"collaborating/peer-code-review/#guidelines-for-reviewing-other-peoples-code","title":"Guidelines for reviewing other people's code","text":"

    Peer code review is an opportunity for the author and the reviewer to learn from each other and improve a piece of code.

    Tip

    The most important guideline for the reviewer is to be kind.

    Treat other people's code the way you would want them to treat your code.

    • Avoid saying \"you\". Instead, say \"we\" or make the code the subject of the sentence.

    • Don't say \"You don't have a test for this function\", but instead say \"We should test this function\".

    • Don't say \"Why did you write it this way?\", but instead say \"What are the advantages of this approach?\".

    • Ask questions rather than stating criticisms.

    • Don't say \"This code is unclear\", but instead say \"Can you help me understand how this code works?\".

    • Treat peer review as an opportunity to praise good work!

    • Don't be afraid to tell the author that a piece of code was very clear, easy to understand, or well written.

    • Tell the author if reading their code made you aware of a useful function or package.

    • Tell the author if reading their code gave you some ideas for your own code.

    "},{"location":"collaborating/peer-code-review/#complete-the-review","title":"Complete the review","text":"

    Once the peer code review is complete, and any corresponding updates to the code have been made, you can merge the branch.

    "},{"location":"collaborating/peer-code-review/#retain-a-record-of-the-review","title":"Retain a record of the review","text":"

    By using merge/pull requests to review code, the discussion between the author and the reviewer is recorded. This can be a useful reference for future code reviews.

    Tip

    Try to record all of the discussion in the pull request comments, even if the author and reviewer meet in person, so that you have a complete record of the review.

    "},{"location":"collaborating/sharing-a-branch/","title":"Sharing a branch","text":"

    You might want a collaborator to work on a specific branch of your repository, so that you can keep their changes separate from your own work. Remember that you can merge commits from their branch into your own branches at any time.

    Info

    You need to ensure that your collaborator has access to the remote repository.

    1. Create a new branch for the collaborator, and give it a descriptive name.

      git checkout -b collab/jamie\n

      In this example we created a branch called \"collab/jamie\", where \"collab\" is a prefix used to identify branches intended for collaborators, and the collaborator is called Jamie.

      Remember that you can choose your own naming conventions.

    2. Push this branch to your remote repository:

      git push -u origin collab/jamie\n
    3. Your collaborator can then make a local copy of this branch:

      git clone --single-branch --branch collab/jamie repository-url\n
    4. They can then create commits and push them to your remote repository with git push.

    "},{"location":"collaborating/sharing-a-repository/","title":"Sharing a repository","text":"

    The easiest way to share a repository with collaborators is to have a single remote repository that all collaborators can access. This repository could be located on a platform such as GitHub, GitLab, or Bitbucket, or on a platform provided by your University or Institute.

    These platforms allow you to create public repositories and private repositories.

    • Everybody can view the contents of a public repository.

    • You control who can view the contents of a private repository.

    • For both types of repository, you control who can make changes to the repository, such as creating commits and branches.

    Info

    You should decide whether a public repository or a private repository suits you best.

    "},{"location":"collaborating/sharing-a-repository/#giving-collaborators-access-to-your-remote-repository","title":"Giving collaborators access to your remote repository","text":"

    The steps required to do this differ depending on which platform you are using. Here, we will describe how to give collaborators access to a repository on GitHub. For further details, see the GitHub documentation.

    • Open the main page of your GitHub repository.

    • Click on the \"Settings\" tab in the top navigation bar.

    • Click on the \"Collaborators\" item in the left sidebar.

    • Click on the \"Add people\" button.

    • Search for collaborators by entering their GitHub user name, their full name, or their email address.

    • Click the \"Add to this repository\" button.

      This will send an invitation to the collaborator. If they accept this invitation, they will have access to your repository.

      "},{"location":"community/","title":"Community of Practice","text":"

      Info

      Communities of Practice are groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.

      The community acts as a living curriculum and involves learning on the part of everyone.

      The aim of a Community of Practice (CoP) is to come together as a community and engage in a process of collective learning in a shared domain. The three characteristics of a CoP are:

      1. Community: An environment for learning through interaction;

      2. Practice: Specific knowledge shared by community members; and

      3. Domain: A shared interest, problem, or concern.

      We meet as a community every 6 to 8 weeks, and capture observations in meeting summaries.

      "},{"location":"community/meetings/","title":"Meetings","text":"

      This section contains summaries of each Community of Practice meeting.

      • 17 April 2023: our initial meeting.

      • 13 June 2023: exploration of version control, reproducibility, and testing exercises.

      • 15 August 2023: changing our research and reproducibility practices.

      • 18 October 2023: sharing experiences about good ways to structure a project.

      "},{"location":"community/meetings/2023-04-17/","title":"17 April 2023","text":"

      This is our initial meeting. The goal is to welcome people to the community and outline how we envision running these Community of Practice meetings.

      "},{"location":"community/meetings/2023-04-17/#theme-reproducible-research","title":"Theme: Reproducible Research","text":"

      Outline the theme and scope for this community.

      This is open to all researchers who share an interest in reproducible research and/or related topics and practices; no prior knowledge is required.

      For example, consider these questions:

      • Can you reproduce your current results on a new computer?

      • Can someone else reproduce your current results?

      • Can someone else reproduce your current results without your help?

      • Can you reproduce your own results from, say, 2 years ago?

      • Can someone else reproduce your own results from, say, 2 years ago?

      • Can you fix a mistake and update your own results from, say, 2 years ago?

      Tip

      The biggest challenge can often be remembering what you did and how you did it.

      Making small changes to your practices can greatly improve reproducibility!

      "},{"location":"community/meetings/2023-04-17/#how-will-these-meetings-run","title":"How will these meetings run?","text":"
      • Aim to hold these meetings on a (roughly) monthly basis.

      • Prior to each meeting, we will invite community members to propose a topic or discussion point to be the focus of the meeting. This may be an open question or challenge, an example of good research practices, a useful software tool, etc.

      • Schedule each meeting to best suit the availability of community members who are particularly interested in the chosen topic.

      • Each meeting should be hosted by one or more community members, with online participation available to those who cannot attend in person.

      • At the end of each meeting, we will ask attendees how useful/effective they found the meeting, so that we can better cater to the needs of the community. For example:

      • What do you think of the session?

      • What did we do well?
      • What could we do better in the next session?

      • We will summarise the key observations, findings, and outputs of each meeting in our online materials, and use them to improve and grow our training materials.

      "},{"location":"community/meetings/2023-04-17/#preferred-communication-channels","title":"Preferred communication channels?","text":"

      Info

      To function effectively as a community, we need to support asynchronous discussions in addition to scheduled meetings.

      One option is a dedicated mailing list. Other options were suggested:

      • A Slack workspace (Dave);

      • A Discord channel (TK);

      • A Teams channel (Gerry); and

      • A private GitHub repository, using the issue tracker (Alex).

      Using a GitHub issue tracker might also serve as a gentle introduction to GitHub?

      "},{"location":"community/meetings/2023-04-17/#supporting-activities-and-resources","title":"Supporting activities and resources?","text":"

      Are there other activities that we could organise to help support the community?

      • We have online training materials, Git is my lab book, which should be useful for people who are not familiar with version control.

      • We also have a SPECTRUM/SPARK peer review team, where people can make their code available for peer review.

      "},{"location":"community/meetings/2023-04-17/#topics-for-future-meetings","title":"Topics for future meetings?","text":"

      We asked each participant to suggest topics that they would like to see covered in future meetings and/or activities. A number of common themes emerged.

      "},{"location":"community/meetings/2023-04-17/#version-control-from-theory-to-practice","title":"Version control: from theory to practice","text":"

      A number of people mentioned not being sure how to get started, or starting with good intentions but ending up with a mess.

      • Dave: how can I transition from principle to practice?

      • Ollie: similar to David, I often start well but end up with a mess.

      • Ruarai: what have others found useful and applied in this space, and what options are out there?

      • Michael: I'm a complete novice, git command lines are a foreign language to me! I'm looking for tips for someone who uses code a lot, experienced at coding but much less so on version control and the use of repositories. What are the first steps to incorporate it into my workflow?

      • Angus: I'm also relatively new to Git and have been using GitHub Desktop (a GUI for Windows and Mac). I'm not averse to command line stuff but I need to remember fewer arcane commands!

      • Samik: I use TortoiseGit \u2014 a Windows Shell Interface to Git.

      • Gray: I resonate with Michael, I do most of my research on my own and describe it in papers. It isn't particularly Git-friendly, I'm keen to learn.

      • Lauren: everything that everyone has said so far! I've found some good guidelines for how to write reproducible code, starting from the basics all the way to niche topics. Can we use this as a way to share materials that we've found useful? The British Ecological Society have published guidelines. We could assemble good materials that start from basics.

      • David: The Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE) also have good materials.

      • Gerry: I like the idea of reproducibility and I've done a terrible job of it in the past, my repository ends up with thousands of versions of different files. Can you help me solve it?

      • Josh: Along the same lines of what's been stated. How best to share knowledge of Git and best practices with others in a new research team? How to adjust to their methods of conducting reproducible research, version control, etc?

      • Punya: not much to add, would really like to know more about version control, I have a basic understanding, what's the standard way of using it, reproducibility and documentation.

      • Rachel: I strongly support this idea of code reproducibility. Best practice frameworks can be disseminated to modellers in modelling consortia, and they can be very helpful when auditing.

      • Ella: we're migrating models from Excel to R.

      • J'Belle: I work for a tiny, very remote health service at the Australian and Papua New Guinea border. We have 17 sources of clinical data, which presents massive challenges in reproducibility and quality assurance. I'm looking to tap into your expertise. How do we manage so many sources of clinical data?

      "},{"location":"community/meetings/2023-04-17/#working-with-less-technically-experienced-collaborators","title":"Working with less technically-experienced collaborators","text":"

      How can we make best use of existing tools and practices, while working with collaborators who have less technical expertise/experience?

      • Alex: if I start a project with collaborators who may be less technically literate, how can they contribute without messing up reproducibility? Options like Docker are a little too complicated. How can I motivate people, is there a simple solution?

      • Angus: in theory you may have reproducible code. But if you need to write a paper with less technical collaborators, running the code and generating reports can be hard. How do we collaborate on the writing side? RMarkdown and equivalents make a lot of sense, but most colleagues will only look at Word documents. There are some workarounds, such as pandoc.

      "},{"location":"community/meetings/2023-04-17/#reproducibility-best-practices-and-limitations","title":"Reproducibility: best practices and limitations","text":"

      How far can/should we go in validating and demonstrating that our models and analyses are reproducible? How can we automate this? How is this impacted when we cannot share the input data, or when our models are extremely large and complex?

      • Cam: there are unique issues in the type of research we do. Working with code makes reproducibility easier in some ways, compared to the conditions of real-world experiments. Our capacity for reproducibility is great, but so then is our burden. We should be exploring the limitations! Some challenges in our area come down to implementation of stochastic models with lots of random processes. How can we do that well and make it part of what we practice? What are the limitations to reproducibility, and how do we define our goals when we are working with data that cannot be shared?

      • Samik: similar to Cam, I'm interested in how people have produced reproducible research where the data cannot be shared. Perhaps we can provide default examples as test cases?

      • Michael: I second Cam's points, particularly about reproducibility with confidential data. That's an issue I've hit multiple times. We usually have a side folder with the real dataset, and pipe through condensed or aggregated versions of the data that we can release.

      • Jiahao: I'm interested in how to build a platform for using agent based models. I've looked at lots of other models, but how can we bring them together so that it is easier to add a new variable or extend a model?

      • Eamon: I'm a Git fanatic, and I want to understand the development of code that I work with. I get annoyed when people share a repository as a single commit, or don't use tags in their Git repositories to identify the version of the code they used in, e.g., a paper! How do you start running the code? What file formats does it expect to process?

      • Dion: I'm interested in seeing what people are doing that look like good practice. Making sure that code and results are reproducible, in the sense that your code may be version controlled, but you've since made changes to code, parameters, input data, etc. How do you do a good job to shoe-horn that all into Git? Maybe use Git for development and simultaneously use a separate repository for production stuff? We need to be able to look back and identify from the commit the code, data, intermediate files used along the way.

      • Palang: I've looked at papers with supplementary code, but running the code myself produces very different results from what's in the paper.

      • May: most people have said what I wanted to say. I faced similar problems with running other people's code. It may not print any error message, but you get very different results from what's published in the paper. You don't know who or what is wrong!

      "},{"location":"community/meetings/2023-04-17/#testing-and-documentation","title":"Testing and documentation","text":"

      How can we develop confidence in our own code, and in other people's code?

      • TK: I want to learn/see different conventions for writing code documentation. I've never managed to get doxygen working to my satisfaction.

      • Angus: how do we design good tests? How to test, when to test, what to test for? Should we have coverage targets? Are there ways to automate testing?

      • Rahmat: I often find it very hard to learn how to use other people's code. The code needs to be easy to understand. Otherwise, I will just write the code myself! Sometimes when I run the code, I have difficulty generating results; many errors come up and it's not clear why. Perhaps all of the necessary data have not been shared with the code? We need to include the data, but if the data cannot be provided, you need to provide similar data so that others can run the code. It also helps to use a language that others are familiar with.

      "},{"location":"community/meetings/2023-04-17/#code-reuse","title":"Code reuse","text":"
      • Pan: I am not sure about the term reproducibility in the context of coders. I know lab people really do reuse published protocols. But do coders actually reuse other people's code to do their work?

      • Gerry: People often make their code into packages which others reuse. This could be a good topic for future meetings.

      "},{"location":"community/meetings/2023-04-17/#using-chat-gpt-to-writecheck-code","title":"Using Chat GPT to write/check code","text":"
      • Pan: I recently joined a meeting where people have used Chat GPT to check their code. Does this group have any thoughts on how we might make good use of Chat GPT?

      • Cam: Chat GPT is not reproducible itself, so it seems questionable to use it to check reproducibility.

      • Alex: I don't entirely agree, it can be very useful for improving the implementation of a function. In terms of generating reliable code, it's wonderful. It's a nightmare for evaluating existing code.

      • Pan: people are using Chat GPT to generate initial templates.

      • Eamon: If you encounter code that has poor documentation, Chat GPT is surprisingly good at telling you how to use it.

      • Matt: I don't have anything to add to the above, I'm happy to be along for the ride.

      "},{"location":"community/meetings/2023-06-13/","title":"13 June 2023","text":"

      In this meeting we asked participants to share their experiences exploring the version control, reproducibility, and testing exercises in our example repository.

      This repository serves as an introduction to testing models and ensuring that their outputs are reproducible. It contains a simple stochastic model that draws samples from a normal distribution, and some tests that check whether the model outputs are consistent with our expectations.

      "},{"location":"community/meetings/2023-06-13/#what-is-a-reproducible-environment","title":"What is a reproducible environment?","text":"

      The exercise description was deliberately very open, but it may have been too vague:

      Define a reproducible environment in which the model can run.

      We avoided listing possible details for people to consider, such as software and package versions. Perhaps a better approach would have been to ask:

      If this code was provided as supporting materials for a paper, what other information would you need in order to run it and be confident of obtaining the same results as the original authors?

      The purpose of a reproducible environment is to define all of these details, so that you never have to say to someone \"well, it runs fine on my machine\".

      "},{"location":"community/meetings/2023-06-13/#reproducibility-and-stochasticity","title":"Reproducibility and stochasticity","text":"

      Many participants observed that the model was not reproducible unless we used a random number generator (RNG) with a known seed, which would ensure that the model produces the same output each time we run it.

      But what if you're using a package or library that internally uses its own RNG and/or seed? This may not be something you can fix, but you should be able to detect it by running the model multiple times with the same seed, and checking whether you get identical results each time.

      Another important question was raised: do you, or should you, include the RNG seed in your published code? This is probably a good idea, and suggested solutions included setting the seed at the very start of your code (so that it's immediately visible) or including it as a required model parameter.
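The check described above and the suggestion to make the seed a required model parameter can be sketched in Python as follows. This is a minimal sketch, not code from the example repository. run_model() is a hypothetical model that draws samples from a normal distribution. leaky_model() is a hypothetical counter-example whose noise comes from Python's global random module, which ignores the seed. The seed value is arbitrary.

```python
import random

# An arbitrary seed, defined at the very start of the script so it is visible.
SEED = 12345


def run_model(n_samples, seed):
    # Hypothetical model: draws samples from a standard normal distribution.
    # The seed is a required parameter, and the model uses its own generator.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def leaky_model(n_samples, seed):
    # Hypothetical counter-example: the added noise comes from the global
    # random module, which ignores the seed passed to this function.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) + random.random() for _ in range(n_samples)]


def is_reproducible(model, seed, n_runs=3, **kwargs):
    # Run the model several times with the same seed and check that
    # every run produces identical outputs.
    outputs = [model(seed=seed, **kwargs) for _ in range(n_runs)]
    return all(output == outputs[0] for output in outputs)


print(is_reproducible(run_model, SEED, n_samples=5))
print(is_reproducible(leaky_model, SEED, n_samples=5))
```

The first check should print True, and the second is expected to print False. Each run of run_model() gets its own random.Random instance instead of calling random.seed() on the shared global generator. As a result, other code that draws random numbers from the global generator cannot change the model's outputs.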

      "},{"location":"community/meetings/2023-06-13/#writing-test-cases","title":"Writing test cases","text":"

      Tip

      Write a test case every time you find a bug: ensure that the test case finds the bug, then fix the bug, then ensure that the test case passes.

      A test case is a piece of code that checks that something behaves as expected. This can range from simply checking that a mathematical function returns an expected value, to running many model simulations and verifying that a summary statistic falls within an expected range.

      Rather than trying to write a single test that checks many different properties of a piece of code, it can be much simpler and quicker to write many separate tests that each check a single property. This can provide more detailed feedback when one or more test cases fail.

      Note

      This approach is similar to how we rely on multiple public health interventions to protect against disease outbreaks! Consider each test case as a slice of Swiss cheese \u2014 many imperfect tests can provide a high degree of confidence in our code.
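Here is a minimal Python sketch of writing many small tests. The sir_step() function below is hypothetical and not part of the example repository; it performs one forward-Euler step of a deterministic SIR model. Each test function checks a single property, so a failure points directly at the property that broke. By default, a framework such as pytest collects functions whose names start with test_ from files whose names start with test_.

```python
import math


def sir_step(s, i, r, beta, gamma, dt):
    # Hypothetical deterministic SIR model, advanced by one forward-Euler
    # step of length dt; the population size n = s + i + r is constant.
    n = s + i + r
    infections = beta * s * i / n * dt
    recoveries = gamma * i * dt
    return s - infections, i + infections - recoveries, r + recoveries


def test_population_is_conserved():
    before = (990.0, 10.0, 0.0)
    after = sir_step(*before, beta=0.3, gamma=0.1, dt=1.0)
    assert math.isclose(sum(after), sum(before))


def test_no_infections_without_infectious_people():
    after = sir_step(990.0, 0.0, 10.0, beta=0.3, gamma=0.1, dt=1.0)
    assert after == (990.0, 0.0, 10.0)


def test_compartments_remain_non_negative():
    after = sir_step(990.0, 10.0, 0.0, beta=0.3, gamma=0.1, dt=1.0)
    assert all(x >= 0 for x in after)
```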

      "},{"location":"community/meetings/2023-06-13/#writing-test-cases-for-conditions-that-may-fail","title":"Writing test cases for conditions that may fail","text":"

      If you are testing a stochastic model, you may find certain test cases are difficult to write.

      For example, consider a stochastic SIR model where you want to test that an intervention reduces the number of cases in an outbreak. You may, however, observe that in a small proportion of simulations the intervention has no effect (or it may even increase the number of cases).

      One approach is to run many pairs of simulations and only check that the intervention reduced the number of cases at least X% of the time. You need to decide how many simulations to run, and what is an appropriate value for X%, but that's okay! Remember the Swiss cheese analogy, mentioned above.
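Here is a minimal Python sketch of this approach, under several assumptions:

- The outbreak model is a hypothetical Reed-Frost chain-binomial model, used as a simpler stand-in for the SIR model described above.
- The intervention halves the per-contact transmission probability.
- X% is set to 80% over 100 pairs of simulations.

Each simulation is seeded with its pair index, so the test gives the same result every time it runs.

```python
import random


def reed_frost(n, p, initial_infected, seed):
    # Hypothetical Reed-Frost chain-binomial outbreak model, in a population
    # of n people where p is the per-contact transmission probability.
    # Returns the total number of people infected during the outbreak.
    rng = random.Random(seed)
    susceptible = n - initial_infected
    infected = initial_infected
    while infected > 0:
        # A susceptible person escapes infection only if they avoid
        # transmission from every currently infectious person.
        p_infect = 1 - (1 - p) ** infected
        new_infections = sum(1 for _ in range(susceptible) if rng.random() < p_infect)
        susceptible -= new_infections
        # Infectious people recover after one generation.
        infected = new_infections
    return n - susceptible


def proportion_reduced(n_pairs, n=200, p_baseline=0.015,
                       p_intervention=0.0075, initial_infected=3):
    # Run pairs of simulations (without and with the intervention) and
    # return the fraction of pairs in which the intervention reduced cases.
    reduced = 0
    for seed in range(n_pairs):
        baseline = reed_frost(n, p_baseline, initial_infected, seed)
        intervention = reed_frost(n, p_intervention, initial_infected, seed)
        if intervention < baseline:
            reduced += 1
    return reduced / n_pairs


def test_intervention_usually_reduces_cases():
    # Require a reduction in at least X = 80% of the simulation pairs,
    # rather than in every single pair.
    assert proportion_reduced(n_pairs=100) >= 0.8
```

The threshold and the number of pairs are judgement calls. A lower threshold or more pairs makes the test less likely to fail by chance, at the cost of being less sensitive or slower.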

      "},{"location":"community/meetings/2023-06-13/#testing-frameworks","title":"Testing frameworks","text":"

      If you have more than 2 or 3 test cases, it's a good idea to use a testing framework to automatically find your test cases, run each test, record whether it passed or failed, and report the results. These frameworks are usually specific to a single programming language.

      Some commonly-used frameworks include:

      • Python: pytest
      • R: testthat
      • Matlab: included in the language
      • Julia: included in the language
      "},{"location":"community/meetings/2023-06-13/#github-actions","title":"GitHub Actions","text":"

      Multiple participants reported some difficulties in setting up GitHub actions and knowing how to adapt available templates to their needs. See the following examples:

      • Python starter workflow; and
      • GitHub Actions for R.

      We will aim to provide a GitHub action workflow for each model, and add comments to explain how to adapt these templates.

      Warning

      One downside of using GitHub Actions is the limited computation time: private repositories on the GitHub Free plan are limited to 2,000 minutes per month. This may not be suitable for large agent-based models and other long-running tasks.

      "},{"location":"community/meetings/2023-06-13/#pull-requests","title":"Pull requests","text":"

      At the time of writing, three participants have contributed pull requests:

      • TK added a default seed so that the model outputs are reproducible.

      • Michael added a MATLAB version of the model and the test cases.

      • Cam added several features, such as recording metadata about the Python environment and testing that the model outputs are reproducible.

      Tip

      If you make your own copy (\"fork\") of the example repository, you can create as many commits as you want. GitHub will display a message that says:

      This branch is N commits ahead of rrcop:master.

      Click on the \"N commits ahead\" link to see a summary of your new commits. You can then click the big green button \"Create pull request\".

      This will not modify the example repository. Instead, it will create an overview of the changes between your code and the example repository. We can then review these changes and make suggestions, and you can add new commits, etc., before we decide whether to add these changes to the example repository.

      "},{"location":"community/meetings/2023-08-15/","title":"15 August 2023","text":"

      Info

      See the Resources section for links to useful resources that were mentioned in this meeting.

      "},{"location":"community/meetings/2023-08-15/#changes-to-practices","title":"Changes to practices","text":"

      In this meeting we asked everyone what changes (if any) they have made to their research and reproducibility practices since our last meeting.

      A common theme was improving how we note and record our past actions. For example:

      • Eamon has begun recording the commit ID (\"hash\") of the code that was used to generate each set of outputs. This allows him to easily retrieve the exact version of the code that was used to generate any past result and, e.g., generate other outputs of interest.

      • Pan talked about how their group records raw data separately from, but grouped with, the analysis code and processed data that were generated from these raw data. They also record every step of their model-fitting process, which may not always go as smoothly as expected.

      This ensures that stakeholders who want to use these models to run their own scenarios can reproduce the baseline scenarios without being modelling experts themselves.

      The model is available as an online app.

      • Rob has begun working on an existing malaria model, which was implemented in R as a series of scripts that shared many global variables. He wanted to restructure the code to better understand it, so he used version control to record the simulation outputs and ensure that he didn't change the model's behaviour as he restructured the code. On several occasions he modified parts of the code and discovered that these changes unexpectedly affected the simulation outputs. This is a manual equivalent of using continuous integration.
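The first and last practices above can be combined in a few git commands. The sketch below assumes a throwaway repository, a hypothetical model.sh script that stands in for your model, and a hypothetical outputs/ directory. It records the ID of the commit that produced the outputs, and later uses git diff to check whether re-running the model changed them.

```shell
# Work in a throwaway repository (a stand-in for your project).
cd $(mktemp -d)
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'

# A hypothetical model script that prints its outputs as CSV.
echo 'echo day,cases; echo 1,10' > model.sh
git add model.sh
git commit -q -m 'Add model script'

# Run the model and record the ID of the commit that produced the outputs.
mkdir -p outputs
sh model.sh > outputs/results.csv
git rev-parse HEAD > outputs/commit-id.txt
git add outputs
git commit -q -m 'Record baseline model outputs'

# After restructuring model.sh, re-run it and compare with the recorded outputs.
sh model.sh > outputs/results.csv
if git diff --quiet -- outputs/results.csv; then
  echo 'Outputs are unchanged'
else
  echo 'Outputs have changed'
fi
```

If the outputs have changed, running git diff on the output files shows exactly which values differ.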
      "},{"location":"community/meetings/2023-08-15/#how-do-you-structure-a-project","title":"How do you structure a project?","text":"

      Gizem asked the group \"How do you choose an appropriate project structure, especially if the project changes over time?\"

      Phrutsamon: the TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data.

      Rob: there may not be a single perfect solution that addresses everyone's needs. But look back at past projects, and try to imagine how the current project might change in the future. And if you're using version control, don't be afraid to experiment with different project structures \u2014 you can always revert to an earlier commit.

      "},{"location":"community/meetings/2023-08-15/#reviewing-code-as-part-of-manuscript-peer-review","title":"Reviewing code as part of (manuscript) peer review","text":"

      Rob asked the group \"Has anyone reviewed supporting code when reviewing a manuscript?\"

      • Ruarai read through R code that was provided with a paper, but was unable to run all of it \u2014 some of the code produced errors.

      • Similarly, Rob has read R code provided with a paper that used hard-coded paths that did not exist (e.g., \"C:\\Users\\<Author Name>\\...\"), tried to run code in source files that did not exist, and read data from CSV files that did not exist.

      Info

      Pan mentioned a fantastic exercise for research students.

      Pick a modelling paper that is relevant to their research project, and ask the student to:

      1. read it;
      2. understand it; and
      3. reproduce the figures.

      This teaches the students that reproducibility is very important, and shows them what they need to do when they publish their own results.

      It's important to pick a relatively simple paper, so that this task isn't too complicated for the student. And if the paper is written by a colleague or collaborator, you can contact them to ask for extra details, etc.

      "},{"location":"community/meetings/2023-08-15/#using-shiny-to-make-models-availablereproducible","title":"Using Shiny to make models available/reproducible","text":"

      Pan asked the group \"What do you think about (the extra work involved in) turning R code into Shiny applications, to show that the model is reproducible, and do so in a way that lets others easily make use of it?\"

      An objective of the COVID-19 International Modelling Consortium (CoMo) is to make models available and usable for non-modellers \u2014 turning models into something that anyone with minimal knowledge can explore.

      The model is available as a Shiny app, and is continually being updated and refined. It is currently at version 19! Pan's group is trying to ensure that existing users update to the most recent version, because it can be very challenging and time-consuming to create scenario templates for older model versions. Templates are a good way to help the user define their scenario-specific settings, but it's a nightmare when you change the model version \u2014 it's like working with a new model.

      • Eamon: this is similar to when software platforms make changes to their APIs. Can you make backwards-compatible changes, or automatically transform old templates to make them compatible with the latest model version? This kind of work is simple to fund when your software is a commercial product, but it's much harder to find funding for academic outputs.

      • Pan: It's a lot of extra work, without any money to support it. For this consortium we hired several programmers, some for the coding and some specifically for the Shiny app, so it involved a lot of resources. That project has now ended, but we've learned a lot and have a good network of collaborators. We still have monthly meetings! This was a special case with COVID-19, because the context changed so quickly. It would be much less of a problem with other diseases, which we understand better.

      • Gizem: very much in favour of using Shiny to make models available, and recently made a Shiny app for their latest project (currently under review). Because the model is very complicated, we had to pre-calculate model results for specific parameter combinations, and only allow users to choose between these parameter combinations. One reviewer asked for a modified figure to show results for slightly different parameter values, and it was quite simple to address.

      Hadley Wickham has written a very good book about developing R Shiny applications. Gizem read a chapter of this book each morning, but found it necessary to practice in order to really understand how to use Shiny.

      Info

      Learning by doing (experiential learning) is a highly-effective way of convincing people to change their practices. It can be greatly enhanced by engaging as a community.

      "},{"location":"community/meetings/2023-08-15/#resources","title":"Resources","text":""},{"location":"community/meetings/2023-08-15/#teaching-reproducibility-and-responsible-workflows","title":"Teaching reproducibility and responsible workflows","text":"

      The Journal of Statistics and Data Science Education published a special issue, Teaching Reproducibility, in November 2022. The accompanying editorial article highlights:

      Integrating reproducibility into our practice and our teaching can seem intimidating initially. One way forward is to start small. Make one small change to add an element of exposing students to reproducibility in one class, then make another the next semester. Our students can get much of the benefit of reproducible and responsible workflows even if we just make a few small changes in our teaching. These efforts will help them to make more trustworthy insights from data. If it leads, by way of some virtuous cycle, to us improving our own practice, then even better! Improving our teaching through providing curricular guidance about reproducible science will take time and effort that should pay off in the long term.

      This journal issue was followed by an invited paper session with the following presentations:

      • Collaborative writing workflows: building blocks towards reproducibility

      • Opinionated practices for teaching reproducibility: motivation, guided instruction, and practice

      • From teaching to practice: Insights from the Toronto Reproducibility Conferences

      • Teaching reproducibility and responsible workflow: an editor's perspective

      "},{"location":"community/meetings/2023-08-15/#project-templates","title":"Project templates","text":"
      • The TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data:

      Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.

      "},{"location":"community/meetings/2023-08-15/#using-shiny","title":"Using Shiny","text":"
      • Mastering Shiny: an online book that teaches how to create web applications with R and Shiny.

      • CoMo Consortium App: the COVID-19 International Modelling Consortium (CoMo) has developed a web application for an age-structured, compartmental SEIRS model.

      "},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-r","title":"Continuous integration examples for R","text":"
      • Building reproducible analytical pipelines with R: this article shows how to use GitHub Actions to run R code when you push new commits to a GitHub repository.

      • GitHub Actions for the R language: this repository provides a variety of GitHub actions for R projects, such as installing specific versions of R and R packages.

      "},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-python","title":"Continuous integration examples for Python","text":"
      • GitHub Actions for Python: the GitHub Actions documentation includes examples of building and testing Python projects.
      "},{"location":"community/meetings/2023-08-15/#other-continuous-integration-examples","title":"Other continuous integration examples","text":"

      See the GitHub actions for Git is my lab book, available here. For example, the build action performs the following actions:

      1. Check out the repository, using actions/checkout;

      2. Install mdBook and other required tools, using make; and

      3. Build an HTML version of the book, using mdBook.

      "},{"location":"community/meetings/2023-10-18/","title":"18 October 2023","text":"

      In this meeting we asked participants to share their experiences about good (and bad) ways to structure a project.

      Info

      We are currently drafting Project structure and Writing code guidelines.

      See the pull request for further details. Please contribute suggestions!

      We had six in-person and eight online attendees. Everyone predominantly uses one or more of the following languages:

      • Matlab;
      • Python; and
      • R.
      "},{"location":"community/meetings/2023-10-18/#naming-files","title":"Naming files","text":"

      The tidyverse style guide includes recommendations for naming files. One interesting recommendation in this guide is:

      • If files should be run in a particular order, prefix each file name with a number. For example:

        00_download.R
        01_clean.R
        02_summarise.R
        ...
        09_plot_figures.R
        10_generate_tables.R
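Shell glob patterns expand in sorted order, so numbered file names also let a single loop run every step in sequence. This is a minimal sketch, assuming the hypothetical numbered scripts are shell scripts that each print their own name (for R scripts you would call Rscript instead of sh). The zero-padding matters: without it, 10_generate_tables would sort before 2_summarise.

```shell
cd $(mktemp -d)
# Create hypothetical numbered scripts (in any order); each one prints its own name.
for name in 10_generate_tables 00_download 02_summarise 01_clean 09_plot_figures; do
  echo echo $name > $name.sh
done

# The glob expands in sorted order, so the scripts run in numbered order.
for script in [0-9][0-9]_*.sh; do
  sh $script
done > run-order.txt
cat run-order.txt
```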

      "},{"location":"community/meetings/2023-10-18/#choosing-a-directory-structure","title":"Choosing a directory structure","text":"

      A common starting point is often one or more scripts in the root directory. But we can usually divide a project into several distinct steps or stages, and store the files necessary for each stage in a separate sub-directory.

      Tip

      Your project structure may change as the project develops. That's okay!

      You might, e.g., realise that some files should be moved to a new, or different, sub-directory.

      Packaging: Python and R allow you to bundle multiple code files into a \"package\". This makes it easier to use code that is split up into multiple files. It also makes it simpler to test and verify whether your code can be run on a different computer. To create a package, you need to provide some metadata, including a list of dependencies (packages or libraries that your code needs in order to run). When you install a Python or R package, its dependencies are installed automatically too. You can test this on, e.g., a virtual machine to verify that you've correctly listed all of the necessary dependencies.

      Version control: the history may be extremely useful for you, but may contain things you don't want to make publicly available. One solution would be to know from the very start what files you will want to make available and what files you do not (e.g., sensitive data files), but this is not always possible. Another, more realistic, solution is to create a new repository, copy over all of the files that you want to make available, and record these files in a single commit. The public repository will not share the history of your project repository, and that's okay \u2014 the public repository's primary purpose is to serve as a snapshot, rather than a complete and unedited history.
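A minimal sketch of the second solution, in which temporary directories stand in for your project repository and the new public repository, and the file names analysis.R and sensitive-data.csv are hypothetical:

```shell
# A stand-in for your project repository, which contains a sensitive file.
project=$(mktemp -d)
cd $project
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'
echo '# Analysis code' > analysis.R
echo 'id,result' > sensitive-data.csv
git add analysis.R sensitive-data.csv
git commit -q -m 'Add analysis code and data'

# Create a new public repository and copy over only the files to share.
public=$(mktemp -d)
cd $public
git init -q
git config user.name 'Example User'
git config user.email 'user@example.com'
cp $project/analysis.R .
git add analysis.R
git commit -q -m 'Add analysis code for publication'
# The public repository has a single commit and none of the project history.
git log --oneline
```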

      "},{"location":"community/meetings/2023-10-18/#locating-files","title":"Locating files","text":"

      A common concern is how to locate files in different sub-directories (e.g., loading code, reading data files, writing output files) without relying on absolute paths. For loading code, Python and Matlab allow the user to add directories to the search path (e.g., by modifying sys.path in Python, or calling addpath() in Matlab). But these are not ideal solutions.

      • As a general rule, prefer using relative paths instead of absolute paths.

      • Relative paths are defined relative to the current working directory. For example: sub-directory/file-name and ../other-directory.

      • Absolute paths are defined relative to the root drive or directory. For example: /Users/My Name/... and C:\\Users\\My Name\\....

      Absolute paths may not exist on other people's computers.

      • For R, the here package allows you to construct file paths relative to the top-level project directory. For example, if you have a data file in project/input-data/file-1.csv and a script file in project/analysis-code/read-input-data.R, you can locate the data file from within the script with the following code:
      library(here)\ndata_file <- here(\"input-data/file-1.csv\")\n
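If the project is a git repository, a similar approach works from the shell (or from any language that can run a command): git rev-parse --show-toplevel prints the top-level directory of the repository, whichever sub-directory you run it from. This is a minimal sketch using the same hypothetical directory layout; it is an analogue of, not a replacement for, the here package.

```shell
# A throwaway project with the directory layout described above.
cd $(mktemp -d)
git init -q
mkdir -p input-data analysis-code
echo 'x,y' > input-data/file-1.csv

# Run from a sub-directory, as a script in analysis-code/ would be.
cd analysis-code
# Locate the top-level project directory and build the path from there.
top=$(git rev-parse --show-toplevel)
data_file=$top/input-data/file-1.csv
cat $data_file
```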

      Tip

      A general solution for any programming language is to break your code into functions, each of which accepts input and/or output file names as arguments (when required). This means that most of your code is entirely independent of your chosen project structure. You can then store/generate all of the file paths in a single file, or in each of your top-level scripts.
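A minimal shell sketch of this idea, where the summarise_csv function and all of the file names are hypothetical: the function receives its input and output paths as arguments, and only the top-level script decides where those files live.

```shell
# Count the data rows (all lines except the header) of a CSV file.
# The function makes no assumptions about the project structure.
summarise_csv() {
  input_file=$1
  output_file=$2
  tail -n +2 $input_file | wc -l > $output_file
}

# Top-level script: all of the file paths are defined here.
cd $(mktemp -d)
mkdir -p input-data outputs
echo 'x' > input-data/file-1.csv
echo '1' >> input-data/file-1.csv
echo '2' >> input-data/file-1.csv
summarise_csv input-data/file-1.csv outputs/row-count.txt
cat outputs/row-count.txt
```

If you later move the data or output directories, only the paths in the top-level script need to change.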

      "},{"location":"community/meetings/2023-10-18/#peer-review-get-feedback-on-project-structure","title":"Peer review: get feedback on project structure","text":"

      It can be helpful to get feedback from someone who isn't directly involved in the project. They may view the work from a fresh perspective, and be able to identify aspects that are confusing or unclear.

      When inviting someone to review your work, you should identify specific questions or tasks that you would like the reviewer to address.

      With respect to project structure, you may want to ask the reviewer to address questions such as:

      • Do the project directories suggest a clear structure or workflow?
      • Does each directory contain files that are clearly related to each other?
      • Do the names of each directory and each file seem reasonable?
      • Are there any files that you would consider renaming or moving?
      • Does the README.md file help you to navigate the project?

      You could also ask the reviewer to look at a specific script or code file, and ask questions such as:

      • Should this code be divided into smaller functions?
      • Should this code be divided into multiple files?

      Info

      For further ideas about useful peer review activities, and how to incorporate them into your workflow, see the following paper:

      Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology, Ivimey-Cook et al., Journal of Evolutionary Biology 36(10):1347\u20131356, 2023.

      "},{"location":"community/meetings/2023-10-18/#styling-and-formatting","title":"Styling and formatting","text":"

      We also discussed opinions about how to name functions, variables, files, etc.

      For example, R allows you to use periods (.) in function and variable names, but the tidyverse style guide recommends only using lowercase letters, numbers, and underscores (_).

      If you review other people's code, and have other people review your code, you might be surprised by the different styles and conventions that people use. When reviewing code, these differences can be somewhat distracting.

      • Agreeing on, and adhering to, a common style guide can avoid these issues and allow the reviewer to dedicate their attention to actually reading and reasoning about the code.

      • There are tools to automatically format your code (\"formatters\") and to warn about potential issues, such as unused variables (\"linters\"). Here are some commonly-used formatters and linters for different languages:

      Language  Style guide(s)                        Formatter          Linter
      R         tidyverse                             styler             lintr
      Python    PEP 8 / The Hitchhiker's Style Guide  black              ruff
      Julia     Julia style guide                     JuliaFormatter.jl  Lint.jl"},{"location":"community/meetings/2023-10-18/#ai-tools-for-writing-and-reviewing-code","title":"AI tools for writing and reviewing code","text":"

      There are AI tools that you can use to write, format, and review code, but you will need to check whether the code is correct. For example, GitHub Copilot is a (commercial) tool that accepts natural-language descriptions and generates computer code.

      Tip

      Feel free to use AI tools as a way to get started, but don't simply copy-and-paste the code they give you without reviewing it.

      "},{"location":"high-performance-computing/","title":"Cloud and HPC platforms","text":"

      This section introduces computing platforms that allow you to generate outputs more quickly, and without relying on your own laptop or desktop computer. It also demonstrates how to use version control to ensure that the code running on these platforms is the same as the code on your laptop.

      "},{"location":"reproducibility/","title":"Reproducibility","text":"

      This section demonstrates how to use version control and software testing to ensure that your research results can be independently reproduced by others.

      Tip

      Reproducibility is just as much about simple work habits as the tools used to share data and code.

      \u2014 Jesse M. Alston and Jessica A. Rick

      "},{"location":"testing/","title":"Testing","text":"

      This section introduces the topic of software testing. Testing your code is an important part of any code-based research activity. Tests check whether your code behaves as intended, and can warn you if you introduce a bug or mistake into your code.

      Tip

      Tests can show the presence of bugs, but not their absence.

      \u2014 Edsger W. Dijkstra

      "},{"location":"using-git/","title":"Effective use of git","text":"

      This section shows how to use the git command-line program to record your work, to inspect your commit history, and to search this commit history to identify commits that make specific changes or have specific effects.

      Reminder

      Remember to commit early and commit often. Do not wait until your code is \"perfect\".

      "},{"location":"using-git/choosing-a-license/","title":"Choosing a license","text":"

      A license specifies the conditions under which others may use, modify, and/or distribute your work.

      Info

      Simply making a repository publicly accessible is not sufficient to allow others to make use of your work. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work.

      There are many different types of licenses that you can use, and the number of options can seem overwhelming. But it is usually straightforward to narrow down your options.

      • If you're working on an existing project, the easiest option is to use that project's license.

      • If you're working with an existing community, they may have a preferred license.

      • If you want to choose an open source license, the Choose an open source license website provides advice for selecting a license that meets your needs.

      For further information about the various types of available licenses, and some advice for selecting a suitable license for academic software, see A Quick Guide to Software Licensing for the Scientist-Programmer.

      "},{"location":"using-git/choosing-your-git-editor/","title":"Choosing your Git editor","text":"

      In this video, we show how to use nano and vim for writing commit messages. See below for brief instructions on how to use these editors.

      Tip

      This editor is only used for writing commit messages. It is entirely separate from your choice of editor for any other task, such as writing code.

      Git editor example

      Video timeline:

      1. Overview
      2. Show how to use nano
      3. Show how to use vim

      Note

      You can pause the video to select and copy any of the text, such as the git config --global core.editor commands.

      "},{"location":"using-git/choosing-your-git-editor/#how-to-use-nano","title":"How to use nano","text":"

      Once you have written your commit message, press Ctrl + O and then Enter to save the commit message, then press Ctrl + X to quit the editor.

      To quit without saving, press Ctrl + X. If you have made any changes, nano will ask if you want to save them. Press n to quit without saving these changes.

      "},{"location":"using-git/choosing-your-git-editor/#how-to-use-vim","title":"How to use vim","text":"

      You need to press i (switch to insert mode) before you can write your commit message. Once you have written your commit message, press Esc and then type :wq to save your changes and quit the editor.

      To quit without saving, press Esc and then type :q!.

      "},{"location":"using-git/cloning-an-existing-repository/","title":"Cloning an existing repository","text":"

      If there is an existing repository that you want to work on, you can \"clone\" the repository and have a local copy. To do this, you need to know the remote repository's URL.

      Tip

      For GitHub repositories, there should be a green button labelled \"Code\". Click on this button, and it will provide you with the URL.

      You can then make a local copy of the repository by running:

      git clone URL\n

      For example, to make a local copy of this book, run the following command:

      git clone https://github.com/robmoss/git-is-my-lab-book.git\n

      This will create a local copy in the directory git-is-my-lab-book.

      Note

      If you have a GitHub account and have set up an SSH key, you can clone GitHub repositories using your SSH key. This will allow you to push commits to the remote repository (if you are permitted to do so) without having to enter your user name and password.

      You can obtain the SSH URL from GitHub by clicking on the green \"Code\" button, and selecting the \"SSH\" tab.

      For example, to make a local copy of this book using SSH, run the following command:

      git clone git@github.com:robmoss/git-is-my-lab-book.git\n
      "},{"location":"using-git/creating-a-commit/","title":"Creating a commit","text":"

      Creating a commit involves two steps:

      1. Identify the changes that should be included in the commit. These changes are then \"staged\" and ready to be included in the next commit.

      2. Create a new commit that records these staged changes. This should be accompanied by a useful commit message.

      We will now show how to perform these steps.

      Note

      At any time, you can see a summary of the changes in your repository, and which ones are staged to be committed, by running:

      git status\n

      This will show you:

      1. The files (if any) that contain changes that have been staged;
      2. The files (if any) that contain changes that have not been staged; and
      3. The files (if any) that are not recorded in the repository history.
      "},{"location":"using-git/creating-a-commit/#adding-a-new-file","title":"Adding a new file","text":"

      If you've created a new file, you can include this file in the next commit by running:

      git add filename\n
      "},{"location":"using-git/creating-a-commit/#adding-all-changes-in-an-existing-file","title":"Adding all changes in an existing file","text":"

      If you've made changes to an existing file, you can include all of these changes in the next commit by running:

      git add filename\n
      "},{"location":"using-git/creating-a-commit/#adding-some-changes-in-an-existing-file","title":"Adding some changes in an existing file","text":"

      If you've made changes to an existing file and only want to include some of these changes in the next commit, you can select the changes to include by running:

      git add -p filename\n

      This will show you each of the changes in turn, and allow you to select which ones to stage.

      Tip

      This interactive selection mode is very flexible; you can enter ? at any of the prompts to see the range of available actions.

      "},{"location":"using-git/creating-a-commit/#renaming-a-file","title":"Renaming a file","text":"

      If you want to rename a file, you can use git mv to rename the file and stage this change for inclusion in the next commit:

      git mv filename newname\n
      "},{"location":"using-git/creating-a-commit/#removing-a-file","title":"Removing a file","text":"

      If you want to remove a file, you can use git rm to remove the file and stage this change for inclusion in the next commit:

      git rm filename\n

      Tip

      If the file has any uncommitted changes, git will refuse to remove the file. You can override this behaviour by running:

      git rm --force filename\n
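A minimal sketch of both commands in a throwaway repository, where notes.txt and old-results.csv are hypothetical file names. Afterwards, git status --short lists the rename and the removal as staged changes.

```shell
cd $(mktemp -d)
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'
echo 'Project notes' > notes.txt
echo 'a,b' > old-results.csv
git add notes.txt old-results.csv
git commit -q -m 'Add notes and results'

# Rename one file and remove the other; both changes are staged.
git mv notes.txt README.md
git rm -q old-results.csv
git status --short
```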
      "},{"location":"using-git/creating-a-commit/#inspecting-the-staged-changes","title":"Inspecting the staged changes","text":"

      To verify that you have staged all of the desired changes, you can view the staged changes by running:

      git diff --cached\n

      You can view the staged changes for a specific file by running:

      git diff --cached filename\n
      "},{"location":"using-git/creating-a-commit/#undoing-a-staged-change","title":"Undoing a staged change","text":"

      You may sometimes stage a change for inclusion in the next commit, but decide later that you don't want to include it in the next commit. You can undo staged changes to a file by running:

      git restore --staged filename\n

      Note

      This will not modify the contents of the file.
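The sketch below stages a change, inspects it, and then unstages it, in a throwaway repository with a hypothetical notes.txt file. Afterwards git diff --cached shows nothing, but the new line is still in the file.

```shell
cd $(mktemp -d)
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'
echo 'First line' > notes.txt
git add notes.txt
git commit -q -m 'Add notes'

# Modify the file and stage the change.
echo 'Second line' >> notes.txt
git add notes.txt
# Summarise the staged changes (use git diff --cached for the full diff).
git diff --cached --stat

# Unstage the change; the file itself is not modified.
git restore --staged notes.txt
cat notes.txt
```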

      "},{"location":"using-git/creating-a-commit/#creating-a-new-commit","title":"Creating a new commit","text":"

      Once you have staged all of the changes that you want to include in the commit, create the commit by running:

      git commit\n

      This will open your chosen editor and prompt you to write the commit message.

      Tip

      Note that the commit will not be created until you exit the editor.

      If you decide that you don't want to create the commit, you can abort this action by closing your editor without saving a commit message.

      Please see Choosing your Git editor for details.
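Putting these steps together, here is a minimal sketch in a throwaway repository with a hypothetical analysis.R file. It uses git commit -m, which takes the commit message from the command line instead of opening your editor. This is convenient for short messages, but your editor is better suited to longer, more descriptive ones.

```shell
cd $(mktemp -d)
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'

echo '# Analysis code' > analysis.R
# Stage the new file and check what will be committed.
git add analysis.R
git status --short
# Record the staged changes, with the message given on the command line.
git commit -q -m 'Add initial analysis script'
git log --oneline
```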

      "},{"location":"using-git/creating-a-commit/#modifying-the-most-recent-commit","title":"Modifying the most recent commit","text":"

      After you create a commit, you might decide that there are other changes that should be included in the commit. Git provides a simple way of modifying the most recent commit.

      Warning

      Do not modify the commit if you have already pushed it to another repository. Instead, record a new commit that includes the desired changes.

      Remember that your commit history should not be a highly-edited, polished view of your work, but should instead act as a lab book.

      Do not worry about creating \"perfect\" commits!

      To modify the most recent commit, stage the changes that you want to commit (see the sections above) and add them to the most recent commit by running:

      git commit --amend\n

      This will open your chosen editor and allow you to modify the commit message.
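For example, the sketch below (in a throwaway repository, with hypothetical file names) adds a forgotten file to the most recent commit. It uses git commit --amend --no-edit, which keeps the existing commit message instead of opening your editor. Amending replaces the commit with a new one (with a new commit ID), which is why you should not amend commits that you have already pushed.

```shell
cd $(mktemp -d)
git init -q
# Only needed if you have not already configured your name and email.
git config user.name 'Example User'
git config user.email 'user@example.com'
echo '# Analysis code' > analysis.R
git add analysis.R
git commit -q -m 'Add analysis script and its input data'

# The input data file was left out: stage it and amend the commit.
echo 'x,y' > input-data.csv
git add input-data.csv
git commit -q --amend --no-edit
# The history still contains a single commit, which now includes both files.
git log --oneline --stat
```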

      "},{"location":"using-git/creating-a-remote-repository/","title":"Creating a remote repository","text":"

      Once you have created a \"local\" repository (i.e., a repository that exists on your own computer), it is generally a good idea to create a \"remote\" repository. You may choose to store this remote repository on a service such as GitHub, or on a University-provided platform.

      If you are using GitHub, you can choose to create a public repository (viewable by anyone, but you control who can make changes) or a private repository (you control who can view and/or make changes).

      "},{"location":"using-git/creating-a-remote-repository/#linking-your-local-and-remote-repositories","title":"Linking your local and remote repositories","text":"

      Once you have created the remote repository, you need to link it to your local repository. This will allow you to \"push\" commits from your local repository to the remote repository, and to \"pull\" commits from the remote repository to your local repository.

      Note

      When you create a new repository on services such as GitHub, they will give you instructions on how to link this new repository to your local repository. We also provide an example, below.

      A repository can be linked to more than one remote repository, so we need to choose a name to identify this remote repository.

      Info

      The name \"origin\" is commonly used to identify the main remote repository.

      In this example, we link our local repository to the remote repository for this book (https://github.com/robmoss/git-is-my-lab-book) with the following command:

      git remote add origin git@github.com:robmoss/git-is-my-lab-book.git\n

      Note

      Notice that the URL is similar to, but not identical to, the URL you use to view the repository in your web browser.
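You can check which remote repositories are linked to your local repository by running git remote -v. A minimal sketch in a throwaway repository (no network access is needed, because adding a remote only records its name and URL):

```shell
cd $(mktemp -d)
git init -q
# Link this repository to the remote repository for this book.
git remote add origin git@github.com:robmoss/git-is-my-lab-book.git
# List the linked remotes and their URLs (one line each for fetch and push).
git remote -v
# Print the URL for a single remote.
git remote get-url origin
```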

      "},{"location":"using-git/creating-a-repository/","title":"Creating a repository","text":"

      You can create repositories by running git init. This will create a .git directory that will contain all of the repository information.

      There are two common ways to use git init:

      1. Create an empty repository in the current directory, by running:

        git init\n
      2. Create an empty repository in a specific directory, by running:

        git init path/to/repository\n

        Info

        Git will create the repository directory if it does not exist.
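To check that the repository was created, you can ask git whether a directory is inside a working tree. A minimal sketch, where my-project is a hypothetical directory name:

```shell
cd $(mktemp -d)
# Create an empty repository in a directory that does not yet exist.
git init -q my-project
# The new directory contains the .git directory with the repository information.
ls -A my-project
# Prints true, because my-project is now a git working tree.
git -C my-project rev-parse --is-inside-work-tree
```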

      "},{"location":"using-git/exercise-create-a-local-repository/","title":"Exercise: create a local repository","text":"

      In this exercise you will create a local repository, and use this repository to create multiple commits, switch between branches, and inspect the repository history.

      1. Create a new, empty repository in a directory called git-exercise.

      2. Create a README.md file and write a brief description for this repository. Record the contents of README.md in a new commit, and write a commit message.

      3. Write a script that generates a small data set, and saves the data to a CSV file. For example, this script could sample values from a probability distribution with fixed shape parameters. Explain how to use this script in README.md. Record your changes in a new commit.

      4. Write a script that plots these data, and saves the figure in a suitable file format. Explain how to use this script in README.md. Record your changes in a new commit.

      5. Add a tag milestone-1 to the commit you created in the previous step.

      6. Create a new branch called feature/new-data. Check out this branch and modify the data-generation script so that it produces new data and/or more data. Record your changes in one or more new commits.

      7. Create a new branch called feature/summarise from the tag you created in step #5. Check out this branch and modify the plotting script so that it also prints some summary statistics of the data. Record your changes in one or more new commits.

      8. In your main or master branch, add a license. Record your changes in a new commit.

      9. In your main or master branch, merge the two feature branches created in steps #6 and #7, and add a new tag milestone-2.
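      If you get stuck, here is one possible sequence of Git commands for steps 1 to 9. It is a sketch, not a model answer. The script names generate_data.py and plot_data.py, their one-line contents, and the license text are hypothetical placeholders. The sketch also assumes Git 2.28 or later (for git init -b). Tag descriptions are given with -m so that no editor opens.

```shell
set -eu
cd "$(mktemp -d)"

# Step 1: create a new, empty repository.
git init -q -b main git-exercise
cd git-exercise
git config user.name "Demo User"          # identity for this sketch only
git config user.email "demo@example.com"

# Step 2: describe the repository, then commit.
echo "# Git exercise" > README.md
git add README.md
git commit -q -m "Add a README"

# Steps 3 and 4: placeholder scripts, each recorded in its own commit.
echo "# generate data (placeholder)" > generate_data.py
echo "Run generate_data.py to create data.csv." >> README.md
git add generate_data.py README.md
git commit -q -m "Add a data-generation script"
echo "# plot data (placeholder)" > plot_data.py
echo "Run plot_data.py to plot data.csv." >> README.md
git add plot_data.py README.md
git commit -q -m "Add a plotting script"

# Step 5: tag the current commit.
git tag -a milestone-1 -m "Data generation and plotting"

# Step 6: a branch from the current commit.
git checkout -q -b feature/new-data
echo "# generate more data" >> generate_data.py
git commit -q -am "Generate more data"

# Step 7: a branch that starts from the milestone-1 tag.
git checkout -q -b feature/summarise milestone-1
echo "# print summary statistics" >> plot_data.py
git commit -q -am "Print summary statistics"

# Step 8: add a license on main.
git checkout -q main
echo "Placeholder license text" > LICENSE
git add LICENSE
git commit -q -m "Add a license"

# Step 9: merge both feature branches, then tag the result.
git merge -q --no-ff --no-edit feature/new-data
git merge -q --no-ff --no-edit feature/summarise
git tag -a milestone-2 -m "Merged the new data and summary features"

git log --oneline --graph
```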

      "},{"location":"using-git/exercise-create-a-local-repository/#self-evaluation","title":"Self evaluation","text":"

      Now that you have started a repository, created commits in multiple branches, and merged these branches, here are some questions for you to consider:

      • Have you committed the generated data file and/or the plot figure?

      • If you haven't committed either or both of these files, have you instructed git to ignore them?

      • Did you add a meaningful description to each milestone tag?

      • How many commits modified your data-generation script?

      • How many commits modified your plotting script?

      • What changes, if any, were made to README.md since it was first created?

      Tip

      To answer some of these questions, you may need to run git commands.
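      For example, the commands below answer several of these questions for a small, made-up repository. The file names and history are hypothetical. In your own git-exercise repository, run only the query commands, and substitute your own file names.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# A small, made-up history to query.
echo "data.csv" > .gitignore
echo "# Demo" > README.md
git add .gitignore README.md
git commit -q -m "Add a README and ignore the generated data"
echo "x,y" > data.csv                  # a generated file, which Git ignores
echo "Usage notes" >> README.md
git commit -q -am "Explain how to use the scripts"
git tag -a milestone-1 -m "First milestone: README complete"

# Is the generated file committed or ignored? (!! marks ignored files)
git status --short --ignored

# Does each tag have a meaningful description?
git tag -n

# Which commits modified README.md, and how many were there?
git log --oneline -- README.md
git rev-list --count HEAD -- README.md

# What has changed in README.md since it was first created?
first=$(git rev-list --reverse HEAD -- README.md | head -n 1)
git diff "$first" HEAD -- README.md
```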

      "},{"location":"using-git/exercise-resolve-a-merge-conflict/","title":"Exercise: resolve a merge conflict","text":"

      We have created a public repository that you can use to try resolving a merge conflict yourself. This repository includes some example data and a script that performs some basic data analysis.

      First, obtain a local copy (a "clone") of this repository by running:

      git clone https://github.com/robmoss/gimlb-simple-merge-example.git
      cd gimlb-simple-merge-example
      "},{"location":"using-git/exercise-resolve-a-merge-conflict/#the-repository-history","title":"The repository history","text":"

      You can inspect the repository history by running git log. Some key details to notice are:

      1. The first commit created the following files:
         • README.md
         • LICENSE
         • analysis/initial_exploration.R
         • input_data/data.csv

      2. The second commit created the following file:
         • outputs/summary.csv

         This commit has been given the tag first_milestone.

      3. From this first_milestone tag, two branches were created:
         • The feature/second-data-set branch adds a second data set and updates the analysis script to inspect both data sets.
         • The feature/calculate-rate-of-change branch changes which summary statistics are calculated for the original data set.

      4. The example-solution branch merges both feature branches and resolves any merge conflicts. This branch has been given the tag second_milestone.

      "},{"location":"using-git/exercise-resolve-a-merge-conflict/#your-task","title":"Your task","text":"

      You will start with the master branch, which contains the commits up to the first_milestone tag, and then merge the two feature branches into this branch, resolving any merge conflicts that arise. You can then compare your results to the example-solution branch.

      1. Obtain a local copy of this repository, by running:

        git clone https://github.com/robmoss/gimlb-simple-merge-example.git
        cd gimlb-simple-merge-example

      2. Create local copies of the two feature branches and the example solution, by running:

        git checkout feature/second-data-set
        git checkout feature/calculate-rate-of-change
        git checkout example-solution

      3. Return to the master branch, by running:

        git checkout master

      4. Merge the feature/second-data-set branch into master, by running:

        git merge feature/second-data-set

      5. Merge the feature/calculate-rate-of-change branch into master, by running:

        git merge feature/calculate-rate-of-change

      This will result in a merge conflict, and now you need to decide how to resolve each conflict! Once you have resolved the conflicts, create a commit that records all of your changes (see the previous chapter for an example).

      Tip

      You may find it helpful to inspect the commits in each of the feature branches to understand how they have changed the files in which the conflicts have occurred.
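      One way to do this is to list the commits that a feature branch adds, and the changes it has made since it diverged from master. The sketch below builds a tiny stand-in repository so that you can try both range notations safely. Its file analysis.R and the file's contents are hypothetical, not those of the exercise repository. A..B lists the commits reachable from B but not from A. For git diff, A...B compares B with the common ancestor of A and B.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b master demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "summary <- 1" > analysis.R
git add analysis.R
git commit -q -m "Initial analysis"

# A feature branch with one extra commit (stand-in content).
git checkout -q -b feature/second-data-set
echo "second <- 2" >> analysis.R
git commit -q -am "Inspect a second data set"
git checkout -q master

# Commits on the feature branch that are not yet on master.
git log --oneline master..feature/second-data-set

# Changes the feature branch made since it diverged from master.
git diff master...feature/second-data-set -- analysis.R
```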

      "},{"location":"using-git/exercise-resolve-a-merge-conflict/#self-evaluation","title":"Self evaluation","text":"

      Once you have created a commit that resolves these conflicts, see how similar or different the contents of your commit are to the corresponding commit in the example-solution branch (which has been tagged second_milestone). You can inspect this commit by running:

      git show example-solution

      You can compare this commit to your solution by running:

      git diff example-solution

      How does your resolution compare to this commit?

      Note

      You may have resolved the conflicts differently to the example-solution branch, and that's perfectly fine as long as your resolution has the same effect.

      "},{"location":"using-git/exercise-resolve-a-merge-conflict/#example-solution","title":"Example solution","text":"

      Here we present a recorded terminal session in which we clone this repository and resolve the merge conflict.

      Tip

      You can use the video timeline (below) to jump to specific moments in this exercise. Remember that you can pause the recording at any point to select and copy any of the text.

      Resolving a merge conflict

      Video timeline:

      1. Start: a quick look around
      2. Create local copies of branches
      3. Inspect the feature/second-data-set branch
      4. Inspect the feature/calculate-rate-of-change branch
      5. Merge the feature/second-data-set branch
      6. Merge the feature/calculate-rate-of-change branch
      7. Resolve the merge conflicts
      8. Compare to the example solution
      "},{"location":"using-git/exercise-use-a-remote-repository/","title":"Exercise: use a remote repository","text":"

      In this exercise, you will use a remote repository to synchronise and merge changes between multiple local repositories, starting from the local git-exercise repository that you created in the previous exercise.

      "},{"location":"using-git/exercise-use-a-remote-repository/#create-a-remote-repository","title":"Create a remote repository","text":"
      1. Create a new remote repository on a platform such as GitHub. You can make this a private repository, because you won't need to share it with anyone.

      2. Link your local git-exercise repository to this remote repository, and push all branches and tags to this remote repository.
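      The two steps above can be sketched as follows. Here a local bare repository (remote.git) stands in for the repository you create on GitHub. With a real platform, you would pass the URL that the platform displays to git remote add instead. The commit, tag, and branch are placeholders. Note that git push does not accept --all and --tags together, so the sketch pushes branches and tags separately.

```shell
set -eu
cd "$(mktemp -d)"

# Stand-in for the remote repository created on GitHub.
git init -q --bare -b main remote.git

# A small local repository with a branch and a tag to push.
git init -q -b main git-exercise
cd git-exercise
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Git exercise" > README.md
git add README.md
git commit -q -m "Add a README"
git tag -a milestone-1 -m "First milestone"
git branch feature/new-data

# Link the local repository to the remote (with GitHub, use its URL here).
git remote add origin ../remote.git

# Push every branch, and then every tag.
git push -q origin --all
git push -q origin --tags

# List the branches and tags that the remote now holds.
git ls-remote --heads --tags origin
```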

      "},{"location":"using-git/exercise-use-a-remote-repository/#clone-the-remote-repository","title":"Clone the remote repository","text":"
      1. Make a local copy of this remote repository called git-exercise-2.

      2. Check out the main or master branch. The files should be identical to the milestone-2 tag in your original git-exercise repository.
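      Here is a sketch of these two steps. A local bare repository again stands in for the remote, and the sketch assumes that the remote's main branch already carries the milestone-2 tag. After you check out main, git diff milestone-2 prints nothing if the files are identical to the tagged commit.

```shell
set -eu
cd "$(mktemp -d)"

# Set-up: a stand-in remote whose main branch carries a milestone-2 tag.
git init -q --bare -b main remote.git
git init -q -b main git-exercise
cd git-exercise
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Git exercise" > README.md
git add README.md
git commit -q -m "Add a README"
git tag -a milestone-2 -m "Second milestone"
git remote add origin ../remote.git
git push -q origin main
git push -q origin --tags
cd ..

# Make a second local copy of the remote, called git-exercise-2.
git clone -q remote.git git-exercise-2
cd git-exercise-2
git checkout -q main

# No output means the files match the milestone-2 tag exactly.
git diff milestone-2
```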

      "},{"location":"using-git/exercise-use-a-remote-repository/#work-on-the-new-local-repository","title":"Work on the new local repository","text":"
      1. Create a new branch called feature/report. Check out this branch and create a new file called report.md. Edit this file so that it contains:

         • A brief description of the generated data set;
         • A table of the summary statistics printed by the plotting script (see the Markdown Guide); and
         • The figure produced by the plotting script (see the Markdown Guide).

         Record your changes in a new commit.

      2. Push this new branch to the remote repository.
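      This sketch creates the feature/report branch in a fresh clone, commits a placeholder report.md, and pushes the branch. The report's contents, including the figure.png file name, are hypothetical placeholders. A local bare repository again stands in for the remote. The -u option records origin/feature/report as the branch's upstream, so that later git push and git pull commands know which remote branch to use.

```shell
set -eu
cd "$(mktemp -d)"

# Set-up: a stand-in remote with one commit on main.
git init -q --bare -b main remote.git
git init -q -b main seed
cd seed
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Git exercise" > README.md
git add README.md
git commit -q -m "Add a README"
git push -q ../remote.git main
cd ..

# The second local repository.
git clone -q remote.git git-exercise-2
cd git-exercise-2
git config user.name "Demo User"
git config user.email "demo@example.com"

# Create and check out the new branch, then add a placeholder report.
git checkout -q -b feature/report
cat > report.md <<'EOF'
# Report

A brief description of the generated data set (placeholder).

| Statistic | Value |
|-----------|-------|
| Mean      | TBD   |

![Plot of the data](figure.png)
EOF
git add report.md
git commit -q -m "Add a report on the generated data"

# Push the branch and record origin/feature/report as its upstream.
git push -q -u origin feature/report
```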
      "},{"location":"using-git/exercise-use-a-remote-repository/#merge-the-report-into-the-original-repository","title":"Merge the report into the original repository","text":"
      1. In your original git-exercise repository, check out the feature/report branch from the remote repository and verify that it now contains the file report.md.

      2. Merge this branch into your main or master branch, and add a new tag milestone-3-report.

      3. Push the updated main or master branch to the remote repository.

      "},{"location":"using-git/exercise-use-a-remote-repository/#update-the-new-local-repository","title":"Update the new local repository","text":"
      1. In your git-exercise-2 repository, check out the main or master branch and pull changes from the remote repository. It should now contain the file report.md.
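      The round trip in the last two sections might look like this. The sketch makes the same assumptions as above: a local bare repository serves as the remote, and the file contents are placeholders. It first recreates the state in which feature/report has been pushed from git-exercise-2. The --ff-only option tells git pull to refuse anything other than a fast-forward. That is what we expect here, because git-exercise-2 has no new commits on main.

```shell
set -eu
cd "$(mktemp -d)"

# Set-up: a stand-in remote, the original repository, and a second clone.
git init -q --bare -b main remote.git
git init -q -b main git-exercise
cd git-exercise
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Git exercise" > README.md
git add README.md
git commit -q -m "Add a README"
git remote add origin ../remote.git
git push -q -u origin main
cd ..
git clone -q remote.git git-exercise-2
cd git-exercise-2
git config user.name "Demo User"
git config user.email "demo@example.com"
git checkout -q -b feature/report
echo "# Report (placeholder)" > report.md
git add report.md
git commit -q -m "Add a report"
git push -q -u origin feature/report

# In the original repository: fetch and check out the new branch.
cd ../git-exercise
git fetch -q origin
git checkout -q feature/report
test -f report.md                 # the branch contains report.md

# Merge it into main, tag the result, and push both to the remote.
git checkout -q main
git merge -q --no-ff --no-edit feature/report
git tag -a milestone-3-report -m "Added the report"
git push -q origin main milestone-3-report

# In git-exercise-2: update main from the remote.
cd ../git-exercise-2
git checkout -q main
git pull -q --ff-only origin main
ls report.md
```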

      Info

      Congratulations! You have used a remote repository to synchronise and merge changes between two local repositories. You can use this workflow to collaborate with colleagues.

      "},{"location":"using-git/exercise-use-a-remote-repository/#self-evaluation","title":"Self evaluation","text":"

      Now that you have used commits and branches to share work between multiple repositories, here are some questions for you to consider:

      • Do you feel comfortable in deciding which changes to record in a single commit?

      • Do you feel that your commit messages help describe the changes that you have made in this repository?

      • Do you feel comfortable in using multiple branches to work on separate ideas in parallel?

      • Do you have any current projects that you might want to work on using local and remote git repositories?

      "},{"location":"using-git/first-time-git-setup/","title":"First-time Git setup","text":"

      Once you've installed Git, you should define some important settings before you start using Git.

      Info

      We assume that you will want to set the git configuration for all repositories owned by your user. Therefore, we use the --global flag. You can instead apply a setting to a single repository, or to all users on the computer, by replacing --global with --local or --system, respectively.

      1. Define your user name and email address. These details are included in every commit that you create.

        git config --global user.name "My Name"
        git config --global user.email "my.name@some-university.edu.au"

      2. Define the text editor that Git should use for tasks such as writing commit messages:

        git config --global core.editor editor-name

        NOTE: on Windows you need to specify the full path to the editor:

        git config --global core.editor "C:/Program Files/My Editor/editor.exe"

        Tip

        Please see Choosing your Git editor for details.

      3. By default, Git will create a branch called master when you create a new repository. You can set a different name for this initial branch:

        git config --global init.defaultBranch main

      4. Ensure that repository histories always record when branches were merged:

        git config --global merge.ff no

        This prevents Git from "fast-forwarding" when the destination branch contains no new commits. For example, it ensures that when you merge the green branch into the blue branch (as shown below) it records that commits D, E, and F came from the green branch.

      5. Adjust how Git shows merge conflicts:

        git config --global merge.conflictstyle diff3

        This will be useful when we look at how to use branches and how to resolve merge conflicts.
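      To check what these settings do without changing your global configuration, the sketch below applies the same settings with --local in a throwaway repository (see the Info box above), and reads them back. It then shows the effect of merge.ff: main has no new commits, yet merging the green branch still records a merge commit. The branch name green and the file notes.txt are placeholders, and the sketch assumes Git 2.28 or later (for git init -b).

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo

# The settings from the list above, stored in this repository only (--local).
git config --local user.name "My Name"
git config --local user.email "my.name@some-university.edu.au"
git config --local merge.ff no
git config --local merge.conflictstyle diff3

# Read one setting back, and list the local settings with their source file.
git config --get merge.conflictstyle
git config --local --list --show-origin

# The effect of merge.ff: main gains no new commits while green does,
# but merging still records a merge commit instead of fast-forwarding.
echo "A" > notes.txt
git add notes.txt
git commit -q -m "A"
git checkout -q -b green
echo "D" >> notes.txt
git commit -q -am "D"
git checkout -q main
git merge -q --no-edit green
git log --oneline --graph
```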

      Info

      If you use Windows, there are tools that can improve your Git experience in PowerShell.

      There are also tools for integrating Git into many common text editors. See Git in other environments, Appendix A of the Pro Git book.

      "},{"location":"using-git/graphical-git-clients/","title":"Graphical Git clients","text":"

      While Git is a command-line program, there are other ways to work with Git repositories:

      • There are many graphical clients that you can download and use;

      • Many editors include Git support (e.g., Atom, RStudio, Visual Studio Code); and

      • Online platforms such as GitHub, GitLab, and Bitbucket also provide a graphical interface for common Git actions.

      In this book we will primarily show how to use Git from the command-line, but all of the concepts and terminology should also apply to all of the tools described above. If you don't already have Git installed on your computer, see these instructions for installing Git.

      "},{"location":"using-git/how-to-create-and-use-tags/","title":"How to create and use tags","text":"

      Tags allow you to bookmark important points in your commit history.

      You can use tags to identify milestones such as:

      • Adding specific features to your model or data analysis (e.g., feature-age-dependent-mixing);
      • Completing objectives in your research plan (e.g., objective-1, objective-2);
      • Completed manuscript drafts (e.g., draft-1, draft-2); and
      • Manuscript submission and revisions (e.g., submitted, revision-1).
      "},{"location":"using-git/how-to-create-and-use-tags/#tagging-the-current-commit","title":"Tagging the current commit","text":"

      You can add a tag (in this example, "my-tag") to the current commit by running:

      git tag -a my-tag

      This will open your chosen editor and ask you to write a description for this tag.
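      If you prefer not to open an editor, git tag also accepts the description directly, with -m. The sketch below uses a throwaway repository with a single placeholder commit. It creates an annotated tag this way, and confirms that Git stored it as a tag object that points at the current commit.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Draft text" > paper.md
git add paper.md
git commit -q -m "Complete the first draft"

# -m gives the tag's description directly, so no editor opens.
git tag -a my-tag -m "First complete manuscript draft"

# An annotated tag is a separate object that records the description.
git cat-file -t my-tag
git show --no-patch my-tag
```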

      "},{"location":"using-git/how-to-create-and-use-tags/#pushing-tags-to-a-remote-repository","title":"Pushing tags to a remote repository","text":"

      By default, git push doesn't push tags to remote repositories. Instead, you have to explicitly push tags. You can push a tag (in this example, called "my-tag") to a remote repository (in this example, called "origin") by running:

      git push origin my-tag

      You can push all of your tags to a remote repository (in this example, called "origin") by running:

      git push origin --tags
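      You can see the difference between a plain push and an explicit tag push by using a local bare repository as a stand-in for origin. In this sketch, git ls-remote lists the tags that the remote holds after each push. The tag names are placeholders, and the first listing assumes that push.followTags is not enabled in your configuration.

```shell
set -eu
cd "$(mktemp -d)"
git init -q --bare -b main remote.git
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin ../remote.git
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git tag -a my-tag -m "A first tag"
git tag -a other-tag -m "A second tag"

# Pushing the branch does not send any tags (assuming push.followTags is off).
git push -q origin main
git ls-remote --tags origin

# Push one tag by name, and then all of the remaining tags.
git push -q origin my-tag
git push -q origin --tags
git ls-remote --tags origin
```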
      "},{"location":"using-git/how-to-create-and-use-tags/#tagging-a-past-commit","title":"Tagging a past commit","text":"

      To add a tag to a previous commit, you can identify the commit by its hash. For example, you can inspect your commit history by running:

      git log --oneline --no-decorate

      If your commit history looks like:

      003cf6b Show how to ignore certain files
      339eb5a Show how to prepare and record commits
      6a7fb8b Show how to clone remote repositories
      ...

      where the current commit is 003cf6b ("Show how to ignore certain files"), you can tag the previous commit ("Show how to prepare and record commits") by running:

      git tag -a my-tag 339eb5a
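      Instead of copying the hash by hand, you can let Git look it up. You can also name the commit relative to the current one: HEAD~1 means the commit before HEAD. The sketch below builds three placeholder commits and tags the middle one both ways.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3; do
  echo "Section $n" >> notes.md
  git add notes.md
  git commit -q -m "Write section $n"
done

# Look up the (abbreviated) hash of the previous commit, then tag it.
previous=$(git rev-parse --short HEAD~1)
git tag -a my-tag -m "Tag an earlier commit by its hash" "$previous"

# HEAD~1 names the same commit without looking up its hash.
git tag -a other-tag -m "Tag an earlier commit relative to HEAD" HEAD~1

git log --oneline --decorate
```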
      "},{"location":"using-git/how-to-create-and-use-tags/#listing-tags","title":"Listing tags","text":"

      You can list all tags by running:

      git tag

      You can also list only tags that match a specific pattern (in this example, all tags beginning with "my") by running:

      git tag --list 'my*'
      "},{"location":"using-git/how-to-create-and-use-tags/#deleting-tags","title":"Deleting tags","text":"

      You can delete a tag by running:

      git tag --delete my-tag
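      Deleting a tag locally does not remove any copy that you have already pushed. This sketch shows how to remove both, using a local bare repository as a stand-in for the remote called origin. The command git push origin --delete my-tag removes the tag from the remote.

```shell
set -eu
cd "$(mktemp -d)"
git init -q --bare -b main remote.git
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin ../remote.git
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git tag -a my-tag -m "A tag to delete"
git push -q origin main my-tag

# Delete the local tag.
git tag --delete my-tag

# The remote keeps its copy until you delete it there too.
git ls-remote --tags origin
git push -q origin --delete my-tag
git ls-remote --tags origin
```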
      "},{"location":"using-git/how-to-create-and-use-tags/#creating-a-branch-from-a-tag","title":"Creating a branch from a tag","text":"

      You can check out a tag and begin working on a new branch by running:

      git checkout -b my-branch my-tag
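      A quick way to confirm where the new branch starts is to compare it with the tag. In this sketch (placeholder commits, Git 2.28 or later), main moves on after the tag is created. The new branch my-branch starts from the tagged commit, not from the tip of main.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Draft 1" > paper.md
git add paper.md
git commit -q -m "First draft"
git tag -a my-tag -m "First draft complete"
echo "Draft 2" >> paper.md
git commit -q -am "Second draft"

# Start a new branch from the tagged commit, not from the tip of main.
git checkout -q -b my-branch my-tag
echo "Reviewer comments" > comments.md
git add comments.md
git commit -q -m "Respond to comments on the first draft"

git log --oneline --graph --all
```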
      "},{"location":"using-git/how-to-ignore-certain-files/","title":"How to ignore certain files","text":"

      Your repository may contain files that you don't want to include in your commit history. For example, you may not want to include files of the following types:

      • Sensitive data files for which access must be strictly controlled.
      • Temporary files that do not contain useful information, such as:
        • .aux files, which are generated when compiling LaTeX documents; and
        • .pyc files, which are generated when running Python code.
      • Files that can be automatically generated from your commit history, such as:
        • .pdf versions of LaTeX documents; and
        • documentation generated from your code files.

      You can instruct Git to ignore certain files by creating a .gitignore file. This is a plain text file, where each line defines a pattern that identifies files and directories which should be ignored. You can also add comments, which must start with a #, to explain the purpose of these patterns.

      Tip

      If your editor will not accept .gitignore as a file name, you can create a .gitignore file in your repository by running:

      touch .gitignore

      For example, the following .gitignore file would make Git ignore all .aux and .pyc files, and the file my-paper.pdf:

      # Ignore all .aux files generated by LaTeX.
      *.aux
      # Ignore all byte-code files generated by Python.
      *.pyc
      # Ignore the PDF version of my paper.
      my-paper.pdf

      If you have sensitive data files, one option is to store them all in a dedicated directory and add this directory to your .gitignore file:

      # Ignore all data files in the "sensitive-data" directory.
      sensitive-data
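      To check which files a .gitignore file ignores, and why, run git check-ignore -v. It prints the matching pattern and its line in .gitignore. This sketch writes the example patterns above into a throwaway repository and tests them against some empty placeholder files.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo

# The example patterns from above, written to a .gitignore file.
cat > .gitignore <<'EOF'
# Ignore all .aux files generated by LaTeX.
*.aux
# Ignore all byte-code files generated by Python.
*.pyc
# Ignore the PDF version of my paper.
my-paper.pdf
# Ignore all data files in the "sensitive-data" directory.
sensitive-data
EOF

# Empty placeholder files to test the patterns against.
mkdir sensitive-data
touch my-paper.tex my-paper.aux my-paper.pdf sensitive-data/patients.csv

# For each ignored file, show the pattern and the .gitignore line that matched.
git check-ignore -v my-paper.aux my-paper.pdf sensitive-data/patients.csv

# Ignored files are not offered for the next commit.
git status --short
```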

      Tip

      You can force Git to add an ignored file to a commit by running:

      git add --force my-paper.pdf

      But it would generally be better to update your .gitignore file so that it stops ignoring these files.

      "},{"location":"using-git/how-to-resolve-merge-conflicts/","title":"How to resolve merge conflicts?","text":"

      A merge conflict can occur when we try to merge one branch into another, if the two branches introduce any conflicting changes.

      For example, consider trying to merge two branches that make the following changes to the same line of the file test.txt:

      1. On the branch my-new-branch:

         First line
        -Second line
        +My new second line
         Third line

      2. On the main branch:

         First line
        -Second line
        +A different second line
         Third line

      When we attempt to merge my-new-branch into the main branch, git merge my-new-branch will tell us:

      Auto-merging test.txt
      CONFLICT (content): Merge conflict in test.txt
      Automatic merge failed; fix conflicts and then commit the result.

      The test.txt file will now include the conflicting changes, which we can inspect with git diff:

      diff --cc test.txt
      index 18712c4,bc576a6..0000000
      --- a/test.txt
      +++ b/test.txt
      @@@ -1,3 -1,3 +1,7 @@@
        First line
      ++<<<<<<< ours
       +A different second line
      ++=======
      + My new second line
      ++>>>>>>> theirs
        Third line

      Note that this two-way diff shows:

      1. "our" changes: from the commits on the branch that we are merging into; and
      2. "their" changes: from the commits on the branch that we are merging from.

      Each conflict is surrounded by <<<<<<< and >>>>>>> markers, and the conflicting changes are separated by a ======= marker.

      If we instruct Git to use a three-way diff (see first-time Git setup), the conflict will be reported slightly differently:

      diff --cc test.txt
      index 18712c4,bc576a6..0000000
      --- a/test.txt
      +++ b/test.txt
      @@@ -1,3 -1,3 +1,9 @@@
        First line
      ++<<<<<<< ours
       +A different second line
      ++||||||| base
      ++Second line
      ++=======
      + My new second line
      ++>>>>>>> theirs
        Third line

      In addition to showing "our" changes and "their" changes, this three-way diff also shows the original lines, between the ||||||| and ======= markers. This extra information can help you decide how to best resolve the conflict.
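      To see these conflict markers for yourself, the sketch below rebuilds the test.txt example in a throwaway repository. It sets merge.conflictstyle to diff3 for this repository only, so the ||||||| section appears. In this sketch the markers in test.txt will probably be labelled with HEAD, a commit, and my-new-branch rather than ours, base, and theirs. When git merge stops on a conflict it exits with a non-zero status, so the sketch catches that status and carries on.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git config merge.conflictstyle diff3    # this repository only

printf 'First line\nSecond line\nThird line\n' > test.txt
git add test.txt
git commit -q -m "Add test.txt"

# Change the second line on my-new-branch ...
git checkout -q -b my-new-branch
printf 'First line\nMy new second line\nThird line\n' > test.txt
git commit -q -am "Change the second line"

# ... and change it differently on main.
git checkout -q main
printf 'First line\nA different second line\nThird line\n' > test.txt
git commit -q -am "Change the second line differently"

# The merge stops on the conflict, so catch its non-zero exit status.
git merge --no-ff --no-edit my-new-branch || echo "Merge stopped: conflict in test.txt"

# test.txt now holds both versions and the original, between markers.
cat test.txt
git diff
```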

      "},{"location":"using-git/how-to-resolve-merge-conflicts/#resolving-the-conflicts","title":"Resolving the conflicts","text":"

      We can edit test.txt to reconcile these changes, and then commit our fix. For example, we might decide that test.txt should have the following contents:

      First line
      The corrected second line
      Third line

      We can then commit these changes to resolve the merge conflict:

      git add test.txt
      git commit -m "Resolved the merge conflict"
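      To resolve the conflicting merge, write the file you want, stage it, and commit it. This sketch recreates the conflict with the same placeholder contents as the test.txt example, applies the example resolution, and checks that the result is a merge commit with two parents.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'First line\nSecond line\nThird line\n' > test.txt
git add test.txt
git commit -q -m "Add test.txt"
git checkout -q -b my-new-branch
printf 'First line\nMy new second line\nThird line\n' > test.txt
git commit -q -am "Change the second line"
git checkout -q main
printf 'First line\nA different second line\nThird line\n' > test.txt
git commit -q -am "Change the second line differently"
git merge --no-ff --no-edit my-new-branch || true   # stops on the conflict

# Write the resolved contents, then stage and commit them.
printf 'First line\nThe corrected second line\nThird line\n' > test.txt
git add test.txt
git commit -q -m "Resolved the merge conflict"

git log --oneline --graph
```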
      "},{"location":"using-git/how-to-resolve-merge-conflicts/#cancelling-the-merge","title":"Cancelling the merge","text":"

      Alternatively, you may decide you don't want to merge these two branches, in which case you can cancel the merge by running:

      git merge --abort
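      This sketch abandons the merge instead. After git merge --abort, the branch and the working tree return to their state before git merge ran. The sketch recreates the conflict with the same placeholder contents as in the test.txt example.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
printf 'First line\nSecond line\nThird line\n' > test.txt
git add test.txt
git commit -q -m "Add test.txt"
git checkout -q -b my-new-branch
printf 'First line\nMy new second line\nThird line\n' > test.txt
git commit -q -am "Change the second line"
git checkout -q main
printf 'First line\nA different second line\nThird line\n' > test.txt
git commit -q -am "Change the second line differently"
git merge --no-ff --no-edit my-new-branch || true   # stops on the conflict

# Give up on this merge: no merge commit is created.
git merge --abort

# test.txt is back to main's version, and the working tree is clean.
cat test.txt
git status --short
```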
      "},{"location":"using-git/how-to-structure-a-repository/","title":"How to structure a repository","text":"

      While there is no single "best" way to structure a repository, there are some guidelines that you can follow. The key aims are to ensure that your files are logically organised, and that others can easily navigate the repository.

      "},{"location":"using-git/how-to-structure-a-repository/#divide-your-repository-into-multiple-directories","title":"Divide your repository into multiple directories","text":"

      It is generally a good idea to have separate directories for different types of files. For example, your repository might contain any of these different file types, and you should at least consider storing each of them in a separate directory:

      • Input data files (which you may have received from a collaborator);
      • Cleaned and/or processed input files (e.g., if you aggregate the input data before using it);
      • Data analysis code;
      • Simulation/model code;
      • Output data files;
      • Plotting scripts that extract results from the output data files;
      • Output figures produced by the plotting scripts; and
      • Manuscript text and bibliography files.
      "},{"location":"using-git/how-to-structure-a-repository/#use-descriptive-names-for-directories-and-files","title":"Use descriptive names for directories and files","text":"

      Choosing file names that indicate what each file/directory contains can help other people, such as your collaborators, navigate your repository. They can also help you when you return to a project after several weeks or months.

      Tip

      Have you ever asked yourself "where is the file that contains X"?

      Use descriptive file names, and the answer might be right in front of you!

      "},{"location":"using-git/how-to-structure-a-repository/#include-a-readme-file","title":"Include a README file","text":"

      You can write this in Markdown (README.md), in plain text (README or README.txt), or in any other suitable format. For example, Python projects often use reStructuredText and have a README.rst file.

      This file should begin with a brief description of why the repository was created and what it contains.

      Importantly, this file should also mention:

      • How the files and directories are arranged. Help your collaborators understand where they need to look in order to find something.

      • How to run important pieces of code (e.g., to generate output data files or figures).

      • The software packages and/or libraries that are required to run any of the code in this repository.

      • The license (if any) under which the repository contents are being made available.

      "},{"location":"using-git/how-to-use-branches/","title":"How to use branches?","text":"

      Recall that branches allow you to work on different ideas or tasks in parallel, within a single repository. In this chapter, we will show you how to create and use branches. In the Collaborating section, we will show you how branches can allow multiple people to work together on code and papers, and how you can use branches for peer code review.

      Info

      Branches, like tags, are identified by name. Common naming conventions include:

      • feature/some-new-thing for adding something new (a new data analysis, a new model feature, etc.); and
      • bugfix/some-problem for fixing something that isn't working as intended (e.g., perhaps there's a mistake in a data analysis script).

      You can choose your own conventions, but make sure that you choose meaningful names.

      Do not use names like branch1, branch2, etc.

      "},{"location":"using-git/how-to-use-branches/#creating-a-new-branch","title":"Creating a new branch","text":"

      You can create a new branch (in this example, called "my-new-branch") that starts from the current commit by running:

      git checkout -b my-new-branch

      You can also create a new branch that starts from a specific commit, tag, or branch in your repository:

      git checkout -b my-new-branch 95eaae5          # From an existing commit
      git checkout -b my-new-branch my-tag-name      # From an existing tag
      git checkout -b my-new-branch my-other-branch  # From an existing branch

      You can then create a corresponding upstream branch in your remote repository (in this example, called "origin") by running:

      git push -u origin my-new-branch
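      You can try the three starting points and the upstream link in a throwaway repository. In this sketch, the tag my-tag-name, the branch my-other-branch, and the from-* branch names are placeholders, and a local bare repository stands in for origin. The final command uses the @{upstream} notation to confirm which remote branch my-new-branch now tracks.

```shell
set -eu
cd "$(mktemp -d)"
git init -q --bare -b main remote.git
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
git remote add origin ../remote.git
echo "one" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git tag -a my-tag-name -m "A placeholder tag"
git branch my-other-branch
echo "two" >> notes.txt
git commit -q -am "Second commit"

# From an existing commit, tag, or branch (each name here is a placeholder).
git checkout -q -b from-commit "$(git rev-parse --short HEAD~1)"
git checkout -q -b from-tag my-tag-name
git checkout -q -b from-branch my-other-branch

# From the current commit on main, then create its upstream branch on origin.
git checkout -q main
git checkout -q -b my-new-branch
git push -q -u origin my-new-branch
git rev-parse --abbrev-ref 'my-new-branch@{upstream}'
```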
      "},{"location":"using-git/how-to-use-branches/#working-on-a-remote-branch","title":"Working on a remote branch","text":"

      If there is a branch in your remote repository that you want to work on, you can make a local copy by running:

      git checkout remote-branch-name

      This will create a local branch with the same name (in this example, "remote-branch-name").
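      Behind the scenes, git checkout remote-branch-name finds origin/remote-branch-name and creates a local branch that tracks it. This works when exactly one remote has a branch with that name. The sketch below pushes a placeholder branch from one repository and checks it out in a fresh clone.

```shell
set -eu
cd "$(mktemp -d)"
git init -q --bare -b main remote.git

# Someone else's repository pushes main and a second branch to the remote.
git init -q -b main theirs
cd theirs
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > notes.txt
git add notes.txt
git commit -q -m "First commit"
git checkout -q -b remote-branch-name
echo "two" >> notes.txt
git commit -q -am "Work on the remote branch"
git push -q ../remote.git main remote-branch-name
cd ..

# Your clone: check out the remote branch by name.
git clone -q remote.git mine
cd mine
git checkout -q remote-branch-name
git branch -vv
```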

      "},{"location":"using-git/how-to-use-branches/#listing-branches","title":"Listing branches","text":"

      You can list all of the branches in your repository by running:

      git branch

      This will also highlight the current branch.

      "},{"location":"using-git/how-to-use-branches/#switching-between-branches","title":"Switching between branches","text":"

      You can switch from your current branch to another branch (in this example, called "other-branch") by running:

      git checkout other-branch

      Info

      Git will not let you switch branches if doing so would overwrite any of your uncommitted changes.

      One way to avoid this issue is to record the current changes as a new commit, and explain in the commit message that this is a snapshot of work in progress.

      A second option is to discard the uncommitted changes to each file by running:

      git restore file1 file2 file3 ... fileN
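      The sketch below shows Git refusing a switch that would overwrite an uncommitted change. It then uses git restore (the second option above) to discard that change, so that the switch succeeds. The file notes.txt and its contents are placeholders, and git restore needs Git 2.23 or later.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "main version" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git checkout -q -b other-branch
echo "other version" > notes.txt
git commit -q -am "Change notes on other-branch"
git checkout -q main

# An uncommitted change to a file that differs between the two branches.
echo "unsaved edit" > notes.txt

# Git refuses this switch because it would overwrite the uncommitted change.
if git checkout -q other-branch; then
  switch_refused=no
else
  switch_refused=yes
fi
echo "Switch refused: $switch_refused"

# Discard the uncommitted change, and the switch now succeeds.
git restore notes.txt
git checkout -q other-branch
```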
      "},{"location":"using-git/how-to-use-branches/#pushing-and-pulling-commits","title":"Pushing and pulling commits","text":"

      Once you have created a branch, you can use git push to \"push\" your commits to the remote repository, and git pull to \"pull\" commits from the remote repository. See Pushing and pulling commits for details.

      "},{"location":"using-git/how-to-use-branches/#inspecting-branch-histories","title":"Inspecting branch histories","text":"

      You can use git log to inspect the commit history of any branch:

      git log branch-name

      Remember that there are many ways to control what git log will show you.

      Similarly, you can use git diff to compare the changes in any two branches:

      git diff first-branch second-branch

      Again, there are ways to control what git diff will show you.
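      Two forms are often handy here. The command git log main..other lists the commits on other that are not on main, and git diff --stat gives a per-file summary of the differences. The branch name feature/summary and the file contents in this sketch are placeholders.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "x,y" > data.csv
git add data.csv
git commit -q -m "Add data"
git checkout -q -b feature/summary
echo "print(summary)" > summary.R
git add summary.R
git commit -q -m "Add a summary script"
echo "print(mean)" >> summary.R
git commit -q -am "Also print the mean"

# Commits on feature/summary that are not (yet) on main.
git log --oneline main..feature/summary

# A per-file summary of how the two branches differ.
git diff --stat main feature/summary
```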

      "},{"location":"using-git/how-to-use-branches/#merging-branches","title":"Merging branches","text":"

      You may reach a point where you want to incorporate the changes from one branch into another branch. This is referred to as "merging" one branch into another, and is illustrated in the What is a branch? chapter.

      For example, you might have completed a new feature for your model or data analysis, and now want to merge this back into your main branch.

      First, ensure that the current branch is the branch you want to merge the changes into (this is often your main or master branch). You can then merge the changes from another branch (in this example, called "other-branch") by running:

      git merge other-branch

      This can have two different results:

      1. The commits from other-branch were merged successfully into the current branch; or

      2. There were conflicting changes (referred to as a "merge conflict").

      In the next chapter we will show you how to resolve merge conflicts.
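      When git merge stops on a conflict, it exits with a non-zero status, so a script can tell these two outcomes apart. This sketch (placeholder branch and file names) merges a branch that changes a different file, so the merge succeeds. The --no-edit option accepts Git's default merge message instead of opening an editor.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "analysis" > analysis.R
git add analysis.R
git commit -q -m "Add an analysis script"
git checkout -q -b other-branch
echo "new feature" > feature.R
git add feature.R
git commit -q -m "Add a new feature"

# Make sure the current branch is the one to merge into, then merge.
git checkout -q main
if git merge --no-edit other-branch; then
  echo "Merged successfully"
else
  echo "Merge conflict: resolve it, or run git merge --abort"
fi
```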

      "},{"location":"using-git/inspecting-your-history/","title":"Inspecting your history","text":"

      You can inspect your commit history at any time with the git log command. By default, this command will list every commit from the current commit back to the very first commit, most recent first, and for each commit it will show you:

      • The commit identifier ("hash"), which uniquely identifies this commit;
      • The person who created the commit ("author");
      • The date on which the commit was created; and
      • The commit message.

      There are many ways to adjust which commits git log will show, and which details it shows for each commit.

      Tip

      Each commit has a unique identifier (\"hash\"). These hashes are quite long, but in general you only need to provide the first 5-7 digits to uniquely identify a specific commit.

      "},{"location":"using-git/inspecting-your-history/#listing-commits-over-a-specific-time-interval","title":"Listing commits over a specific time interval","text":"

      You can limit which commits git log will show by specifying a start time and/or an end time.

      Tip

      This can be extremely useful for generating progress reports and summarising your recent activity in team meetings.

      For example, you can view commits from the past week by running:

      git log --since='7 days'
      git log --since='1 week'

      You can view commits made between 1 and 2 weeks ago by running:

      git log --since='2 weeks' --until='1 week'

      You can view commits made between specific dates by running:

      git log --since='2022/05/12' --until='2022/05/14'
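      To try date filtering on a repository with a known history, you can back-date commits with Git's GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. Note that --since and --until filter on the commit date. The dates and messages in this sketch are made up. The sketch sets TZ=UTC so that the result does not depend on your local time zone.

```shell
set -eu
export TZ=UTC    # interpret all dates below in UTC
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"

# Three commits with made-up dates, set via Git's date environment variables.
for day in 10 13 20; do
  echo "Work on day $day" >> log.txt
  git add log.txt
  GIT_AUTHOR_DATE="2022-05-${day}T12:00:00" \
  GIT_COMMITTER_DATE="2022-05-${day}T12:00:00" \
    git commit -q -m "Work on 2022-05-$day"
done

# Only the commit from 13 May lies between these dates.
git log --oneline --since='2022/05/12' --until='2022/05/14'
```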
      "},{"location":"using-git/inspecting-your-history/#listing-commits-that-modify-a-specific-file","title":"Listing commits that modify a specific file","text":"

      You can see which commits have made changes to a file by running:

      git log -- filename

      Info

      Note the -- argument that comes before the file name. This ensures that if the file name begins with a -, git log will not treat the file name as an option.
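      The sketch below shows why the -- matters. It creates a placeholder file whose name begins with a dash, and then lists the commits that changed each file. Without the --, Git would try to read -notes.txt as an option.

```shell
set -eu
cd "$(mktemp -d)"
git init -q -b main demo
cd demo
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Add a README"

# A placeholder file whose name begins with a dash.
echo "note" > ./-notes.txt
git add -- -notes.txt
git commit -q -m "Add -notes.txt"
echo "More detail" >> README.md
git commit -q -am "Expand the README"

# Commits that changed README.md.
git log --oneline -- README.md

# The -- marks everything after it as a file name, not an option.
git log --oneline -- -notes.txt
```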

      "},{"location":"using-git/inspecting-your-history/#changing-how-commits-are-displayed","title":"Changing how commits are displayed","text":"

      You can make git log display only the first 7 digits of each commit hash, and the first line of each commit message, by running:

      git log --oneline

      This can be a useful way to get a quick overview of the recent history.

      "},{"location":"using-git/inspecting-your-history/#viewing-the-contents-of-a-single-commit","title":"Viewing the contents of a single commit","text":"

      You can identify a commit by its unique identifier ("hash") or by its tag name (if it has been tagged), and view the commit with git show:

      git show commit-hash
      git show tag-name

      This will show the commit details and all of the changes that were recorded in this commit.

      Tip

      If you don't specify a commit, git show will show you the current commit (HEAD).

      "},{"location":"using-git/inspecting-your-history/#viewing-all-changes-over-a-specific-interval","title":"Viewing all changes over a specific interval","text":"

      You can view all of the changes that were made between two commits with the git diff command.

      Tip

      The git diff command shows the difference between two points in your commit history.

      Note that git diff does not support start and/or end times like git log does; you must use commit identifiers.

      For example, here is a subset of the commit history for this book's repository (https://github.com/robmoss/git-is-my-lab-book):

      95eaae5 Note the need for a GitHub account and SSH key
      11085f0 Show how to create a branch from a tag
      9369482 Show how to create and use tags
      003cf6b Show how to ignore certain files
      339eb5a Show how to prepare and record commits
      6a7fb8b Show how to clone remote repositories
      6a49e10 Note that mdbook-admonish must be installed
      a8e6114 Fixed the URL for the UoM GitLab instance
      5192704 Add a merge conflict exercise

      We can view all of the changes that were made after the bottom commit (5192704, "Add a merge conflict exercise") up to and including the top commit (95eaae5, "Note the need for a GitHub account and SSH key") by running:

      git diff 5192704..95eaae5

      In the above example, 8 files were changed, with a total of 310 new lines and 7 deleted lines. This is a lot of information! You can print a summary of these changes by running:

      git diff --stat 5192704..95eaae5

      This should show you the following details:

       README.md                                       |   2 +-\n src/SUMMARY.md                                  |   3 +\n src/prerequisites.md                            |   2 +\n src/using-git/cloning-an-existing-repository.md |  36 ++++++++++\n src/using-git/creating-a-commit.md              | 146 +++++++++++++++++++++++++++++++++++++--\n src/using-git/how-to-create-and-use-tags.md     |  89 ++++++++++++++++++++++++\n src/using-git/how-to-ignore-certain-files.md    |  37 ++++++++++\n src/version-control/what-is-a-repository.md     |   2 +-\n 8 files changed, 310 insertions(+), 7 deletions(-)\n

      This reveals that about half of the changes (146 new/deleted lines) were made to src/using-git/creating-a-commit.md.
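      For longer histories, you might want to summarise this output with a script. The following minimal Python sketch is not part of Git. parse_diffstat is a hypothetical helper, and it assumes the default --stat layout shown above: one path | count line per file, followed by a summary line. Binary files, which --stat reports as Bin ... bytes, are skipped. Git may also abbreviate very long paths to fit the available width, and in that case the keys will not match the full paths.

```python
def parse_diffstat(stat_text):
    '''Return a dict mapping each file path to its number of changed lines.

    Assumes the default git diff --stat layout: one line per file of the
    form path | count +++---, followed by a summary line with no bar.
    '''
    counts = {}
    for line in stat_text.splitlines():
        path, bar, rest = line.partition('|')
        if not bar:
            continue  # the summary line (and any blank line) has no bar
        fields = rest.split()
        if fields and fields[0].isdigit():
            # Binary files report Bin ... bytes instead of a line count.
            counts[path.strip()] = int(fields[0])
    return counts


# The git diff --stat output from the example above.
STAT = '''
 README.md                                       |   2 +-
 src/SUMMARY.md                                  |   3 +
 src/prerequisites.md                            |   2 +
 src/using-git/cloning-an-existing-repository.md |  36 ++++++++++
 src/using-git/creating-a-commit.md              | 146 +++++++++++++++++++++++++++++++++++++--
 src/using-git/how-to-create-and-use-tags.md     |  89 ++++++++++++++++++++++++
 src/using-git/how-to-ignore-certain-files.md    |  37 ++++++++++
 src/version-control/what-is-a-repository.md     |   2 +-
 8 files changed, 310 insertions(+), 7 deletions(-)
'''

counts = parse_diffstat(STAT)
total = sum(counts.values())
share = counts['src/using-git/creating-a-commit.md'] / total
print(f'{total} changed lines; {share:.0%} in creating-a-commit.md')
# prints: 317 changed lines; 46% in creating-a-commit.md
```

      Applied to the example above, this confirms that 146 of the 317 changed lines (310 insertions plus 7 deletions) are in src/using-git/creating-a-commit.md.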

      "},{"location":"using-git/inspecting-your-history/#viewing-changes-to-a-file-over-a-specific-interval","title":"Viewing changes to a file over a specific interval","text":"

      Similar to the git log command, you can limit the files that the git diff command will examine. For example, you can display only the changes made to README.md in the above example by running:

      git diff 5192704..95eaae5 -- README.md\n

      This should show you the following change:

      diff --git a/README.md b/README.md\nindex 7956b65..a34f907 100644\n--- a/README.md\n+++ b/README.md\n@@ -15,7 +15,7 @@ This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 Inter\n\n ## Building the book\n\n-You can build this book by installing [mdBook](https://rust-lang.github.io/mdBook/) and running the following command in this directory:\n+You can build this book by installing [mdBook](https://rust-lang.github.io/mdBook/) and [mdbook-admonish](https://github.com/tommilligan/mdbook-admonish/), and running the following command in this directory:\n\n ```shell\n mdbook build\n
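      This output uses the unified diff format. Lines starting with - were removed, lines starting with + were added, and lines starting with a space are unchanged context around the change. The @@ line gives the starting line number and length of the changed region in the old and new versions. As an illustration only, you can produce output in the same format for two small lists of lines with Python's standard difflib module. This does not use Git, and the file contents are made up:

```python
import difflib

# Two hypothetical versions of a short file, as lists of lines.
old = ['First line', 'Second line', 'Third line']
new = ['First line', 'My new second line', 'Third line']

# lineterm='' because these lines do not end with newline characters.
diff = list(difflib.unified_diff(old, new, fromfile='a/example.txt',
                                 tofile='b/example.txt', lineterm=''))
for line in diff:
    print(line)
# --- a/example.txt
# +++ b/example.txt
# @@ -1,3 +1,3 @@
#  First line
# -Second line
# +My new second line
#  Third line
```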
      "},{"location":"using-git/pushing-and-pulling-commits/","title":"Pushing and pulling commits","text":"

      In general, we \"push\" commits from our local repository to a remote repository by running:

      git push <remote-repository>\n

      and \"pull\" commits from a remote repository into our local repository by running:

      git pull <remote-repository>\n

      where <remote-repository> is either a URL or the name of a remote repository.

      However, we generally want to push to, and pull from, the same remote repository every time. See the next section for an example of linking the main branch in your local repository with a corresponding \"upstream\" branch in your remote repository.

      "},{"location":"using-git/pushing-and-pulling-commits/#pushing-your-first-commit-to-a-remote-repository","title":"Pushing your first commit to a remote repository","text":"

      In order to push commits from your local repository to a remote repository, you need to create a branch in the remote repository that corresponds to the main branch of your local repository. This requires that you have created at least one commit in your local repository.

      Tip

      This is a good time to create a README.md file and write a brief description of what this repository will contain.

      Once you have at least one commit in your local repository, you can create a corresponding upstream branch in the remote repository with the following command:

      git push -u origin main\n

      Note

      Recall that we identify remote repositories by name. In this example, the remote repository is called \"origin\". You can choose a different name when linking your local and remote repositories.

      Once you have defined the upstream branch, you can push commits by running:

      git push\n

      and pull commits by running:

      git pull\n

      without having to specify the remote repository.

      "},{"location":"using-git/pushing-and-pulling-commits/#forcing-updates-to-a-remote-repository","title":"Forcing updates to a remote repository","text":"

      By default, Git will refuse to push commits from a local branch to a remote branch if the remote branch contains any commits that are not in your local branch. This situation should not arise in general, and typically indicates that either someone else has pushed new commits to the remote branch (see the Collaborating section) or that you have altered the history of your local branch.

      If you are absolutely confident that your local history of commits should replace the contents of the remote branch, you can force this update by running:

      git push --force\n

      Tip

      Unless you are confident that you understand why this situation has occurred, it is probably a good idea to ask for advice before running the above command.

      "},{"location":"using-git/where-did-this-line-come-from/","title":"Where did this line come from?","text":"

      Consider the What should I commit? chapter. Imagine that we want to know when and why the following text was added:

      A helpful guideline is \"**commit early, commit often**\".\n

      If we can identify the relevant commit, we can then inspect the commit (using git show <commit>) to see all of the changes that it introduced. Ideally, the commit message will explain the reasons why this commit was made. This is one way in which your commit messages can act as a lab book.

      At the time of writing (commit 2a96324), the contents of the What should I commit? chapter came from two commits:

      git log --oneline src/version-control/what-should-I-commit.md\n
      3dfff1f Add notes about committing early and often\n9be780b Briefly describe key version control concepts\n

      We can use the git blame command to identify the commit that last modified each line in this file:

      git blame -s src/version-control/what-should-I-commit.md\n
      9be780b8  1) # What should I commit?\n9be780b8  2)\n9be780b8  3) A commit should represent a **unit of work**.\n9be780b8  4)\n9be780b8  5) If you've made changes that represent multiple units of work (e.g., changing how input data are processed, and adding a new model parameter) these should be saved as separate commits.\n9be780b8  6)\n9be780b8  7) Try describing out loud the changes you have made, and if you find yourself saying something like \"I did X and Y and Z\", then the changes should probably divided into multiple commits.\n3dfff1fe  8)\n3dfff1fe  9) A helpful guideline is \"**commit early, commit often**\".\n3dfff1fe 10)\n3dfff1fe 11) ## Commit early\n3dfff1fe 12)\n3dfff1fe 13) - Don't delay creating a commit because \"it's not ready yet\".\n3dfff1fe 14)\n3dfff1fe 15) - A commit doesn't have to be \"perfect\".\n3dfff1fe 16)\n3dfff1fe 17) ## Commit often\n3dfff1fe 18)\n3dfff1fe 19) - Small, focused commits are **extremely helpful** when trying to identify the cause of an unintended change in your code's behaviour or output.\n3dfff1fe 20)\n3dfff1fe 21) - There is no such thing as too many commits.\n

      You can see that the first seven lines were last modified by commit 9be780b (Briefly describe key version control concepts), while the rest of the file was last modified by commit 3dfff1f (Add notes about committing early and often). So the text that we're interested in (line 9) was introduced by commit 3dfff1f.
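      If you need to look up many lines, you can also process this output with a script. Here is a minimal Python sketch. blame_owners is a hypothetical helper that expects the git blame -s format shown above: an abbreviated commit hash, the line number and a closing parenthesis, then the line's content. It strips a leading ^, which git blame uses to mark boundary commits. The sample input is abbreviated from the output above.

```python
def blame_owners(blame_text):
    '''Map each line number to the (abbreviated) commit that last changed it.

    Expects git blame -s output: hash, line number and ), then the content.
    '''
    owners = {}
    for line in blame_text.splitlines():
        prefix, paren, _content = line.partition(')')
        if not paren:
            continue  # not a blame line (e.g., blank)
        commit, lineno = prefix.split()
        # git blame prefixes boundary commits with ^; drop it.
        owners[int(lineno)] = commit.lstrip('^')
    return owners


# Three lines from the git blame -s example above (contents abbreviated).
BLAME = '''
9be780b8  7) Try describing out loud the changes you have made, ...
3dfff1fe  8)
3dfff1fe  9) A helpful guideline is ...
'''

owners = blame_owners(BLAME)
print(owners[9])  # 3dfff1fe
```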

      You can inspect this commit by running the following command:

      git show 3dfff1f\n

      "},{"location":"using-git/where-did-this-problem-come-from/","title":"Where did this problem come from?","text":"

      Let's find the commit that created the file src/version-control/what-is-a-repository.md. We could find this out using git log, but the point here is to illustrate how to use a script to find the commit that causes any arbitrary change to our repository.

      Once the commit has been found, you can inspect it (using git show <commit>) to see all of the changes this commit introduced and the commit message that (hopefully) explains the reasons why this commit was made. This is one way in which your commit messages can act as a lab book.

      1. Create a Python script called my_test.py with the following contents:

        #!/usr/bin/env python3\nfrom pathlib import Path\nimport sys\n\nexpected_file = Path('src') / 'version-control' / 'what-is-a-repository.md'\n\nif expected_file.exists():\n    # This file is the \"new\" thing that we want to detect.\n    sys.exit(1)\nelse:\n    # The file does not exist, this commit is \"old\".\n    sys.exit(0)\n

        For reference, here is an equivalent R script:

        #!/usr/bin/Rscript --vanilla\n\nexpected_file <- file.path('src', 'version-control', 'what-is-a-repository.md')\n\nif (file.exists(expected_file)) {\n    # This file is the \"new\" thing that we want to detect.\n    quit(status = 1)\n} else {\n    # The file does not exist, this commit is \"old\".\n    quit(status = 0)\n}\n
      2. Select the commit range over which to search. We know that the file exists in commit 3dfff1f (Add notes about committing early and often), and that it did not exist in the very first commit (5a19b02).

      3. Instruct Git to start searching with the following command:

        git bisect start 3dfff1f 5a19b02\n

        Note that we specify the newer commit first, and then the older commit.

        Git will inform you about the search progress, and which commit is currently being investigated.

        Bisecting: 7 revisions left to test after this (roughly 3 steps)\n[92f1375db21dd8a35ca141365a477b963dbbf6dc] Add CC-BY-SA license text and badge\n
      4. Instruct Git to use the script my_test.py to check each commit with the following command. The script must be executable, which you can ensure by running chmod +x my_test.py:

        git bisect run ./my_test.py\n

        It will continue to report the search progress and automatically identify the commit that we're looking for:

        running  './my_test.py'\nBisecting: 3 revisions left to test after this (roughly 2 steps)\n[9be780b8785d67ee191b2c0b113270059c9e0c3a] Briefly describe key version control concepts\nrunning  './my_test.py'\nBisecting: 1 revision left to test after this (roughly 1 step)\n[055906f28da146a2d012b7c1c0e4707503ed1b11] Display example commit message as plain text\nrunning  './my_test.py'\nBisecting: 0 revisions left to test after this (roughly 0 steps)\n[1251357ab5b41d511deb48cd5386cae37eec6751] Rename the \"What is a repository?\" source file\nrunning  './my_test.py'\n1251357ab5b41d511deb48cd5386cae37eec6751 is the first bad commit\ncommit 1251357ab5b41d511deb48cd5386cae37eec6751\nAuthor: Rob Moss <robm.dev@gmail.com>\nDate:   Sun Apr 17 21:41:43 2022 +1000\n\n    Rename the \"What is a repository?\" source file\n\n    The file name was missing the word \"a\" and did not match the title.\n\n src/SUMMARY.md                              |  2 +-\n src/version-control/what-is-a-repository.md | 18 ++++++++++++++++++\n src/version-control/what-is-repository.md   | 18 ------------------\n 3 files changed, 19 insertions(+), 19 deletions(-)\n create mode 100644 src/version-control/what-is-a-repository.md\n delete mode 100644 src/version-control/what-is-repository.md\n
      5. To quit the search and return to your current commit, run the following command:

        git bisect reset\n
      6. You can then inspect this commit by running the following command:

        git show 1251357\n
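      Behind the scenes, git bisect performs a binary search. Each test roughly halves the number of candidate commits, which is why Git estimated roughly 3 steps for 7 remaining revisions. The following is a simplified Python model of this search over a linear history; real git bisect also handles branches and merges. first_bad and the example history are hypothetical. The model assumes that the oldest commit is good, the newest is bad, and every commit after the first bad one is also bad.

```python
def first_bad(commits, is_bad):
    '''Find the first commit for which is_bad(commit) is True.

    commits is ordered oldest to newest; we assume commits[0] is good,
    commits[-1] is bad, and once a commit is bad every later one is too.
    Returns the first bad commit and the number of commits tested.
    '''
    good, bad = 0, len(commits) - 1
    tested = 0
    while bad - good > 1:
        mid = (good + bad) // 2  # test the commit halfway between
        tested += 1
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], tested


# Hypothetical history: the file exists from commit 'f' onwards.
history = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
print(first_bad(history, lambda c: c >= 'f'))  # ('f', 3)
```

      Here the predicate plays the role of my_test.py: it reports a commit as bad (new) when the file is present.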
      "},{"location":"version-control/","title":"Version control concepts","text":"

      This section provides a high-level introduction to the concepts that you should understand in order to make effective use of version control.

      Info

      Version control can turn your files into a lab book that captures the broader context of your research activities and that you can easily search and reproduce.

      "},{"location":"version-control/exercise-using-version-control/","title":"Exercise: using version control","text":"

      In this section we have introduced version control, and outlined how it can be useful for academic research activities, including:

      • Capturing a detailed, annotated record of your research;
      • Inspecting changes made between any two moments in time;
      • Identifying when a specific change was made; and
      • Sharing your research with collaborators.

      Info

      We'd now like you to think about how version control might be useful to you and your research.

      Have you experienced any issues or challenges in your career where version control would have been helpful? For example:

      • Have you ever looked at some of your older code and had difficulty understanding what it is doing, how it works, or why it was written?

      • Have you ever had difficulties identifying what code and/or data were used to generate a particular analysis or output?

      • Have you ever discovered a bug in your code and tried to identify when it was introduced, or what outputs it might have affected?

      • When collaborating on a research project, have you ever had challenges in making sure that everyone was working with the most recent files?

      How can you use version control in your current research project(s)?

      • Do you have an existing project or piece of code that could benefit from being stored in a repository?

      • Have you recently written any code that could be recorded as one or more commits?

      • If so, what would you write for the commit messages?

      • Have you written some exploratory code or analysis that could be stored in a separate branch?

      Having looked at the use of version control in the past and present, how would using version control benefit you?

      "},{"location":"version-control/how-do-I-write-a-commit-message/","title":"How do I write a commit message?","text":"

      Commit messages are shown as part of the repository history (e.g., when running git log). Each message consists of a short one-line description, followed by as much or as little text as required.

      You should treat these messages as entries in a log book. Explain what changes were made and why they were made. This can help collaborators understand what we have done, but more importantly it acts as a record for our future selves.

      Info

      Have you ever looked at code you wrote a long time ago and wondered what you were thinking?

      A history of detailed commit messages should allow you to answer this question!

      Remember that code is harder to read than it is to write (Joel Spolsky).

      For example, rather than writing:

      Added model

      You could write something like:

      Implemented the initial model

      This model includes all of the core features that we need to fit the data, but there are several other features that we intend to add:

      - Parameter X is currently constant, but we may need to allow it to vary over time;

      - Parameter Y should probably be a hyperparameter; and

      - The population includes age-structured mixing, but we need to also include age-specific outcomes, even though there is very little data to suggest what the age effects might be.
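      Git treats the text before the first blank line as the message's summary, which is what git log --oneline displays. A blank line should therefore separate the summary from the rest of the message. As an illustration only, here is a hypothetical Python check for this shape. The 50-character limit on the summary is a common convention that we have assumed here, not a rule enforced by Git.

```python
def check_commit_message(message, max_summary=50):
    '''Return a list of problems with the shape of a commit message.

    Assumed conventions: a non-empty one-line summary of at most
    max_summary characters, then (if there is more text) a blank line.
    '''
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ['the summary line is empty']
    problems = []
    if len(lines[0]) > max_summary:
        problems.append(f'summary is longer than {max_summary} characters')
    if len(lines) > 1 and lines[1].strip():
        problems.append('no blank line after the summary')
    return problems


GOOD = '''Implemented the initial model

This model includes all of the core features that we need to fit the data.
'''

print(check_commit_message(GOOD))  # []
print(check_commit_message('Added model'))  # []
```

      A check like this can only test the shape of a message. Added model passes it, even though it says almost nothing about what was changed or why.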

      "},{"location":"version-control/what-is-a-branch/","title":"What is a branch?","text":"

      A branch allows you to create a series of commits that are separate from the main history of your repository. Branches can be used for units of work that are too large to be a single commit.

      Info

      It is easy to switch between branches! You can work on multiple ideas or tasks in parallel.

      Consider a repository with three commits: commit A, followed by commit B, followed by commit C:

      At this point, you might consider two ways to implement a new model feature. One way to do this is to create a separate branch for each implementation:

      You can work on each branch, and switch between them, in the same local repository.

      If you decide that the first implementation (the green branch) is the best way to proceed, you can then merge this branch back into your main branch. This means that your main branch now contains six commits (A to F), and you can continue adding new commits to your main branch:

      "},{"location":"version-control/what-is-a-commit/","title":"What is a commit?","text":"

      A \"commit\" is a set of changes to one or more files in a repository. These changes can include:

      • Adding lines to a file;
      • Removing lines from a file;
      • Changing lines in a file;
      • Adding new files; and
      • Deleting existing files.

      Each commit also includes the date and time that it was created, the user that created it, and a commit message.

      "},{"location":"version-control/what-is-a-merge-conflict/","title":"What is a merge conflict?","text":"

      In What is a branch? we presented an example of successfully merging a branch into another. However, when we try to merge one branch into another, we may find that the two branches have conflicting changes. This is known as a merge conflict.

      Consider two branches that make conflicting changes to the same line of a file:

      1. Replace \"Second line\" with \"My new second line\":

         First line\n-Second line\n+My new second line\n Third line\n
      2. Replace \"Second line\" with \"A different second line\":

         First line\n-Second line\n+A different second line\n Third line\n

      There is no way to automatically reconcile these two branches, and we have to fix this conflict manually. This means that we need to decide what the true result should be, edit the file to resolve these conflicting changes, and commit our modifications.
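      To see why Git cannot choose automatically, compare each branch with the common starting point (the merge base). Below is a deliberately simplified Python sketch of a line-by-line three-way merge. It assumes that neither branch adds or removes lines; real Git first aligns the lines with a diff. A line changed on only one branch takes that branch's version, while a line changed differently on both branches is a conflict. The function name and the ours/theirs labels on the conflict markers are illustrative.

```python
def merge_lines(base, ours, theirs):
    '''Three-way merge of equal-length lists of lines.

    Returns the merged lines and the indexes of any conflicting lines.
    '''
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t or t == b:
            merged.append(o)  # both agree, or only ours changed
        elif o == b:
            merged.append(t)  # only theirs changed
        else:
            # Both branches changed this line in different ways.
            conflicts.append(i)
            merged += ['<<<<<<< ours', o, '=======', t, '>>>>>>> theirs']
    return merged, conflicts


base = ['First line', 'Second line', 'Third line']
ours = ['First line', 'My new second line', 'Third line']
theirs = ['First line', 'A different second line', 'Third line']

merged, conflicts = merge_lines(base, ours, theirs)
print(conflicts)  # [1]
for line in merged:
    print(line)
```

      If only one branch had changed the second line, merge_lines would return that branch's version with no conflicts.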

      "},{"location":"version-control/what-is-a-repository/","title":"What is a repository?","text":"

      A repository records a set of files managed by a version control system, including the historical record of changes made to these files.

      You can create as many repositories as you want. Each repository should be a single \"thing\", such as a research project or a journal article, and should be located in a separate directory.

      You will generally have at least two copies of each repository:

      1. A local repository on your computer; and

      2. A remote repository on a service such as GitHub, or a University-provided platform (such as the University of Melbourne's GitLab instance).

      You make changes in your local repository and \"push\" them to the remote repository. You can share this remote repository with your collaborators and supervisors, and they will be able to see all of the changes that you have pushed.

      You can also allow collaborators to push their own changes to the remote repository, and then \"pull\" them into your local repository. This is one way in which you can use version control to work collaboratively on a project.

      "},{"location":"version-control/what-is-a-tag/","title":"What is a tag?","text":"

      A tag is a short, unique name that identifies a specific commit. You can use tags as bookmarks for interesting or important commits. Common uses of tags include:

      • Identifying manuscript revisions: draft-1, submitted-version, revision-1, etc.

      • Identifying software package versions: v1.0, v1.1, v2.0, etc.

      "},{"location":"version-control/what-is-version-control/","title":"What is version control?","text":"

      Version control is a way of systematically recording changes to files (such as computer code and data files). This allows you to restore any previous version of a file. More importantly, this history of changes can be queried, and each set of changes can include additional information, such as who made the changes and an explanation of why the changes were made.

      A core component of making great decisions is understanding the rationale behind previous decisions. If we don't understand how we got \"here\", we run the risk of making things much worse.

      \u2014 Chesterton's Fence

      For academic research activities that involve data analysis or simulation modelling, some key uses of version control are:

      • You can use it as a log book, and capture a detailed and permanent record of every step of your research. This is extremely helpful for people \u2014 including you! \u2014 who want to understand and make use of your work.

      • You can collaborate with others in a systematic way, ensuring that everyone has access to the most recent files and data, and review everyone's contributions.

      • You can inspect the changes made over a period of interest (e.g., \"What have I done in the last week?\").

      • You can identify when a specific change occurred, and what other changes were made at the same time (e.g., \"What changes did I make that affected this output figure?\").

      In this book we will focus on the Git version control system, which is used by popular online platforms such as GitHub, GitLab, and Bitbucket.

      "},{"location":"version-control/what-should-I-commit/","title":"What should I commit?","text":"

      A commit should represent a unit of work.

      If you've made changes that represent multiple units of work (e.g., changing how input data are processed, and adding a new model parameter) these should be saved as separate commits.

      Try describing out loud the changes you have made, and if you find yourself saying something like \"I did X and Y and Z\", then the changes should probably be divided into multiple commits.

      A helpful guideline is \"commit early, commit often\".

      "},{"location":"version-control/what-should-I-commit/#commit-early","title":"Commit early","text":"
      • Don't delay creating a commit because \"it's not ready yet\".

      • A commit doesn't have to be \"perfect\".

      "},{"location":"version-control/what-should-I-commit/#commit-often","title":"Commit often","text":"
      • Small, focused commits are extremely helpful when trying to identify the cause of an unintended change in your code's behaviour or output.

      • There is no such thing as too many commits.

      "}]} \ No newline at end of file +{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"Introduction","text":"

      These materials aim to support early- and mid-career researchers (EMCRs) in the SPECTRUM and SPARK networks to develop their computing skills, and to make effective use of available tools[1] and infrastructure[2].

      "},{"location":"#motivation","title":"Motivation","text":"

      Question

      Why dedicate time and effort to learning these skills? There are many reasons!

      The overall aim of these materials is to help you conduct code-driven research more efficiently and with greater confidence.

      Hopefully some of the following reasons resonate with you.

      • Fearlessly modify your code, knowing that your past work is never lost, by using version control.

      • Verify that your code behaves as expected, and get notified when it doesn't, by writing tests.

      • Ensure that your results won't change when running on a different computer by \"baking in\" reproducibility.

      • Improve your coding skills, and those of your colleagues, by working collaboratively and making use of peer code review.

      • Run your code quickly, and without relying on your own laptop or computer, by using high-performance computing.

      Foundations of effective research

      A piece of code is often useful beyond a single project or study.

      By applying the above skills in your research, you will be able to easily reproduce past results, extend your code to address new questions and problems, and allow others to build on your code in their own research.

      The benefits of good practices can continue to pay off long after the project is finished.

      "},{"location":"#structure","title":"Structure","text":"

      These materials are divided into the following sections:

      1. Understanding version control, which provides you with a complete and annotated history of your work, and with powerful ways to search and examine this history.

      2. Learning to use Git, the most widely used version control system, which is the foundation of popular code-hosting services such as GitHub, GitLab, and Bitbucket.

      3. Using Git to collaborate with colleagues in a precisely controlled and manageable way.

      4. Ensuring that your research is reproducible by others.

      5. Using testing frameworks to verify that your code behaves as intended, and to automatically detect when you introduce a bug or mistake into your code.

      6. Running your code on various computing platforms that allow you to obtain results efficiently and without relying on your own laptop/computer.

      7. Case studies where EMCRs showcase how their research activities are enabled and/or supported by these tools.

      8. We are organising a Community of Practice that will act as a living curriculum, and will use this section to record the findings and outputs of our community activities.

      "},{"location":"#how-to-contribute","title":"How to contribute","text":"

      If you have any comments, feedback, or suggestions, please see the How to contribute page.

      "},{"location":"#license","title":"License","text":"

      This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

      1. Such as version control and testing frameworks.

      2. Such as the ARDC Nectar Research Cloud and Spartan.

      "},{"location":"contributors/","title":"Contributors","text":"

      Here is a list of the contributors who have helped develop these materials:

      • Rob Moss (robmoss)
      • Eamon Conway (EamonConway)
      • James Ong (jomonman537)
      • Trish Campbell (TrishC)
      • Isobel Abell (iabell)
      "},{"location":"how-to-contribute/","title":"How to contribute","text":""},{"location":"how-to-contribute/#add-a-case-study","title":"Add a case study","text":"

      If you've made use of Git in your research activities, please let us know! We're looking for case studies that highlight how EMCRs are using Git. See the instructions for suggesting new content (below).

      "},{"location":"how-to-contribute/#provide-comments-and-feedback","title":"Provide comments and feedback","text":"

      The easiest way to provide comments and feedback is to create an issue. Note that this requires a GitHub account. If you do not have a GitHub account, you can email any of the authors. Please include \"Git is my lab book\" in the subject line.

      "},{"location":"how-to-contribute/#suggest-modifications-and-new-content","title":"Suggest modifications and new content","text":"

      This book is written in Markdown and is published using Material for MkDocs. See the Material for MkDocs Reference for an overview of the supported features.

      You can suggest modifications and new content by:

      • Forking the book repository;

      • Adding, deleting, and/or modifying book chapters in the docs/ directory;

      • Recording your changes in one or more git commits; and

      • Creating a pull request, so that we can review your suggestions.

      Info

      You can also edit any page by clicking the \"Edit this page\" button in the top-right corner. This will start the process described above by forking the book repository.

      Tip

      When editing Markdown content, please start each sentence on a separate line. Also check that your text editor removes trailing whitespace.

      This ensures that each commit will contain only the modified sentences, and makes it easier to inspect the repository history.

      Tip

      When you add a new page, you must also add the page to the nav block in mkdocs.yml.

      "},{"location":"how-to-contribute/#adding-tabbed-code-blocks","title":"Adding tabbed code blocks","text":"

      You can display content in multiple tabs by using ===. For example:

      === \"Python\"\n\n    ```py\n    print(\"Hello world\")\n    ```\n\n=== \"R\"\n\n    ```R\n    cat(\"Hello world\\n\")\n    ```\n\n=== \"C++\"\n\n    ```cpp\n    #include <iostream>\n\n    int main() {\n        std::cout << \"Hello World\";\n        return 0;\n    }\n    ```\n\n=== \"Shell\"\n\n    ```sh\n    echo \"Hello world\"\n    ```\n\n=== \"Rust\"\n\n    ```rust\n    fn main() {\n        println!(\"Hello World\");\n    }\n    ```\n

      produces:

      Python | R | C++ | Shell | Rust
      print(\"Hello world\")\n
      cat(\"Hello world\\n\")\n
      #include <iostream>\n\nint main() {\n    std::cout << \"Hello World\";\n    return 0;\n}\n
      echo \"Hello world\"\n
      fn main() {\n    println!(\"Hello World\");\n}\n
      "},{"location":"how-to-contribute/#adding-terminal-session-recordings","title":"Adding terminal session recordings","text":"

      You can use asciinema to record a terminal session, and display this recorded session with a small amount of HTML and JavaScript. For example, the following code is used to display the where-did-this-line-come-from.cast recording in a tab called \"Video demonstration\", as shown in the Where did this line come from? chapter:

      === \"Video demonstration\"\n\n    <div id=\"demo\" data-cast-file=\"../where-did-this-line-come-from.cast\"></div>\n

      You can also add links that jump to specific times in the video. Each link must have:

      • A data-video attribute that identifies the video (in the example above, this is \"demo\");
      • A data-seek-to attribute that identifies the time (in seconds) to jump to; and
      • A href attribute that is set to \"javascript:;\" (so that the link doesn't scroll the page).

      For example, the following code is used to display the video recording on the Choosing your Git Editor page:

      === \"Git editor example\"\n\n    <div id=\"demo\" data-cast-file=\"../git-editor-example.cast\"></div>\n\n    Video timeline:\n\n    1. <a data-video=\"demo\" data-seek-to=\"4\" href=\"javascript:;\">Overview</a>\n    2. <a data-video=\"demo\" data-seek-to=\"17\" href=\"javascript:;\">Show how to use nano</a>\n    3. <a data-video=\"demo\" data-seek-to=\"71\" href=\"javascript:;\">Show how to use vim</a>\n
      "},{"location":"learning-objectives/","title":"Learning objectives","text":"

      This page defines the learning objectives for individual sections. These are skills that the reader should be able to demonstrate after reading through the relevant section, and completing any exercises in that section.

      "},{"location":"learning-objectives/#version-control-concepts","title":"Version control concepts","text":"

      After completing this section, you should be able to identify how to apply version control concepts to your existing work. This includes being able to:

      • Identify projects and tasks for which version control would be suitable;

      • Categorise recent work activities into one or more commits;

      • Write commit messages that describe what changes you made and why you made them; and

      • Identify pieces of work that could be carried out in separate branches of a repository.

      "},{"location":"learning-objectives/#effective-use-of-git","title":"Effective use of git","text":"

      After completing this section, you should be able to:

      • Create a local repository;

      • Create commits in your local repository;

      • Search your commit history to identify commits that made a specific change;

      • Create a remote repository;

      • Push commits from your local repository to a remote repository;

      • Pull commits from a remote repository to your local repository;

      • Use tags to identify important milestones;

      • Work in a separate branch and then merge your changes into your main branch; and

      • Resolve merge conflicts.

      "},{"location":"learning-objectives/#collaborating","title":"Collaborating","text":"

      After completing this section, you should be able to:

      • Share a repository with one or more collaborators;

      • Create a pull request;

      • Use a pull request to review a collaborator's work;

      • Use a pull request to merge a collaborator's work into your main branch; and

      • Conduct peer code review in a respectful manner.

      "},{"location":"prerequisites/","title":"Prerequisites","text":"

      These materials assume that the reader has a basic knowledge of the Bash command-line shell and of using SSH to connect to remote computers. You should be comfortable using the command line to perform the following tasks:

      • Navigate your files and directories;
      • Create, copy, move, and delete files and directories; and
      • Work remotely using SSH.

      Please refer to the following materials for further details:

      • The Unix Shell: an introduction to using Bash.
      • Extra Unix Shell Material: additional shell lessons, including SSH.

      Info

      If you use Windows, you may want to use PowerShell instead of Bash, in which case please refer to this Introduction to the Windows Command Line with Powershell.

      Some chapters also assume that the reader has an account on GitHub and has added an SSH key to their account.

      "},{"location":"references/","title":"References","text":""},{"location":"references/#education-and-commentary-articles","title":"Education and commentary articles","text":"
      • A Beginner's Guide to Conducting Reproducible Research describes key requirements for producing reproducible research outputs.

      • Point of View: How open science helps researchers succeed presents evidence that open research practices bring significant benefits to researchers.

      • A Quick Guide to Organizing Computational Biology Projects suggests an approach for structuring a computational research repository.

      "},{"location":"references/#using-git-and-other-software-tools","title":"Using Git and other software tools","text":"
      • NDP Software have created an interactive Git cheat-sheet that shows how git commands interact with the local and upstream repositories, and provides brief documentation for many common examples.

      • The Pro Git book is available online. It starts with an overview of Git basics and then covers every aspect of Git in great detail.

      • The Software Carpentry Foundation publishes many lessons, including Version Control with Git.

      • A Quick Introduction to Version Control with Git and GitHub provides a short guide to using Git and GitHub. It presents an example of analysing publicly available ChIP-seq data with Python. The repository for the article is also publicly available.

      "},{"location":"references/#performing-peer-code-review","title":"Performing peer code review","text":"
      • The Art of Giving and Receiving Code Reviews (Gracefully)

      • Code Review in the Lab

      • Scientific Code Review

      • The 5 Golden Rules of Code Review

      "},{"location":"references/#computational-research-practices","title":"Computational research practices","text":"
      • A simple kit to use computational notebooks for more openness, reproducibility, and productivity in research provides some good recommendations for organising a project repository and setting up a reproducible workflow using computational notebooks.

      • Why code rusts collects some of the reasons why the behaviour of code changes over time.

      "},{"location":"references/#high-performance-computing-platforms","title":"High-performance computing platforms","text":"
      • How to access the ARDC Nectar Research Cloud

      • Melbourne Research Cloud

      • High Performance Computing at University of Melbourne

      "},{"location":"references/#how-to-acknowledge-and-cite-research-software","title":"How to acknowledge and cite research software","text":"
      • The ARDC Guide to making software citable explains how to cite your code and assign it a DOI.

      • Recognizing the value of software: a software citation guide provides further examples and guidance for ensuring your work receives proper attribution and credit.

      "},{"location":"references/#software-licensing","title":"Software licensing","text":"
      • Choose an open source license provides advice for selecting an appropriate license that meets your needs.

      • A Quick Guide to Software Licensing for the Scientist-Programmer explains the various types of available licenses and provides advice for selecting a suitable license.

      "},{"location":"case-studies/","title":"Case studies","text":"

      This section contains interesting and useful examples of incorporating Git into a research activity, as contributed by EMCRs in our network.

      "},{"location":"case-studies/campbell-pen-and-paper-version-control/","title":"Pen and paper - a less user-friendly form of version control than Git","text":"

      Author: Trish Campbell (patricia.campbell@unimelb.edu.au)

      Project: Pertussis modelling

      "},{"location":"case-studies/campbell-pen-and-paper-version-control/#the-problem","title":"The problem","text":"

      In this project, I developed a compartmental model of pertussis to determine appropriate vaccination strategies. While plotting some single model simulations, I noticed anomalies in the modelled output for two experiments. The first experiment had an order of magnitude more people in the infectious compartments than in the second experiment, even though there seemed to be far fewer infections occurring. This scenario did not fit with the parameter values that were being used. In the differential equation file for my model, in addition to extracting the state of the model (i.e. the population in each compartment at each time step), for ease of analysis I also extracted the cumulative number of infections up to that time step. The calculation for this extraction of cumulative incidence was incorrect.

      "},{"location":"case-studies/campbell-pen-and-paper-version-control/#the-solution","title":"The solution","text":"

      The error occurred because susceptible people in my model were not all equally susceptible, and I failed to account for this when I calculated the cumulative number of infections at each time step. I identified that this was the problem by running some targeted test parameter sets and observing the changes in model output. The next step was to find out how long this bug had existed in the code and which analyses had been affected. Although I was using version control, I tended to make large, infrequent commits. I did, however, keep extensive hand-written notes in lab books, which played the role of a detailed history of commits. Searching through my historical lab books, I identified that I had introduced this bug into the code two years earlier. I was able to determine which parts of my results would have been affected by the bug, and decided that all experiments needed to be re-run.

      "},{"location":"case-studies/campbell-pen-and-paper-version-control/#how-version-control-helped","title":"How version control helped","text":"

      Using a pen and paper form of version control enabled me to pinpoint the introduction of the error and identify the affected analyses, but it was a tedious process. While keeping an immaculate record of changes that I had made was invaluable, imagine how much simpler and faster the process would have been if I had been a regular user of an electronic version control system such as Git!

      "},{"location":"case-studies/moss-incorrect-data-pre-print/","title":"Incorrect data in a pre-print figure","text":"

      Author: Rob Moss (rgmoss@unimelb.edu.au)

      Project: COVID-19 scenario modelling (public repository)

      "},{"location":"case-studies/moss-incorrect-data-pre-print/#the-problem","title":"The problem","text":"

      Our colleague James Trauer notified us that they suspected there was an error in Figure 2 of our COVID-19 scenario modelling pre-print article. This figure showed model predictions of the daily ICU admission demand in an unmitigated COVID-19 pandemic, and in a COVID-19 pandemic with case-targeted public health measures. I inspected the script responsible for plotting this figure, and confirmed that I had mistakenly plotted the combined demand for ward and ICU beds, instead of the demand for ICU beds alone.

      "},{"location":"case-studies/moss-incorrect-data-pre-print/#the-solution","title":"The solution","text":"

      This mistake was simple to correct, but the obvious concern was whether any other outputs related to ICU bed demand were affected.

      We conducted a detailed review of all data analysis scripts and outputs, and confirmed that this error only affected this single manuscript figure. It had no bearing on the impact of the interventions in each model scenario. Importantly, it did not affect any of the simulation outputs, summary tables, and/or figures that were included in our reports to government.

      The corrected figure can be seen in the published article.

      "},{"location":"case-studies/moss-incorrect-data-pre-print/#how-version-control-helped","title":"How version control helped","text":"

      Because we used version control to record the development history of the model and all of the simulation analyses, we were able to easily inspect the repository state at the time of each prior analysis. This greatly simplified the review process, and ensured that we were inspecting the code exactly as it was when we produced each analysis.

      "},{"location":"case-studies/moss-pypfilt-earlier-states/","title":"Fixing a bug in pypfilt","text":"

      Author: Rob Moss (rgmoss@unimelb.edu.au)

      Project: pypfilt, a bootstrap particle filter for Python

      Date: 27 October 2021

      "},{"location":"case-studies/moss-pypfilt-earlier-states/#overview","title":"Overview","text":"

      I introduced a bug when I modified a function in my pypfilt package, and only detected the bug after I had created several more commits.

      To resolve this bug, I had to:

      1. Notice the bug;

      2. Identify the cause of the bug;

      3. Write a test case to check whether the bug is present; and

      4. Fix the bug.

      "},{"location":"case-studies/moss-pypfilt-earlier-states/#notice-the-bug","title":"Notice the bug","text":"

      I noticed that a regression test[1] was failing: re-running a set of model simulations was no longer generating the same output. The results had changed, but none of my recent commits should have had this effect.

      I should have noticed this when I created the commit that introduced this bug, but:

      • I had not pushed the most recent commits to the upstream repository, where all of the test cases are run automatically every time a new commit is pushed; and

      • I had not run the test cases on my laptop after making each of the recent commits, because this takes a few minutes and I was lazy.
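A regression test of this kind can be sketched in Python as follows. This is a minimal illustration rather than pypfilt's actual test suite: run_model() is a hypothetical, seeded stand-in for a set of model simulations, and the reference output is assumed to be stored in a JSON file that was created (and committed) before the change being tested. Because the seed is fixed, any difference from the reference means that the code's behaviour has changed.

```python
import json
import random
from pathlib import Path


def run_model(seed, n_steps=5):
    # Hypothetical stand-in for a set of model simulations: a random walk.
    # Fixing the seed makes the output reproducible.
    rng = random.Random(seed)
    value = 0.0
    trajectory = []
    for _ in range(n_steps):
        value += rng.gauss(0.0, 1.0)
        trajectory.append(value)
    return trajectory


def matches_reference(output, reference_file):
    # Compare the output against a stored reference. If no reference exists
    # yet, save this output as the reference (e.g., to commit it).
    reference_file = Path(reference_file)
    if not reference_file.exists():
        reference_file.write_text(json.dumps(output))
        return True
    return json.loads(reference_file.read_text()) == output
```

In a test suite, a check such as assert matches_reference(run_model(seed=1), 'reference.json') would fail as soon as a commit changed the simulation output, which is the kind of failure described above.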

      "},{"location":"case-studies/moss-pypfilt-earlier-states/#identify-the-cause-of-the-bug","title":"Identify the cause of the bug","text":"

      I knew that the bug had been introduced quite recently, and I knew that it affected a specific function: earlier_states(). Running git blame src/pypfilt/state.py indicated that the recent commit 408b5f1 was a likely culprit, because it changed many lines in this function.

      In particular, I suspected the bug was occurring in the following loop, which steps backwards in time and handles the case where model simulations are reordered:

      # Start with the parent indices for the current particles, which allow us\n# to look back one time-step.\nparent_ixs = np.copy(hist['prev_ix'][ix])\n\n# Continue looking back one time-step, and only update the parent indices\n# at time-step T if the particles were resampled at time-step T+1.\nfor i in range(1, steps):\n    step_ix = ix - i\n    if hist['resampled'][step_ix + 1, 0]:\n        parent_ixs = hist['prev_ix'][step_ix, parent_ixs]\n

      In stepping through this code, I identified that the following line was incorrect:

          if hist['resampled'][step_ix + 1, 0]:\n

      and that changing step_ix + 1 to step_ix should fix the bug.
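The effect of this one-step offset can be reproduced with a toy version of the loop in plain Python. The data layout here is a hypothetical simplification, not pypfilt's actual history format: prev_ix[t][j] is assumed to be the index, at time t - 1, of the parent of particle j at time t (the identity mapping when no resampling occurred), and resampled[t] is assumed to record whether resampling occurred at time t. The buggy argument switches between checking the flag at step_ix + 1 and at step_ix.

```python
def trace_back(prev_ix, resampled, ix, steps, buggy=False):
    # Toy version of the loop: for each particle at time ix, return the
    # index of its ancestor at time ix - steps (hypothetical data layout).
    parent_ixs = list(prev_ix[ix])  # look back one time-step
    for i in range(1, steps):
        step_ix = ix - i
        # The buggy version checks the flag for the time-step after step_ix,
        # so it can skip the parent mapping for a resampled time-step.
        flag_ix = step_ix + 1 if buggy else step_ix
        if resampled[flag_ix]:
            # Same lookup as hist['prev_ix'][step_ix, parent_ixs] in NumPy.
            parent_ixs = [prev_ix[step_ix][j] for j in parent_ixs]
    return parent_ixs


# Three particles, four time-steps; resampling occurred only at time 2.
identity = [0, 1, 2]
prev_ix = [identity, identity, [2, 2, 0], identity]
resampled = [False, False, True, False]

print(trace_back(prev_ix, resampled, ix=3, steps=3))              # [2, 2, 0]
print(trace_back(prev_ix, resampled, ix=3, steps=3, buggy=True))  # [0, 1, 2]
```

With the corrected check, the particles at time 3 are traced back to ancestors [2, 2, 0] at time 0. The buggy check looks at the flag for time 3 while processing time-step 2, skips the parent mapping for time 2, and returns the unchanged ordering [0, 1, 2].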

      Note: I could have used git bisect to identify the commit that introduced this bug, but running all of the test cases for each commit is relatively time-consuming; since I knew that the bug had been introduced quite recently, I chose to use git blame.
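git bisect performs a binary search over the commit history: each time a commit is marked as good or bad, the range of candidate commits is roughly halved, so n candidate commits need about log2(n) test runs. The sketch below illustrates that idea in Python; it is not git's implementation, and is_bad() is a hypothetical stand-in for checking out a commit and running the test cases, which is the expensive step that made git blame the quicker choice here.

```python
def first_bad_commit(commits, is_bad):
    # Binary search for the first commit where is_bad(commit) is True.
    # Assumes commits are ordered oldest to newest, the first commit is good,
    # the last commit is bad, and the bug persists once it is introduced.
    good, bad = 0, len(commits) - 1
    test_runs = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        test_runs += 1  # each check would mean running the test cases once
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad], test_runs


# Hypothetical history of 100 commits, where the bug appears in commit c37.
commits = [f'c{i}' for i in range(100)]
culprit, runs = first_bad_commit(commits, lambda c: int(c[1:]) >= 37)
print(culprit, runs)  # c37 6
```

git bisect run automates this loop by running a script at each step and using its exit status to mark the commit as good or bad.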

      "},{"location":"case-studies/moss-pypfilt-earlier-states/#write-a-test-case","title":"Write a test case","text":"

      I wrote a test case test_earlier_state() that called this earlier_states() function a number of times, and checked that each set of model simulations was returned in the correct order.

      This test case checks that:

      1. If the model simulations were not reordered, the original ordering is always returned;

      2. If the model simulations were reordered at some time t_0, the original ordering is returned for times t < t_0; and

      3. If the model simulations were reordered at some time t_0, the new ordering is returned for times t >= t_0.

      This test case failed when I reran the testing pipeline, which confirmed that it detected the bug.
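The structure of these three checks can be expressed as test functions. This sketch does not reproduce pypfilt's test_earlier_state(); it tests a toy lookup, ancestors_at(), under a hypothetical data layout: prev_ix[t][j] is the index, at time t - 1, of the parent of particle j at time t, and a reordering at time t0 is stored as a permutation in prev_ix[t0] with resampled[t0] set to True. In the toy's own terms, the checks are: with no reordering, every lookup returns the identity; looking back to a time before t0 applies the permutation; and looking back to t0 or later does not.

```python
def ancestors_at(prev_ix, resampled, ix, t):
    # Toy lookup (hypothetical data layout): for each particle at time ix,
    # return the index of its ancestor at an earlier time t.
    parent_ixs = list(range(len(prev_ix[ix])))
    # Step back from time ix to time t, one time-step at a time.
    for step_ix in range(ix, t, -1):
        if resampled[step_ix]:
            parent_ixs = [prev_ix[step_ix][j] for j in parent_ixs]
    return parent_ixs


def make_history(n_particles, n_times, t0=None, perm=None):
    # Build a toy history with an optional reordering at time t0.
    identity = list(range(n_particles))
    prev_ix = [list(identity) for _ in range(n_times)]
    resampled = [False] * n_times
    if t0 is not None:
        prev_ix[t0] = list(perm)
        resampled[t0] = True
    return prev_ix, resampled


def test_no_reordering():
    # 1. No reordering: the identity ordering is always returned.
    prev_ix, resampled = make_history(4, 6)
    for t in range(6):
        assert ancestors_at(prev_ix, resampled, 5, t) == [0, 1, 2, 3]


def test_reordering():
    perm, t0 = [3, 1, 0, 2], 3
    prev_ix, resampled = make_history(4, 6, t0=t0, perm=perm)
    for t in range(6):
        if t < t0:
            # 2. Looking back past t0 applies the reordering.
            assert ancestors_at(prev_ix, resampled, 5, t) == perm
        else:
            # 3. Looking back to t0 or later does not.
            assert ancestors_at(prev_ix, resampled, 5, t) == [0, 1, 2, 3]
```

If saved in a file named test_*.py, pytest would collect and run these test_* functions; they can also be called directly.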

      "},{"location":"case-studies/moss-pypfilt-earlier-states/#fix-the-bug","title":"Fix the bug","text":"

      With the test case now written, I was able to verify that changing step_ix + 1 to step_ix did fix the bug.

      I added the test case and the bug fix in commit 9dcf621.

      In the commit message I indicated:

      1. Where the bug was located: the earlier_states() function;

      2. When the bug was introduced: commit 408b5f1; and

      3. Why the bug was not detected when I created commit 408b5f1.

      1. A regression test checks that a commit hasn't changed an existing behaviour or functionality.

      "},{"location":"collaborating/","title":"Collaborating","text":"

      This section demonstrates how to use Git for collaborative research, enabling multiple people to work on the same code or paper in parallel. This includes deciding how to structure your repository, how to use branches for each collaborator, and how to use tags to track your progress.

      Info

      We also show how these skills support peer code review, so that you can share knowledge with, and learn from, your colleagues as part of your regular activity.

      "},{"location":"collaborating/an-example-pull-request/","title":"An example pull request","text":"

      The initial draft of each chapter in this section was proposed in a pull request.

      When this pull request was created, the branch contained four new commits:

      85594bf Add some guidelines for collaboration workflows\n678499b Discuss coding style guides\n2b9ff70 Discuss merge/pull requests and peer code review\n6cc6f54 Discuss repository structure and licenses\n

      and the author (Rob Moss) asked the reviewer (Eamon Conway) to address several details in particular.

      Eamon made several suggestions in their initial response, including:

      • Moving the How to structure a repository and Choosing a license chapters to the Effective use of git section;

      • Starting this section with the Collaborating on code chapter; and

      • Agreeing that we should use this pull request as an example in this book.

      In response, Rob pushed two commits that addressed the first two points above:

      e1d1dd9 Move collaboration guidelines to the start\n3f78ef8 Move the repository structure and license chapters\n

      and then wrote this chapter to show how we used a pull request to draft this book section.

      "},{"location":"collaborating/coding-style-guides/","title":"Coding style guides","text":"

      A style guide defines rules and guidelines for how to structure and format your code. This can make code easier to write, because you don't need to worry about how to format your code. It can also make code easier to read, because consistent styling allows you to focus on the content.

      There are two types of tools that can help you use a style guide:

      • A formatter formats your code to make it consistent with a specific style; and

      • A linter checks whether your code is consistent with a specific style.
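To make the distinction concrete, here is a toy linter and a toy formatter in Python. They are illustrative sketches, not real tools: each handles only two rules taken from PEP 8 (lines of at most 79 characters, and no trailing whitespace), whereas tools such as Black, styler, and lintr handle a complete style.

```python
MAX_LINE_LENGTH = 79  # PEP 8's recommended maximum line length


def lint(lines):
    # Toy linter: report style problems, but leave the code unchanged.
    problems = []
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, 'line too long'))
        if line != line.rstrip():
            problems.append((number, 'trailing whitespace'))
    return problems


def format_lines(lines):
    # Toy formatter: rewrite the code so that it follows the style.
    # It only fixes trailing whitespace; long lines are left as they are.
    return [line.rstrip() for line in lines]


code = ['x = 1   ', 'y = 2']
print(lint(code))                # [(1, 'trailing whitespace')]
print(lint(format_lines(code)))  # []
```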

      "},{"location":"collaborating/coding-style-guides/#recommended-style-guides","title":"Recommended style guides","text":"

      Because programming languages can be very different from each other, style guides are usually defined for a single programming language.

      Here we list some of the most widely used style guides for several common programming languages:

      • For R, there is the tidyverse style guide.
      • You can apply this style to your code with styler.
      • You can check that your code conforms to this style with lintr.

      • For Python, there is Black, which defines a coding style and applies this style to your code.

      • For C++, there is the Google C++ style guide.

      "},{"location":"collaborating/collaborating-on-a-paper/","title":"Collaborating on a paper","text":"

      Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write papers collaboratively as a team. This approach is particularly useful if you are writing a paper in LaTeX.

      Here are some general guidelines that you may find useful:

      • Divide the paper into separate LaTeX files for each section.

      • Use tags to identify milestones such as draft versions and revisions.

      • Consider creating a separate branch for each collaborator.

      • Merge these branches when completing a major draft or revision.

      • Use latexdiff to show tracked changes between the current version and a previous commit/tag:

      latexdiff-git --flatten -r tag-name paper.tex\n
      • Collaborators who will provide feedback, rather than contributing directly to the writing process, can do this by:

      • Annotating PDF versions of the paper; or

      • Providing comments in response to a merge/pull request.
      "},{"location":"collaborating/collaborating-on-code/","title":"Collaborating on code","text":"

      Once you are comfortable with creating commits, working in branches, and merging branches, you can use these skills to write code collaboratively as a team.

      The precise workflow will depend on the nature of your research and on the collaborators in your team, but there are some general guidelines that you may find helpful:

      • Agree on a style guide.

      • Work on separate features in separate branches.

      • Use peer code review before merging changes from these branches.

      • Consider using continuous integration to:

      • Run test cases and detect bugs as early as possible; and

      • Verify that code meets your chosen style guide.
      "},{"location":"collaborating/continuous-integration/","title":"Continuous integration","text":"

      Continuous Integration (CI) is the practice of merging code changes into a central repository, where each change automatically triggers tests and other processes. This can provide rapid feedback while you develop your code and collaborate with others, as long as commits are regularly pushed to the central repository.

      Info

      This book is an example of Continuous Integration: every time a commit is pushed to the central repository, the online book is automatically updated.

      Because the central repository is hosted on GitHub, we use GitHub Actions. Note that this is a GitHub-specific CI system. You can view the update action for this book here.

      We also use CI to publish each pull request, so that contributions can be previewed during the review process. We added this feature in this pull request.

      "},{"location":"collaborating/merge-pull-requests/","title":"Merge/Pull requests","text":"

      Recall that incorporating the changes from one branch into another branch is referred to as a \"merge\". You can merge one branch into another branch by taking the following steps:

      1. Check out the branch you want to merge the changes into:

        git checkout my-branch\n
      2. Merge the changes from the other branch:

        git merge other-branch\n

      Tip

      It's a good idea to review these changes before you merge them.

      If possible, it's even better to have someone else review the changes.

      You can use git diff to view differences between branches. However, platforms such as GitHub and GitLab offer an easier approach: \"pull requests\" (also called \"merge requests\").

      "},{"location":"collaborating/merge-pull-requests/#creating-a-pull-request-on-github","title":"Creating a pull request on GitHub","text":"

      The steps required to create a pull request differ depending on which platform you are using. Here, we will describe how to create a pull request on GitHub. For further details, see the GitHub documentation.

      • Open the main page of your GitHub repository.

      • In the \"Branch\" menu, select the branch that contains the changes you want to merge.

      • Open the \"Contribute\" menu. This should be located on the right-hand side, above the list of files.

      • Click the \"Open pull request\" button.

      • In the \"base\" menu, select the branch you want to merge the changes into.

      • Enter a descriptive title for the pull request.

      • In the message editor, write a summary of the changes in this branch, and identify specific questions or objectives that you want the reviewer to address.

      • Select potential reviewers by clicking on the \"Reviewers\" link in the right-hand sidebar.

      • Click the \"Create pull request\" button.

      Once the pull request has been created, the reviewer(s) can review your changes and discuss their feedback and suggestions with you.

      "},{"location":"collaborating/merge-pull-requests/#merging-a-pull-request-on-github","title":"Merging a pull request on GitHub","text":"

      When the pull request has been reviewed to your satisfaction, you can merge these changes by clicking the \"Merge pull request\" button.

      Info

      If the pull request has merge conflicts (e.g., if the branch you're merging into contains new commits), you will need to resolve these conflicts.

      For further details about merging pull requests on GitHub, see the GitHub documentation.

      "},{"location":"collaborating/peer-code-review/","title":"Peer code review","text":"

      Once you're comfortable using merge/pull requests to review changes in a branch, you can use this approach for peer code review.

      Info

      Remember that code review is a discussion and critique of a person's work. The code author will naturally feel that they own the code, and the reviewer needs to respect this.

      For further advice and suggestions on how to conduct peer code review, please see the Performing peer code review references.

      Tip

      Mention people who have reviewed your code in the acknowledgements section of the paper.

      "},{"location":"collaborating/peer-code-review/#define-the-goals-of-a-peer-review","title":"Define the goals of a peer review","text":"

      When you create a pull request and invite someone to review your work, the pull request description should include the following details:

      • An overview of the work included in the pull request: what have you done, why have you done it?

      • You may also want to explain how this work fits into the broader context of your research project.

      • Identify specific questions or tasks that you would like the reviewer to address. For example, you might ask the reviewer to address one or more of the following questions:

      • Can the reviewer run your code and reproduce the outputs?

      • Is the code easy to understand?

      • If you have a style guide, is the code formatted appropriately?

      • Do the model equations or data analysis steps seem sensible?

      • If you have written documentation, is it easy to understand?

      • Can the reviewer suggest how to improve or rewrite a specific piece of code?

      Tip

      Make the reviewer's job easier by giving them small amounts of code to review.

      "},{"location":"collaborating/peer-code-review/#finding-a-reviewer","title":"Finding a reviewer","text":"

      On GitHub, we have started a peer-review team. We encourage you to post on the discussion board to find like-minded members to review your code.

      "},{"location":"collaborating/peer-code-review/#guidelines-for-reviewing-other-peoples-code","title":"Guidelines for reviewing other people's code","text":"

      Peer code review is an opportunity for the author and the reviewer to learn from each other and improve a piece of code.

      Tip

      The most important guideline for the reviewer is to be kind.

      Treat other people's code the way you would want them to treat your code.

      • Avoid saying \"you\". Instead, say \"we\" or make the code the subject of the sentence.

      • Don't say \"You don't have a test for this function\", but instead say \"We should test this function\".

      • Don't say \"Why did you write it this way?\", but instead say \"What are the advantages of this approach?\".

      • Ask questions rather than stating criticisms.

      • Don't say \"This code is unclear\", but instead say \"Can you help me understand how this code works?\".

      • Treat peer review as an opportunity to praise good work!

      • Don't be afraid to tell the author that a piece of code was very clear, easy to understand, or well written.

      • Tell the author if reading their code made you aware of a useful function or package.

      • Tell the author if reading their code gave you some ideas for your own code.

      "},{"location":"collaborating/peer-code-review/#complete-the-review","title":"Complete the review","text":"

      Once the peer code review is complete, and any corresponding updates to the code have been made, you can merge the branch.

      "},{"location":"collaborating/peer-code-review/#retain-a-record-of-the-review","title":"Retain a record of the review","text":"

      By using merge/pull requests to review code, the discussion between the author and the reviewer is recorded. This can be a useful reference for future code reviews.

      Tip

      Try to record all of the discussion in the pull request comments, even if the author and reviewer meet in person, so that you have a complete record of the review.

      "},{"location":"collaborating/sharing-a-branch/","title":"Sharing a branch","text":"

      You might want a collaborator to work on a specific branch of your repository, so that you can keep their changes separate from your own work. Remember that you can merge commits from their branch into your own branches at any time.

      Info

      You need to ensure that your collaborator has access to the remote repository.

      1. Create a new branch for the collaborator, and give it a descriptive name.

        git checkout -b collab/jamie\n

        In this example we created a branch called \"collab/jamie\", where \"collab\" is a prefix used to identify branches intended for collaborators, and the collaborator is called Jamie.

        Remember that you can choose your own naming conventions.

      2. Push this branch to your remote repository:

        git push -u origin collab/jamie\n
      3. Your collaborator can then make a local copy of this branch:

        git clone --single-branch --branch collab/jamie repository-url\n
      4. They can then create commits and push them to your remote repository with git push.

      "},{"location":"collaborating/sharing-a-repository/","title":"Sharing a repository","text":"

      The easiest way to share a repository with collaborators is to have a single remote repository that all collaborators can access. This repository could be located on a platform such as GitHub, GitLab, or Bitbucket, or on a platform provided by your University or Institute.

      These platforms allow you to create public repositories and private repositories.

      • Everybody can view the contents of a public repository.

      • You control who can view the contents of a private repository.

      • For both types of repository, you control who can make changes to the repository, such as creating commits and branches.

      Info

      You should decide whether a public repository or a private repository suits you best.

      "},{"location":"collaborating/sharing-a-repository/#giving-collaborators-access-to-your-remote-repository","title":"Giving collaborators access to your remote repository","text":"

      The steps required to do this differ depending on which platform you are using. Here, we will describe how to give collaborators access to a repository on GitHub. For further details, see the GitHub documentation.

      • Open the main page of your GitHub repository.

      • Click on the \"Settings\" tab in the top navigation bar.

      • Click on the \"Collaborators\" item in the left sidebar.

      • Click on the \"Add people\" button.

      • Search for collaborators by entering their GitHub user name, their full name, or their email address.

      • Click the \"Add to this repository\" button.

        This will send an invitation to the collaborator. If they accept this invitation, they will have access to your repository.

        "},{"location":"community/","title":"Community of Practice","text":"

        Info

        Communities of Practice are groups of people who share a concern or a passion for something they do and learn how to do it better as they interact regularly.

        The community acts as a living curriculum and involves learning on the part of everyone.

        The aim of a Community of Practice (CoP) is to come together as a community and engage in a process of collective learning in a shared domain. The three characteristics of a CoP are:

        1. Community: An environment for learning through interaction;

        2. Practice: Specific knowledge shared by community members; and

        3. Domain: A shared interest, problem, or concern.

        We meet as a community every 6 to 8 weeks, and capture observations in meeting summaries.

        "},{"location":"community/meetings/","title":"Meetings","text":"

        This section contains summaries of each Community of Practice meeting.

        • 17 April 2023: our initial meeting.

        • 13 June 2023: exploration of version control, reproducibility, and testing exercises.

        • 15 August 2023: changing our research and reproducibility practices.

        • 18 October 2023: sharing experiences about good ways to structure a project.

        "},{"location":"community/meetings/2023-04-17/","title":"17 April 2023","text":"

        This is our initial meeting. The goal is to welcome people to the community and outline how we envision running these Community of Practice meetings.

        "},{"location":"community/meetings/2023-04-17/#theme-reproducible-research","title":"Theme: Reproducible Research","text":"

        Outline the theme and scope for this community.

        This is open to all researchers who share an interest in reproducible research and/or related topics and practices; no prior knowledge is required.

        For example, consider these questions:

        • Can you reproduce your current results on a new computer?

        • Can someone else reproduce your current results?

        • Can someone else reproduce your current results without your help?

        • Can you reproduce your own results from, say, 2 years ago?

        • Can someone else reproduce your own results from, say, 2 years ago?

        • Can you fix a mistake and update your own results from, say, 2 years ago?

        Tip

        The biggest challenge can often be remembering what you did and how you did it.

        Making small changes to your practices can greatly improve reproducibility!

        "},{"location":"community/meetings/2023-04-17/#how-will-these-meetings-run","title":"How will these meetings run?","text":"
        • Aim to hold these meetings on a (roughly) monthly basis.

        • Prior to each meeting, we will invite community members to propose a topic or discussion point to be the focus of the meeting. This may be an open question or challenge, an example of good research practices, a useful software tool, etc.

        • Schedule each meeting to best suit the availability of community members who are particularly interested in the chosen topic.

        • Each meeting should be hosted by one or more community members, with online participation available to those who cannot attend in person.

        • At the end of each meeting, we will ask attendees how useful/effective they found the meeting, so that we can better cater to the needs of the community. For example:

        • What do you think of the session?

        • What did we do well?
        • What could we do better in the next session?

        • We will summarise the key observations, findings, and outputs of each meeting in our online materials, and use them to improve and grow our training materials.

        "},{"location":"community/meetings/2023-04-17/#preferred-communication-channels","title":"Preferred communication channels?","text":"

        Info

        To function effectively as a community, we need to support asynchronous discussions in addition to scheduled meetings.

        One option is a dedicated mailing list. Other options were suggested:

        • A Slack workspace (Dave);

        • A Discord channel (TK);

        • A Teams channel (Gerry); and

        • A private GitHub repository, using the issue tracker (Alex).

        Using a GitHub issue tracker might also serve as a gentle introduction to GitHub?

        "},{"location":"community/meetings/2023-04-17/#supporting-activities-and-resources","title":"Supporting activities and resources?","text":"

        Are there other activities that we could organise to help support the community?

        • We have online training materials, Git is my lab book, which should be useful for people who are not familiar with version control.

        • We also have a SPECTRUM/SPARK peer review team, where people can make their code available for peer review.

        "},{"location":"community/meetings/2023-04-17/#topics-for-future-meetings","title":"Topics for future meetings?","text":"

        We asked each participant to suggest topics that they would like to see covered in future meetings and/or activities. A number of common themes emerged.

        "},{"location":"community/meetings/2023-04-17/#version-control-from-theory-to-practice","title":"Version control: from theory to practice","text":"

        A number of people mentioned not being sure how to get started, or starting with good intentions but ending up with a mess.

        • Dave: how can I transition from principle to practice?

        • Ollie: similar to David, I often start well but end up with a mess.

        • Ruarai: what have others found useful and applied in this space? What options are out there?

        • Michael: I'm a complete novice; Git command lines are a foreign language to me! I'm looking for tips for someone who uses code a lot and is experienced at coding, but much less so with version control and the use of repositories. What are the first steps to incorporate it into my workflow?

        • Angus: I'm also relatively new to Git and have been using GitHub Desktop (a GUI for Windows and Mac). I'm not averse to command-line stuff, but with a GUI I need to remember fewer arcane commands!

        • Samik: I use TortoiseGit \u2014 a Windows Shell Interface to Git.

        • Gray: I resonate with Michael. I do most of my research on my own and describe it in papers. My workflow isn't particularly Git-friendly, but I'm keen to learn.

        • Lauren: everything that everyone has said so far! I've found some good guidelines for how to write reproducible code, starting from the basics all the way to niche topics. Can we use this as a way to share materials that we've found useful? The British Ecological Society have published guidelines. We could assemble good materials that start from the basics.

        • David: The Society for Open, Reliable, and Transparent Ecology and Evolutionary Biology (SORTEE) also have good materials.

        • Gerry: I like the idea of reproducibility, but I've done a terrible job of it in the past: my repository ends up with thousands of versions of different files. Can you help me solve it?

        • Josh: along the same lines as what's been stated: how best to share knowledge of Git and best practices with others in a new research team? How do I adjust to their methods of conducting reproducible research, version control, etc.?

        • Punya: not much to add. I would really like to know more about version control. I have a basic understanding, but what's the standard way of using it? I'm also interested in reproducibility and documentation.

        • Rachel: I strongly support this idea of code reproducibility. Best practice frameworks can be disseminated to modellers in modelling consortia, and they can be very helpful when auditing.

        • Ella: we're migrating models from Excel to R.

        • J'Belle: I work for a tiny, very remote health service on the border between Australia and Papua New Guinea. We have 17 sources of clinical data, which presents massive challenges in reproducibility and quality assurance. I'm looking to tap into your expertise. How do we manage so many sources of clinical data?

        "},{"location":"community/meetings/2023-04-17/#working-with-less-technically-experienced-collaborators","title":"Working with less technically-experienced collaborators","text":"

        How can we make best use of existing tools and practices, while working with collaborators who have less technical expertise/experience?

        • Alex: if I start a project with collaborators who may be less technically literate, how can they contribute without messing up reproducibility? Options like Docker are a little too complicated. How can I motivate people? Is there a simple solution?

        • Angus: in theory you may have reproducible code. But if you need to write a paper with less technical collaborators, running the code and generating reports can be hard. How do we collaborate on the writing side? RMarkdown and its equivalents make a lot of sense, but most colleagues will only look at Word documents. There are some workarounds, such as pandoc.

        "},{"location":"community/meetings/2023-04-17/#reproducibility-best-practices-and-limitations","title":"Reproducibility: best practices and limitations","text":"

        How far can/should we go in validating and demonstrating that our models and analyses are reproducible? How can we automate this? How is this impacted when we cannot share the input data, or when our models are extremely large and complex?

        • Cam: there are unique issues in the type of research we do. Working with code makes it easy in some ways, compared to the experimental conditions of real-world experiments. Our capacity for reproducibility is great, but so then is our burden. We should be exploring the limitations! Some challenges in our area come down to the implementation of stochastic models with lots of random processes. How can we do that well and make it part of what we practice? What are the limitations to reproducibility, and how do we perceive the goals when we are working with data that cannot be shared?

        • Samik: similar to Cam, I'm interested in how people have produced reproducible research where the data cannot be shared. Perhaps we can provide default examples as test cases?

        • Michael: I second Cam's points, particularly about reproducibility with confidential data. That's an issue I've hit multiple times. We usually have a side folder with the real dataset, and pipe through condensed or aggregated versions of the data that we can release.

        • Jiahao: I'm interested in how to build a platform for using agent based models. I've looked at lots of other models, but how can we bring them together so that it is easier to add a new variable or extend a model?

        • Eamon: I'm a Git fanatic, and I want to understand the development of the code that I work with. I get annoyed when people share a repository as a single commit, or don't use tags in their Git repositories to identify the version of the code they used in, e.g., a paper! How do you start running the code? What file formats does it expect to process?

        • Dion: I'm interested in seeing what people are doing that looks like good practice, such as making sure that code and results are reproducible. Your code may be version controlled, but you may have since made changes to the code, parameters, input data, etc. How do you do a good job of shoe-horning all of that into Git? Maybe use Git for development and simultaneously use a separate repository for production stuff? We need to be able to look back and identify, from the commit, the code, data, and intermediate files used along the way.

        • Palang: I've looked at papers with supplementary code, but running the code myself produces very different results from what's in the paper.

        • May: most people have said what I wanted to say. I faced similar problems with running other people's code. It may not print any error message, but you get very different results from what's published in the paper. You don't know who or what is wrong!

        "},{"location":"community/meetings/2023-04-17/#testing-and-documentation","title":"Testing and documentation","text":"

        How can we develop confidence in our own code, and in other people's code?

        • TK: I want to learn/see different conventions for writing code documentation. I've never managed to get doxygen working to my satisfaction.

        • Angus: how do we design good tests? How to test, when to test, what to test for? Should we have coverage targets? Are there ways to automate testing?

        • Rahmat: I often find it very hard to learn how to use other people's code. The code needs to be easy to understand. Otherwise, I will just write the code myself! Sometimes when I run the code, I have difficulty generating results: many errors come up and it's not clear why. Perhaps not all of the necessary data have been shared with the code? We need to include the data, but if the data cannot be provided, you need to provide similar data so that others can run the code. It also helps to use a language that others are familiar with.

        "},{"location":"community/meetings/2023-04-17/#code-reuse","title":"Code reuse","text":"
        • Pan: I am not sure about the term reproducibility in the context of coders. I know lab people really do reuse published protocols. But do coders actually reuse other people's code to do their work?

        • Gerry: People often make their code into packages which others reuse. This could be a good topic for future meetings.

        "},{"location":"community/meetings/2023-04-17/#using-chat-gpt-to-writecheck-code","title":"Using Chat GPT to write/check code","text":"
        • Pan: I recently joined a meeting where people have used Chat GPT to check their code. Does this group have any thoughts on how we might make good use of Chat GPT?

        • Cam: Chat GPT is not reproducible itself, so it seems questionable to use it to check reproducibility.

        • Alex: I don't entirely agree, it can be very useful for improving the implementation of a function. In terms of generating reliable code, it's wonderful. It's a nightmare for evaluating existing code.

        • Pan: people are using Chat GPT to generate initial templates.

        • Eamon: If you encounter code that has poor documentation, Chat GPT is surprisingly good at telling you how to use it.

        • Matt: I don't have anything to add to the above, I'm happy to be along for the ride.

        "},{"location":"community/meetings/2023-06-13/","title":"13 June 2023","text":"

        In this meeting we asked participants to share their experiences exploring the version control, reproducibility, and testing exercises in our example repository.

        This repository serves as an introduction to testing models and ensuring that their outputs are reproducible. It contains a simple stochastic model that draws samples from a normal distribution, and some tests that check whether the model outputs are consistent with our expectations.

        "},{"location":"community/meetings/2023-06-13/#what-is-a-reproducible-environment","title":"What is a reproducible environment?","text":"

        The exercise description was deliberately very open, but it may have been too vague:

        Define a reproducible environment in which the model can run.

        We avoided listing possible details for people to consider, such as software and package versions. Perhaps a better approach would have been to ask:

        If this code was provided as supporting materials for a paper, what other information would you need in order to run it and be confident of obtaining the same results as the original authors?

        The purpose of a reproducible environment is to define all of these details, so that you never have to say to someone \"well, it runs fine on my machine\".
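        Part of defining a reproducible environment is simply recording it. The following Python sketch collects the interpreter version, the operating system, and the installed version of each named dependency, so these details can be stored alongside model outputs. The function name environment_metadata and the example package list are hypothetical. Note that this records the environment but does not recreate it.

```python
import json
import platform
from importlib import metadata


def environment_metadata(packages):
    # Record the installed version of each named package,
    # or None if the package is not installed.
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        'python': platform.python_version(),
        'platform': platform.platform(),
        'packages': versions,
    }


if __name__ == '__main__':
    # Hypothetical dependency list: replace with your model's dependencies.
    print(json.dumps(environment_metadata(['numpy', 'pytest']), indent=2))
```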

        "},{"location":"community/meetings/2023-06-13/#reproducibility-and-stochasticity","title":"Reproducibility and stochasticity","text":"

        Many participants observed that the model was not reproducible unless we used a random number generator (RNG) with a known seed, which would ensure that the model produces the same output each time we run it.

        But what if you're using a package or library that internally uses its own RNG and/or seed? This may not be something you can fix, but you should be able to detect it by running the model multiple times with the same seed, and checking whether you get identical results each time.

        Another important question was raised: do you, or should you, include the RNG seed in your published code? This is probably a good idea, and suggested solutions included setting the seed at the very start of your code (so that it's immediately visible) or including it as a required model parameter.
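        Both suggestions can be sketched in Python with the standard random module. The seed is a required argument of the model, and the model is run several times with the same seed to check that the outputs are identical. run_model is a hypothetical stand-in for a model that draws samples from a normal distribution.

```python
import random


def run_model(seed, n_samples=1000, mean=0.0, sd=1.0):
    # The seed is a required parameter, so every run is explicitly seeded.
    # A dedicated Random instance avoids sharing the global RNG state
    # with other code or packages.
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n_samples)]


def is_reproducible(model, seed, n_runs=3):
    # Run the model several times with the same seed. Any difference
    # suggests that some part of the model uses an unseeded RNG.
    first = model(seed)
    return all(model(seed) == first for _ in range(n_runs - 1))


print(is_reproducible(run_model, seed=12345))  # True
```

        A model that draws from an unseeded, shared RNG would fail this check, which is how such hidden randomness can be detected even when it cannot be fixed.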

        "},{"location":"community/meetings/2023-06-13/#writing-test-cases","title":"Writing test cases","text":"

        Tip

        Write a test case every time you find a bug: ensure that the test case finds the bug, then fix the bug, then ensure that the test case passes.

        A test case is a piece of code that checks that something behaves as expected. This can range from checking that a mathematical function returns an expected value, to running many model simulations and verifying that a summary statistic falls within an expected range.

        Rather than trying to write a single test that checks many different properties of a piece of code, it can be much simpler and quicker to write many separate tests that each check a single property. This can provide more detailed feedback when one or more test cases fail.
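        A minimal sketch of this approach, assuming a hypothetical sample_model(seed, n) that draws n samples from a standard normal distribution. Each test checks a single property. The functions use pytest's test_ naming convention so that pytest could discover them, but they are plain functions with assert statements and can also be called directly.

```python
import random
import statistics


def sample_model(seed, n):
    # Hypothetical model: draw n samples from a standard normal distribution.
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def test_returns_requested_number_of_samples():
    assert len(sample_model(seed=1, n=500)) == 500


def test_same_seed_gives_same_output():
    assert sample_model(seed=1, n=100) == sample_model(seed=1, n=100)


def test_sample_mean_is_close_to_zero():
    # With n = 10,000 the standard error of the mean is 0.01,
    # so this tolerance is five standard errors.
    samples = sample_model(seed=2, n=10_000)
    assert abs(statistics.mean(samples)) < 0.05


def test_sample_sd_is_close_to_one():
    samples = sample_model(seed=3, n=10_000)
    assert abs(statistics.stdev(samples) - 1.0) < 0.05
```

        If only the standard deviation test fails, you immediately know which property broke. A single combined test would not tell you that.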

        Note

        This approach is similar to how we rely on multiple public health interventions to protect against disease outbreaks! Consider each test case as a slice of Swiss cheese \u2014 many imperfect tests can provide a high degree of confidence in our code.

        "},{"location":"community/meetings/2023-06-13/#writing-test-cases-for-conditions-that-may-fail","title":"Writing test cases for conditions that may fail","text":"

        If you are testing a stochastic model, you may find certain test cases are difficult to write.

        For example, consider a stochastic SIR model where you want to test that an intervention reduces the number of cases in an outbreak. You may, however, observe that in a small proportion of simulations the intervention has no effect (or it may even increase the number of cases).

        One approach is to run many pairs of simulations and only check that the intervention reduced the number of cases at least X% of the time. You need to decide how many simulations to run, and what is an appropriate value for X%, but that's okay! Remember the Swiss cheese analogy, mentioned above.
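        As an illustration, the following Python sketch uses a simple Reed-Frost chain-binomial SIR model, which is not the model from the example repository, with a hypothetical intervention that halves the transmission probability. It runs many pairs of simulations and checks that the intervention reduced the number of cases in at least X = 60% of pairs. Both the number of pairs and the threshold are arbitrary choices made for this example.

```python
import random


def final_size(rng, n=100, i0=1, p=0.02):
    # Reed-Frost chain-binomial SIR model: in each generation, every
    # susceptible is infected with probability 1 - (1 - p) ** infected.
    susceptible, infected, total = n - i0, i0, i0
    while infected > 0:
        q = 1 - (1 - p) ** infected
        new = sum(rng.random() < q for _ in range(susceptible))
        susceptible -= new
        total += new
        infected = new
    return total


def fraction_reduced(n_pairs, seed, p_baseline=0.02, p_intervention=0.01):
    # Run pairs of simulations (without and with the intervention) and
    # return the proportion of pairs where the intervention had fewer cases.
    rng = random.Random(seed)
    reduced = 0
    for _ in range(n_pairs):
        baseline = final_size(rng, p=p_baseline)
        intervention = final_size(rng, p=p_intervention)
        if intervention < baseline:
            reduced += 1
    return reduced / n_pairs


def test_intervention_usually_reduces_cases():
    # Some pairs will show no reduction, e.g., when the outbreak dies out
    # after the first case in both simulations, so require only X = 60%.
    assert fraction_reduced(n_pairs=200, seed=42) >= 0.6
```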

        "},{"location":"community/meetings/2023-06-13/#testing-frameworks","title":"Testing frameworks","text":"

        If you have more than 2 or 3 test cases, it's a good idea to use a testing framework to automatically find your test cases, run each test, record whether it passed or failed, and report the results. These frameworks are usually specific to a single programming language.

        Some commonly-used frameworks include:

        • Python: pytest
        • R: testthat
        • Matlab: included in the language
        • Julia: included in the language
        "},{"location":"community/meetings/2023-06-13/#github-actions","title":"GitHub Actions","text":"

        Multiple participants reported difficulties in setting up GitHub Actions and in knowing how to adapt the available templates to their needs. See the following examples:

        • Python starter workflow; and
        • GitHub Actions for R.

        We will aim to provide a GitHub Actions workflow for each model, and add comments that explain how to adapt these templates.

        Warning

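        One downside of using GitHub Actions is the limited computation time. Private repositories on the GitHub Free plan are limited to 2,000 minutes per month, and each job on a GitHub-hosted runner can run for at most 6 hours. This may not be suitable for large agent-based models and other long-running tasks.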

        "},{"location":"community/meetings/2023-06-13/#pull-requests","title":"Pull requests","text":"

        At the time of writing, three participants have contributed pull requests:

        • TK added a default seed so that the model outputs are reproducible.

        • Michael added a MATLAB version of the model and the test cases.

        • Cam added several features, such as recording metadata about the Python environment and testing that the model outputs are reproducible.

        Tip

        If you make your own copy (\"fork\") of the example repository, you can create as many commits as you want. GitHub will display a message that says:

        This branch is N commits ahead of rrcop:master.

        Click on the \"N commits ahead\" link to see a summary of your new commits. You can then click the big green button \"Create pull request\".

        This will not modify the example repository. Instead, it will create an overview of the changes between your code and the example repository. We can then review these changes and make suggestions, and you can add new commits, etc., before we decide whether to add these changes to the example repository.

        "},{"location":"community/meetings/2023-08-15/","title":"15 August 2023","text":"

        Info

        See the Resources section for links to useful resources that were mentioned in this meeting.

        "},{"location":"community/meetings/2023-08-15/#changes-to-practices","title":"Changes to practices","text":"

        In this meeting we asked everyone what changes (if any) they have made to their research and reproducibility practices since our last meeting.

        A common theme was improving how we note and record our past actions. For example:

        • Eamon has begun recording the commit ID (\"hash\") of the code that was used to generate each set of outputs. This allows him to easily retrieve the exact version of the code that was used to generate any past result and, e.g., generate other outputs of interest.

        • Pan talked about how their group records raw data separately from, but grouped with, the analysis code and processed data that were generated from these raw data. They also record every step of their model-fitting process, which may not always go as smoothly as expected.

        This ensures that stakeholders who want to use these models to run their own scenarios can reproduce the baseline scenarios without being modelling experts themselves.

        The model is available as an online app.

        • Rob has begun working on an existing malaria model, which was implemented in R as a series of scripts that shared many global variables. He wanted to restructure the code to better understand it, so he used version control to record the simulation outputs and ensure that he didn't change the model's behaviour as he restructured the code. On several occasions he modified parts of the code and discovered that these changes unexpectedly affected the simulation outputs. This is a manual equivalent of using continuous integration.
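        The practices that Eamon and Rob described can be combined in a short Python sketch. It saves each set of outputs together with the commit ID of the code that produced them, and later checks whether the current code still reproduces a saved reference. The function names and the JSON file layout are hypothetical, and the outputs are assumed to be JSON-serialisable. The commit ID comes from running git rev-parse HEAD, and is recorded as unknown if Git is unavailable or the code is not in a Git repository.

```python
import json
import subprocess
from pathlib import Path


def current_commit():
    # Return the commit ID (hash) of the checked-out code, or 'unknown'
    # if Git is not installed or this is not a Git repository.
    try:
        result = subprocess.run(['git', 'rev-parse', 'HEAD'],
                                capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return 'unknown'


def save_outputs(outputs, path):
    # Store the outputs alongside the commit ID that generated them.
    record = {'commit': current_commit(), 'outputs': outputs}
    Path(path).write_text(json.dumps(record, indent=2))


def matches_reference(outputs, path):
    # Compare new outputs with a previously saved reference, e.g., after
    # restructuring code without intending to change its behaviour.
    reference = json.loads(Path(path).read_text())
    same = outputs == reference['outputs']
    if not same:
        commit = reference['commit']
        print(f'Outputs differ from those produced by commit {commit}')
    return same
```

        Running this check automatically on every new commit is essentially what a continuous integration service does.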
        "},{"location":"community/meetings/2023-08-15/#how-do-you-structure-a-project","title":"How do you structure a project?","text":"

        Gizem asked the group \"How do you choose an appropriate project structure, especially if the project changes over time?\"

        Phrutsamon: the TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data.

        Rob: there may not be a single perfect solution that addresses everyone's needs. But look back at past projects, and try to imagine how the current project might change in the future. And if you're using version control, don't be afraid to experiment with different project structures \u2014 you can always revert back to an earlier commit.

        "},{"location":"community/meetings/2023-08-15/#reviewing-code-as-part-of-manuscript-peer-review","title":"Reviewing code as part of (manuscript) peer review","text":"

        Rob asked the group \"Has anyone reviewed supporting code when reviewing a manuscript?\"

        • Ruarai read through R code that was provided with a paper, but was unable to run all of it \u2014 some of the code produced errors.

        • Similarly, Rob has read R code provided with a paper that used hard-coded paths that did not exist (e.g., \"C:\\Users\\<Author Name>\\...\"), tried to run code in source files that did not exist, and read data from CSV files that did not exist.

        Info

        Pan mentioned a fantastic exercise for research students.

        Pick a modelling paper that is relevant to their research project, and ask the student to:

        1. read it;
        2. understand it; and
        3. reproduce the figures.

        This teaches the students that reproducibility is very important, and shows them what they need to do when they publish their own results.

        It's important to pick a relatively simple paper, so that this task isn't too complicated for the student. And if the paper is written by a colleague or collaborator, you can contact them to ask for extra details, etc.

        "},{"location":"community/meetings/2023-08-15/#using-shiny-to-make-models-availablereproducible","title":"Using Shiny to make models available/reproducible","text":"

        Pan asked the group \"What do you think about (the extra work involved in) turning R code into Shiny applications, to show that the model is reproducible, and do so in a way that lets others easily make use of it?\"

        An objective of the COVID-19 International Modelling Consortium (CoMo) is to make models available and usable for non-modellers \u2014 turning models into something that anyone with minimal knowledge can explore.

        The model is available as a Shiny app, and is continually being updated and refined. It is currently at version 19! Pan's group is trying to ensure that existing users update to the most recent version, because it can be very challenging and time-consuming to create scenario templates for older model versions. Templates are a good way to help the user define their scenario-specific settings, but it's a nightmare when you change the model version \u2014 it's like working with a new model.

        • Eamon: this is similar to when software platforms make changes to their APIs. Can you make backwards-compatible changes, or automatically transform old templates to make them compatible with the latest model version? This kind of work is simple to fund when your software is a commercial product, but it's much harder to find funding for academic outputs.

        • Pan: It's a lot of extra work, without any money to support it. For this consortium we hired several programmers, some for the coding and some specifically for the Shiny app; it involved a lot of resources. That project has now ended, but we've learned a lot and have a good network of collaborators. We still have monthly meetings! This was a special case with COVID-19, because the context changed so quickly. It would be much less of a problem with other diseases, which are better understood.

        • Gizem: I'm very much in favour of using Shiny to make models available, and I recently made a Shiny app for our latest project (currently under review). Because the model is very complicated, we had to pre-calculate model results for specific parameter combinations, and only allow users to choose between these parameter combinations. One reviewer asked for a modified figure to show results for slightly different parameter values, and it was quite simple to address.

        Hadley Wickham has written a very good book about developing R Shiny applications. Gizem read a chapter of this book each morning, but found it necessary to practice in order to really understand how to use Shiny.

        Info

        Learning by doing (experiential learning) is a highly-effective way of convincing people to change their practices. It can be greatly enhanced by engaging as a community.

        "},{"location":"community/meetings/2023-08-15/#resources","title":"Resources","text":""},{"location":"community/meetings/2023-08-15/#teaching-reproducibility-and-responsible-workflows","title":"Teaching reproducibility and responsible workflows","text":"

        The Journal of Statistics and Data Science Education published a special issue: Teaching Reproducibility in November 2022. The accompanying editorial article highlights:

        Integrating reproducibility into our practice and our teaching can seem intimidating initially. One way forward is to start small. Make one small change to add an element of exposing students to reproducibility in one class, then make another the next semester. Our students can get much of the benefit of reproducible and responsible workflows even if we just make a few small changes in our teaching. These efforts will help them to make more trustworthy insights from data. If it leads, by way of some virtuous cycle, to us improving our own practice, then even better! Improving our teaching through providing curricular guidance about reproducible science will take time and effort that should pay off in the long term.

        This journal issue was followed by an invited paper session with the following presentations:

        • Collaborative writing workflows: building blocks towards reproducibility

        • Opinionated practices for teaching reproducibility: motivation, guided instruction, and practice

        • From teaching to practice: Insights from the Toronto Reproducibility Conferences

        • Teaching reproducibility and responsible workflow: an editor's perspective

        "},{"location":"community/meetings/2023-08-15/#project-templates","title":"Project templates","text":"
        • The TIER Protocol 4.0 provides a template for organising the contents and reproduction documentation for projects that involve working with statistical data:

        Documentation that meets the specifications of the TIER Protocol contains all the data, scripts, and supporting information necessary to enable you, your instructor, or an interested third party to reproduce all the computations necessary to generate the results you present in the report you write about your project.

        "},{"location":"community/meetings/2023-08-15/#using-shiny","title":"Using Shiny","text":"
        • Mastering Shiny: an online book that teaches how to create web applications with R and Shiny.

        • CoMo Consortium App: the COVID-19 International Modelling Consortium (CoMo) has developed a web application for an age-structured, compartmental SEIRS model.

        "},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-r","title":"Continuous integration examples for R","text":"
        • Building reproducible analytical pipelines with R: this article shows how to use GitHub Actions to run R code when you push new commits to a GitHub repository.

        • GitHub Actions for the R language: this repository provides a variety of GitHub actions for R projects, such as installing specific versions of R and R packages.

        "},{"location":"community/meetings/2023-08-15/#continuous-integration-examples-for-python","title":"Continuous integration examples for Python","text":"
        • GitHub Actions for Python: the GitHub Actions documentation includes examples of building and testing Python projects.
        "},{"location":"community/meetings/2023-08-15/#other-continuous-integration-examples","title":"Other continuous integration examples","text":"

        See the GitHub actions for Git is my lab book, available here. For example, the build action performs the following actions:

        1. Check out the repository, using actions/checkout;

        2. Install mdBook and other required tools, using make; and

        3. Build an HTML version of the book, using mdBook.

        "},{"location":"community/meetings/2023-10-18/","title":"18 October 2023","text":"

        In this meeting we asked participants to share their experiences about good (and bad) ways to structure a project.

        Info

        We are currently drafting Project structure and Writing code guidelines.

        See the pull request for further details. Please contribute suggestions!

        We had six in-person and eight online attendees. Everyone predominantly uses one or more of the following languages:

        • Matlab;
        • Python; and
        • R.
        "},{"location":"community/meetings/2023-10-18/#naming-files","title":"Naming files","text":"

        The tidyverse style guide includes recommendations for naming files. One interesting recommendation in this guide is:

        • If files should be run in a particular order, prefix each file name with a number. For example:

          00_download.R
          01_clean.R
          02_summarise.R
          ...
          09_plot_figures.R
          10_generate_tables.R
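        Zero-padding the numeric prefixes to the same width (00, 01, 02, and so on) matters when a tool runs the files in sorted order, because file names are usually sorted as text rather than as numbers. The following Python sketch illustrates this. It assumes the scripts are R files run with Rscript, and the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def ordered_scripts(directory):
    # Text sorting puts 02_summarise.R before 10_generate_tables.R only
    # because every prefix has the same number of digits.
    return sorted(Path(directory).glob('[0-9][0-9]_*.R'))


def run_all(directory):
    # Run each script in turn, stopping at the first script that fails.
    for script in ordered_scripts(directory):
        print(f'Running {script.name}')
        subprocess.run(['Rscript', str(script)], check=True)


# Without zero-padding, text sorting gives the wrong order:
print(sorted(['2_summarise.R', '10_generate_tables.R']))
# ['10_generate_tables.R', '2_summarise.R']
```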

        "},{"location":"community/meetings/2023-10-18/#choosing-a-directory-structure","title":"Choosing a directory structure","text":"

        A common starting point is often one or more scripts in the root directory. But we can usually divide a project into several distinct steps or stages, and store the files necessary for each stage in a separate sub-directory.

        Tip

        Your project structure may change as the project develops. That's okay!

        You might, e.g., realise that some files should be moved to a new, or different, sub-directory.

        Packaging: Python and R allow you to bundle multiple code files into a \"package\". This makes it easier to use code that is split up into multiple files. It also makes it simpler to test and verify whether your code can be run on a different computer. To create a package, you need to provide some metadata, including a list of dependencies (packages or libraries that your code needs in order to run). When you install a Python or R package, its dependencies are automatically installed too. You can test this on, e.g., a virtual machine to verify that you've correctly listed all of the necessary dependencies.

        Version control: the history may be extremely useful for you, but may contain things you don't want to make publicly available. One solution would be to know from the very start what files you will want to make available and what files you do not (e.g., sensitive data files), but this is not always possible. Another, more realistic, solution is to create a new repository, copy over all of the files that you want to make available, and record these files in a single commit. The public repository will not share the history of your project repository, and that's okay \u2014 the public repository's primary purpose is to serve as a snapshot, rather than a complete and unedited history.

        "},{"location":"community/meetings/2023-10-18/#locating-files","title":"Locating files","text":"

        A common concern is how to locate files in different sub-directories (e.g., loading code, reading data files, writing output files) without relying on absolute paths. For loading code, Python and Matlab allow the user to add directories to the search path (e.g., by modifying sys.path in Python, or calling addpath() in Matlab). But these are not ideal solutions.

        • As a general rule, prefer using relative paths instead of absolute paths.

        • Relative paths are defined relative to the current working directory. For example: sub-directory/file-name and ../other-directory.

        • Absolute paths are defined relative to the root drive or directory. For example: /Users/My Name/... and C:\\Users\\My Name\\....

        Absolute paths may not exist on other people's computers.

        • For R, the here package allows you to construct file paths relative to the top-level project directory. For example, if you have a data file in project/input-data/file-1.csv and a script file in project/analysis-code/read-input-data.R, you can locate the data file from within the script with the following code:
        library(here)\ndata_file <- here(\"input-data/file-1.csv\")\n

        Tip

        A general solution for any programming language is to break your code into functions, each of which accepts input and/or output file names as arguments (when required). This means that most of your code is entirely independent of your chosen project structure. You can then store/generate all of the file paths in a single file, or in each of your top-level scripts.

        "},{"location":"community/meetings/2023-10-18/#peer-review-get-feedback-on-project-structure","title":"Peer review: get feedback on project structure","text":"

        It can be helpful to get feedback from someone who isn't directly involved in the project. They may view the work from a fresh perspective, and be able to identify aspects that are confusing or unclear.

        When inviting someone to review your work, you should identify specific questions or tasks that you would like the reviewer to address.

        With respect to project structure, you may want to ask the reviewer to address questions such as:

        • Do the project directories suggest a clear structure or workflow?
        • Does each directory contain files that are clearly related to each other?
        • Do the names of each directory and each file seem reasonable?
        • Are there any files that you would consider renaming or moving?
        • Does the README.md file help you to navigate the project?

        You could also ask the reviewer to look at a specific script or code file, and ask questions such as:

        • Should this code be divided into smaller functions?
        • Should this code be divided into multiple files?

        Info

        For further ideas about useful peer review activities, and how to incorporate them into your workflow, see the following paper:

        Implementing code review in the scientific workflow: Insights from ecology and evolutionary biology, Ivimey-Cook et al., Journal of Evolutionary Biology 36(10):1347\u20131356, 2023.

        "},{"location":"community/meetings/2023-10-18/#styling-and-formatting","title":"Styling and formatting","text":"

        We also discussed opinions about how to name functions, variables, files, etc.

        For example, R allows you to use periods (.) in function and variable names, but the tidyverse style guide recommends only using lowercase letters, numbers, and underscores (_).

        If you review other people's code, and have other people review your code, you might be surprised by the different styles and conventions that people use. When reviewing code, these differences can be somewhat distracting.

        • Agreeing on, and adhering to, a common style guide can avoid these issues and allow the reviewer to dedicate their attention to actually reading and reasoning about the code.

        • There are tools to automatically format your code (\"formatters\") and to warn about potential issues, such as unused variables (\"linters\"). Here are some commonly-used formatters and linters for different languages:

        Language | Style guide(s) | Formatter | Linter
        R | tidyverse | styler | lintr
        Python | PEP 8 / The Hitchhiker's Style Guide | black | ruff
        Julia | style guide | JuliaFormatter.jl | Lint.jl
        "},{"location":"community/meetings/2023-10-18/#ai-tools-for-writing-and-reviewing-code","title":"AI tools for writing and reviewing code","text":"

        There are AI tools that you can use to write, format, and review code, but you will need to check whether the code is correct. For example, GitHub Copilot is a (commercial) tool that accepts natural-language descriptions and generates computer code.

        Tip

        Feel free to use AI tools as a way to get started, but don't simply copy-and-paste the code they give you without reviewing it.

        "},{"location":"high-performance-computing/","title":"Cloud and HPC platforms","text":"

        This section introduces computing platforms that allow you to generate outputs more quickly, and without relying on your own laptop or desktop computer. It also demonstrates how to use version control to ensure that the code running on these platforms is the same as the code on your laptop.

        "},{"location":"reproducibility/","title":"Reproducibility","text":"

        This section demonstrates how to use version control and software testing to ensure that your research results can be independently reproduced by others.

        Tip

        Reproducibility is just as much about simple work habits as the tools used to share data and code.

        \u2014 Jesse M. Alston and Jessica A. Rick

        "},{"location":"testing/","title":"Testing","text":"

        This section introduces the topic of software testing. Testing your code is an important part of any code-based research activity. Tests check whether your code behaves as intended, and can warn you if you introduce a bug or mistake into your code.

        Tip

        Tests can show the presence of bugs, but not their absence.

        \u2014 Edsger W. Dijkstra

        "},{"location":"using-git/","title":"Effective use of git","text":"

        This section shows how to use the git command-line program to record your work, to inspect your commit history, and to search this commit history to identify commits that make specific changes or have specific effects.

        Reminder

        Remember to commit early and commit often. Do not wait until your code is \"perfect\".

        "},{"location":"using-git/choosing-a-license/","title":"Choosing a license","text":"

        A license specifies the conditions under which others may use, modify, and/or distribute your work.

        Info

        Simply making a repository publicly accessible is not sufficient to allow others to make use of your work. Unless you include a license that specifies otherwise, nobody else can copy, distribute, or modify your work.

        There are many different types of licenses that you can use, and the number of options can seem overwhelming. But it is usually straightforward to narrow down your options.

        • If you're working on an existing project, the easiest option is to use that project's license.

        • If you're working with an existing community, they may have a preferred license.

        • If you want to choose an open source license, the Choose an open source license website provides advice for selecting a license that meets your needs.

        For further information about the various types of available licenses, and some advice for selecting a suitable license for academic software, see A Quick Guide to Software Licensing for the Scientist-Programmer.

        "},{"location":"using-git/choosing-your-git-editor/","title":"Choosing your Git editor","text":"

        In this video, we show how to use nano and vim for writing commit messages. See below for brief instructions on how to use these editors.

        Tip

        This editor is only used for writing commit messages. It is entirely separate from your choice of editor for any other task, such as writing code.

        Git editor example

        Video timeline:

        1. Overview
        2. Show how to use nano
        3. Show how to use vim

        Note

        You can pause the video to select and copy any of the text, such as the git config --global core.editor commands.

        "},{"location":"using-git/choosing-your-git-editor/#how-to-use-nano","title":"How to use nano","text":"

        Once you have written your commit message, press Ctrl + O and then Enter to save the commit message, then press Ctrl + X to quit the editor.

        To quit without saving press Ctrl + X. If you have made any changes, nano will ask if you want to save them. Press n to quit without saving these changes.

        "},{"location":"using-git/choosing-your-git-editor/#how-to-use-vim","title":"How to use vim","text":"

        You need to press i (switch to insert mode) before you can write your commit message. Once you have written your commit message, press Esc and then type :wq to save your changes and quit the editor.

        To quit without saving press Esc and then type :q!.

        "},{"location":"using-git/cloning-an-existing-repository/","title":"Cloning an existing repository","text":"

        If there is an existing repository that you want to work on, you can \"clone\" the repository and have a local copy. To do this, you need to know the remote repository's URL.

        Tip

        For GitHub repositories, there should be a green button labelled \"Code\". Click on this button, and it will provide you with the URL.

        You can then make a local copy of the repository by running:

        git clone URL\n

        For example, to make a local copy of this book, run the following command:

        git clone https://github.com/robmoss/git-is-my-lab-book.git\n

        This will create a local copy in the directory git-is-my-lab-book.

        Note

        If you have a GitHub account and have set up an SSH key, you can clone GitHub repositories using your SSH key. This will allow you to push commits to the remote repository (if you are permitted to do so) without having to enter your user name and password.

        You can obtain the SSH URL from GitHub by clicking on the green \"Code\" button, and selecting the \"SSH\" tab.

        For example, to make a local copy of this book using SSH, run the following command:

        git clone git@github.com:robmoss/git-is-my-lab-book.git\n
        "},{"location":"using-git/creating-a-commit/","title":"Creating a commit","text":"

        Creating a commit involves two steps:

        1. Identify the changes that should be included in the commit. These changes are then \"staged\" and ready to be included in the next commit.

        2. Create a new commit that records these staged changes. This should be accompanied by a useful commit message.

        We will now show how to perform these steps.

        Note

        At any time, you can see a summary of the changes in your repository, and which ones are staged to be committed, by running:

        git status\n

        This will show you:

        1. The files (if any) that contain changes that have been staged;
        2. The files (if any) that contain changes that have not been staged; and
        3. The files (if any) that are not recorded in the repository history.
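        To see how these three groups look in practice, here is a minimal sketch that creates a throwaway repository in a temporary directory (the file names and identity settings are hypothetical) and then runs git status with the --short option, which prints one line per file.

```shell
# Create a throwaway repository with two committed files.
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo first > staged.txt
echo first > unstaged.txt
git add staged.txt unstaged.txt
git commit -q -m 'Add two files'

echo second >> staged.txt     # modified, and then staged
git add staged.txt
echo second >> unstaged.txt   # modified, but not staged
echo new > untracked.txt      # never added to the repository

# Each line has two status columns (staged, then unstaged) and a file name;
# ?? marks files that are not tracked by the repository.
git status --short
```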
        "},{"location":"using-git/creating-a-commit/#adding-a-new-file","title":"Adding a new file","text":"

        If you've created a new file, you can include this file in the next commit by running:

        git add filename\n
        "},{"location":"using-git/creating-a-commit/#adding-all-changes-in-an-existing-file","title":"Adding all changes in an existing file","text":"

        If you've made changes to an existing file, you can include all of these changes in the next commit by running:

        git add filename\n
        "},{"location":"using-git/creating-a-commit/#adding-some-changes-in-an-existing-file","title":"Adding some changes in an existing file","text":"

        If you've made changes to an existing file and only want to include some of these changes in the next commit, you can select the changes to include by running:

        git add -p filename\n

        This will show you each of the changes in turn, and allow you to select which ones to stage.

        Tip

        This interactive selection mode is very flexible; you can enter ? at any of the prompts to see the range of available actions.

        "},{"location":"using-git/creating-a-commit/#renaming-a-file","title":"Renaming a file","text":"

        If you want to rename a file, you can use git mv to rename the file and stage this change for inclusion in the next commit:

        git mv filename newname\n
        "},{"location":"using-git/creating-a-commit/#removing-a-file","title":"Removing a file","text":"

        If you want to remove a file, you can use git rm to remove the file and stage this change for inclusion in the next commit:

        git rm filename\n

        Tip

        If the file has any uncommitted changes, git will refuse to remove the file. You can override this behaviour by running:

        git rm --force filename\n
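        A minimal sketch of both commands in a throwaway repository (file names and identity settings are hypothetical). In the short status output, R marks a staged rename and D marks a staged removal.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo 'x <- 1' > old-name.R
echo 'y <- 2' > obsolete.R
git add old-name.R obsolete.R
git commit -q -m 'Add two scripts'

# Rename one file and remove the other; both changes are staged.
git mv old-name.R new-name.R
git rm -q obsolete.R

git status --short   # shows the staged rename (R) and removal (D)
```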
        "},{"location":"using-git/creating-a-commit/#inspecting-the-staged-changes","title":"Inspecting the staged changes","text":"

        To verify that you have staged all of the desired changes, you can view the staged changes by running:

        git diff --cached\n

        You can view the staged changes for a specific file by running:

        git diff --cached filename\n
        "},{"location":"using-git/creating-a-commit/#undoing-a-staged-change","title":"Undoing a staged change","text":"

        You may sometimes stage a change for inclusion in the next commit, but decide later that you don't want to include it in the next commit. You can undo staged changes to a file by running:

        git restore --staged filename\n

        Note

        This will not modify the contents of the file.
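        A minimal sketch that stages changes to two files, lists the staged files with git diff --cached --name-only, and then unstages one of them. It uses a throwaway repository with hypothetical file names.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo 'a' > data.csv
echo 'b' > notes.txt
git add data.csv notes.txt
git commit -q -m 'Add files'

# Change both files and stage both changes.
echo 'a2' >> data.csv
echo 'b2' >> notes.txt
git add data.csv notes.txt

# --name-only lists the files with staged changes, without the diff itself.
git diff --cached --name-only

# Unstage the change to notes.txt; the file itself is left untouched.
git restore --staged notes.txt
git diff --cached --name-only   # now lists only data.csv
cat notes.txt                   # still contains both lines
```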

        "},{"location":"using-git/creating-a-commit/#creating-a-new-commit","title":"Creating a new commit","text":"

        Once you have staged all of the changes that you want to include in the commit, create the commit by running:

        git commit\n

        This will open your chosen editor and prompt you to write the commit message.

        Tip

        Note that the commit will not be created until you exit the editor.

        If you decide that you don't want to create the commit, you can abort this action by closing your editor without saving a commit message.

        Please see Choosing your Git editor for details.

        "},{"location":"using-git/creating-a-commit/#modifying-the-most-recent-commit","title":"Modifying the most recent commit","text":"

        After you create a commit, you might decide that there are other changes that should be included in the commit. Git provides a simple way of modifying the most recent commit.

        Warning

        Do not modify the commit if you have already pushed it to another repository. Instead, record a new commit that includes the desired changes.

        Remember that your commit history should not be a highly-edited, polished view of your work, but should instead act as a lab book.

        Do not worry about creating \"perfect\" commits!

        To modify the most recent commit, stage the changes that you want to commit (see the sections above) and add them to the most recent commit by running:

        git commit --amend\n

        This will open your chosen editor and allow you to modify the commit message.
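        A minimal sketch, in a throwaway repository with hypothetical file names, of adding a forgotten file to the most recent commit. The --no-edit option keeps the existing commit message instead of opening your editor.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo 'x <- 1' > analysis.R
git add analysis.R
git commit -q -m 'Add analysis script'

# Oops: this file should have been part of that commit.
echo 'helper <- function() NULL' > helpers.R
git add helpers.R
git commit -q --amend --no-edit

git log --oneline                  # still a single commit
git show --stat --format=%s HEAD   # the commit now includes both files
```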

        "},{"location":"using-git/creating-a-remote-repository/","title":"Creating a remote repository","text":"

        Once you have created a \"local\" repository (i.e., a repository that exists on your own computer), it is generally a good idea to create a \"remote\" repository. You may choose to store this remote repository on a service such as GitHub, or on a University-provided platform.

        If you are using GitHub, you can choose to create a public repository (viewable by anyone, but you control who can make changes) or a private repository (you control who can view and/or make changes).

        "},{"location":"using-git/creating-a-remote-repository/#linking-your-local-and-remote-repositories","title":"Linking your local and remote repositories","text":"

        Once you have created the remote repository, you need to link it to your local repository. This will allow you to \"push\" commits from your local repository to the remote repository, and to \"pull\" commits from the remote repository to your local repository.

        Note

        When you create a new repository on services such as GitHub, they will give you instructions on how to link this new repository to your local repository. We also provide an example, below.

        A repository can be linked to more than one remote repository, so we need to choose a name to identify this remote repository.

        Info

        The name \"origin\" is commonly used to identify the main remote repository.

        In this example, we link our local repository to the remote repository for this book (https://github.com/robmoss/git-is-my-lab-book) with the following command:

        git remote add origin git@github.com:robmoss/git-is-my-lab-book.git\n

        Note

        Notice that the URL is similar to, but not identical to, the URL you use to view the repository in your web browser.
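        To try this without creating an account, you can use a bare repository in a local directory as the remote, because git accepts a directory path as a remote URL. This minimal sketch assumes nothing beyond git itself; the file name and identity settings are hypothetical. Pushing HEAD pushes the current branch, whatever it is called.

```shell
# A bare repository in a temporary directory stands in for GitHub.
remote=$(mktemp -d)
git init -q --bare $remote

repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo 'Example' > README.md
git add README.md
git commit -q -m 'Add README'

# Link the local repository to the remote one, under the name origin.
git remote add origin $remote
git remote -v   # shows the fetch and push URLs for origin

# Push the current branch, and record it as the upstream branch (-u).
git push -q -u origin HEAD
```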

        "},{"location":"using-git/creating-a-repository/","title":"Creating a repository","text":"

        You can create repositories by running git init. This will create a .git directory that will contain all of the repository information.

        There are two common ways to use git init:

        1. Create an empty repository in the current directory, by running:

          git init\n
        2. Create an empty repository in a specific directory, by running:

          git init path/to/repository\n

          Info

          Git will create the repository directory if it does not exist.
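        A minimal sketch of the second form, run in a temporary directory (the directory name my-repository is hypothetical):

```shell
parent=$(mktemp -d)
cd $parent

# my-repository does not exist yet, so git init creates it.
git init -q my-repository
ls -A my-repository   # contains the .git directory

# Confirm that git recognises the new directory as a repository.
git -C my-repository rev-parse --is-inside-work-tree
```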

        "},{"location":"using-git/exercise-create-a-local-repository/","title":"Exercise: create a local repository","text":"

        In this exercise you will create a local repository, and use this repository to create multiple commits, switch between branches, and inspect the repository history.

        1. Create a new, empty repository in a directory called git-exercise.

        2. Create a README.md file and write a brief description for this repository. Record the contents of README.md in a new commit, and write a commit message.

        3. Write a script that generates a small data set, and saves the data to a CSV file. For example, this script could sample values from a probability distribution with fixed shape parameters. Explain how to use this script in README.md. Record your changes in a new commit.

        4. Write a script that plots these data, and saves the figure in a suitable file format. Explain how to use this script in README.md. Record your changes in a new commit.

        5. Add a tag milestone-1 to the commit you created in the previous step.

        6. Create a new branch called feature/new-data. Check out this branch and modify the data-generation script so that it produces new data and/or more data. Record your changes in one or more new commits.

        7. Create a new branch called feature/summarise from the tag you created in step #5. Check out this branch and modify the plotting script so that it also prints some summary statistics of the data. Record your changes in one or more new commits.

        8. In your main or master branch, add a license. Record your changes in a new commit.

        9. In your main or master branch, merge the two feature branches created in steps #6 and #7, and add a new tag milestone-2.

        "},{"location":"using-git/exercise-create-a-local-repository/#self-evaluation","title":"Self evaluation","text":"

        Now that you have started a repository, created commits in multiple branches, and merged these branches, here are some questions for you to consider:

        • Have you committed the generated data file and/or the plot figure?

        • If you haven't committed either or both of these files, have you instructed git to ignore them?

        • Did you add a meaningful description to each milestone tag?

        • How many commits modified your data-generation script?

        • How many commits modified your plotting script?

        • What changes, if any, were made to README.md since it was first created?

        Tip

        To answer some of these questions, you may need to run git commands.
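        For example, the following sketch builds a small throwaway repository (all file names, tags, and messages are hypothetical) and shows commands that can answer questions like these: git rev-list --count counts the commits that modified a file, git log -- path lists them, git check-ignore reports whether a file is ignored, and git tag -n shows tag descriptions. Running git log -p -- README.md in your own repository would similarly show every change made to README.md.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo '*.png' > .gitignore
echo 'x <- 1' > generate-data.R
git add .gitignore generate-data.R
git commit -q -m 'Add data-generation script'
echo 'plot(1)' > plot-data.R
git add plot-data.R
git commit -q -m 'Add plotting script'
echo 'x <- 2' >> generate-data.R
git commit -q -am 'Generate more data'
git tag -a milestone-1 -m 'Data and plotting scripts'

# How many commits modified the data-generation script?
git rev-list --count HEAD -- generate-data.R

# Which commits modified the plotting script?
git log --oneline -- plot-data.R

# Is the generated figure ignored? -v prints the matching rule.
touch figure.png
git check-ignore -v figure.png

# Does each tag have a description?
git tag -n
```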

        "},{"location":"using-git/exercise-resolve-a-merge-conflict/","title":"Exercise: resolve a merge conflict","text":"

        We have created a public repository that you can use to try resolving a merge conflict yourself. This repository includes some example data and a script that performs some basic data analysis.

        First, obtain a local copy (a \"clone\") of this repository by running:

        git clone https://github.com/robmoss/gimlb-simple-merge-example.git\ncd gimlb-simple-merge-example\n
        "},{"location":"using-git/exercise-resolve-a-merge-conflict/#the-repository-history","title":"The repository history","text":"

        You can inspect the repository history by running git log. Some key details to notice are:

        1. The first commit created the following files:

          • README.md
          • LICENSE
          • analysis/initial_exploration.R
          • input_data/data.csv

        2. The second commit created the following file:

          • outputs/summary.csv

          This commit has been given the tag first_milestone.

        3. From this first_milestone tag, two branches were created:

          • The feature/second-data-set branch adds a second data set and updates the analysis script to inspect both data sets.

          • The feature/calculate-rate-of-change branch changes which summary statistics are calculated for the original data set.

        4. The example-solution branch merges both feature branches and resolves any merge conflicts. This branch has been given the tag second_milestone.

        "},{"location":"using-git/exercise-resolve-a-merge-conflict/#your-task","title":"Your task","text":"

        You will start with the master branch, which contains the commits up to the first_milestone tag, and then merge the two feature branches into this branch, resolving any merge conflicts that arise. You can then compare your results to the example-solution branch.

        1. Obtain a local copy of this repository, by running:

          git clone https://github.com/robmoss/gimlb-simple-merge-example.git\ncd gimlb-simple-merge-example\n
        2. Create local copies of the two feature branches and the example solution, by running:

          git checkout feature/second-data-set\ngit checkout feature/calculate-rate-of-change\ngit checkout example-solution\n
        3. Return to the master branch, by running:

          git checkout master\n
        4. Merge the feature/second-data-set branch into master, by running:

          git merge feature/second-data-set\n
        5. Merge the feature/calculate-rate-of-change branch into master, by running:

          git merge feature/calculate-rate-of-change\n

        This will result in a merge conflict, and now you need to decide how to resolve each conflict! Once you have resolved the conflicts, create a commit that records all of your changes (see the previous chapter for an example).

        Tip

        You may find it helpful to inspect the commits in each of the feature branches to understand how they have changed the files in which the conflicts have occurred.
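        If you would like to practise on a smaller example first, this sketch creates a conflict in a throwaway repository (the branch names, file, and contents are hypothetical), lists the conflicted files with git diff --name-only --diff-filter=U, and records a resolution. It sets merge.conflictstyle to diff3 for this repository only, so the conflict markers also show the original text.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
git config merge.conflictstyle diff3
echo 'rate = 1' > settings.txt
git add settings.txt
git commit -q -m 'Initial settings'
git branch feature-a
git branch feature-b

# Each branch changes the same line in a different way.
git checkout -q feature-a
echo 'rate = 2' > settings.txt
git commit -q -am 'Increase rate'
git checkout -q feature-b
echo 'rate = 3' > settings.txt
git commit -q -am 'Increase rate further'

# Merging feature-b into feature-a stops with a conflict,
# so git merge exits with a non-zero status.
git checkout -q feature-a
git merge feature-b || echo 'Merge stopped: resolve the conflicts, then commit'

# List the files that still contain unresolved conflicts.
git diff --name-only --diff-filter=U
cat settings.txt   # shows both versions and the original, between markers

# Resolve the conflict by writing the desired contents, then record the merge
# using the merge message that git has already prepared.
echo 'rate = 3' > settings.txt
git add settings.txt
git commit -q --no-edit
```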

        "},{"location":"using-git/exercise-resolve-a-merge-conflict/#self-evaluation","title":"Self evaluation","text":"

        Once you have created a commit that resolves these conflicts, see how similar or different the contents of your commit are to the corresponding commit in the example-solution branch (which has been tagged second_milestone). You can inspect this commit by running:

        git show example-solution\n

        You can compare this commit to your solution by running:

        git diff example-solution\n

        How does your resolution compare to this commit?

        Note

        You may have resolved the conflicts differently to the example-solution branch, and that's perfectly fine as long as your resolution has the same effect.

        "},{"location":"using-git/exercise-resolve-a-merge-conflict/#example-solution","title":"Example solution","text":"

        Here we present a recorded terminal session in which we clone this repository and resolve the merge conflict.

        Tip

        You can use the video timeline (below) to jump to specific moments in this exercise. Remember that you can pause the recording at any point to select and copy any of the text.

        Resolving a merge conflict

        Video timeline:

        1. Start: a quick look around
        2. Create local copies of branches
        3. Inspect the feature/second-data-set branch
        4. Inspect the feature/calculate-rate-of-change branch
        5. Merge the feature/second-data-set branch
        6. Merge the feature/calculate-rate-of-change branch
        7. Resolve the merge conflicts
        8. Compare to the example solution
        "},{"location":"using-git/exercise-use-a-remote-repository/","title":"Exercise: use a remote repository","text":"

        In this exercise, you will use a remote repository to synchronise and merge changes between multiple local repositories, starting from the local git-exercise repository that you created in the previous exercise.

        "},{"location":"using-git/exercise-use-a-remote-repository/#create-a-remote-repository","title":"Create a remote repository","text":"
        1. Create a new remote repository on a platform such as GitHub. You can make this a private repository, because you won't need to share it with anyone.

        2. Link your local git-exercise repository to this remote repository, and push all branches and tags to this remote repository.

        "},{"location":"using-git/exercise-use-a-remote-repository/#clone-the-remote-repository","title":"Clone the remote repository","text":"
        1. Make a local copy of this remote repository called git-exercise-2.

        2. Check out the main or master branch. The files should be identical to the milestone-2 tag in your original git-exercise repository.

        "},{"location":"using-git/exercise-use-a-remote-repository/#work-on-the-new-local-repository","title":"Work on the new local repository","text":"
        1. Create a new branch called feature/report. Check out this branch and create a new file called report.md. Edit this file so that it contains:

          • A brief description of the generated data set;
          • A table of the summary statistics printed by the plotting script (see the Markdown Guide); and
          • The figure produced by the plotting script (see the Markdown Guide).

          Record your changes in a new commit.

        2. Push this new branch to the remote repository.
        "},{"location":"using-git/exercise-use-a-remote-repository/#merge-the-report-into-the-original-repository","title":"Merge the report into the original repository","text":"
        1. In your original git-exercise repository, fetch the latest changes from the remote repository, check out the feature/report branch, and verify that it now contains the file report.md.

        2. Merge this branch into your main or master branch, and add a new tag milestone-3-report.

        3. Push the updated main or master branch to the remote repository.
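        A minimal sketch of fetching a branch that was pushed from another clone, using a local bare repository as the remote and temporary directories in place of git-exercise and git-exercise-2 (all names are hypothetical). After git fetch, git checkout feature/report creates a local branch that tracks origin/feature/report.

```shell
# A bare repository in a temporary directory stands in for the remote.
remote=$(mktemp -d)
git init -q --bare $remote

# The first repository plays the role of git-exercise.
first=$(mktemp -d)
cd $first
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
echo 'Example' > README.md
git add README.md
git commit -q -m 'Add README'
git remote add origin $remote
git push -q -u origin HEAD

# The second repository plays the role of git-exercise-2.
second=$(mktemp -d)/clone
git clone -q $remote $second
cd $second
git config user.name 'Example Person'
git config user.email 'person@example.org'
git checkout -q -b feature/report
echo '# Report' > report.md
git add report.md
git commit -q -m 'Add report'
git push -q -u origin feature/report

# Back in the first repository: fetch, then check out the new branch.
cd $first
git fetch -q origin
git checkout -q feature/report
ls report.md
```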

        "},{"location":"using-git/exercise-use-a-remote-repository/#update-the-new-local-repository","title":"Update the new local repository","text":"
        1. In your git-exercise-2 repository, check out the main or master branch and pull changes from the remote repository. It should now contain the file report.md.

        Info

        Congratulations! You have used a remote repository to synchronise and merge changes between two local repositories. You can use this workflow to collaborate with colleagues.

        "},{"location":"using-git/exercise-use-a-remote-repository/#self-evaluation","title":"Self evaluation","text":"

        Now that you have used commits and branches to share work between multiple repositories, here are some questions for you to consider:

        • Do you feel comfortable in deciding which changes to record in a single commit?

        • Do you feel that your commit messages help describe the changes that you have made in this repository?

        • Do you feel comfortable in using multiple branches to work on separate ideas in parallel?

        • Do you have any current projects that you might want to work on using local and remote git repositories?

        "},{"location":"using-git/first-time-git-setup/","title":"First-time Git setup","text":"

        Once you've installed Git, you should define some important settings before you start using Git.

        Info

        We assume that you will want to set the git configuration for all repositories owned by your user. Therefore, we use the --global flag. Settings can instead be applied to a single repository or to the whole computer by replacing --global with --local or --system, respectively.

        1. Define your user name and email address. These details are included in every commit that you create.

          git config --global user.name \"My Name\"\ngit config --global user.email \"my.name@some-university.edu.au\"\n
        2. Define the text editor that Git should use for tasks such as writing commit messages:

          git config --global core.editor editor-name\n

          NOTE: on Windows you need to specify the full path to the editor:

          git config --global core.editor \"C:/Program Files/My Editor/editor.exe\"\n

          Tip

          Please see Choosing your Git editor for details.

        3. By default, Git will create a branch called master when you create a new repository. You can set a different name for this initial branch:

          git config --global init.defaultBranch main\n
        4. Ensure that repository histories always record when branches were merged:

          git config --global merge.ff no\n

          This prevents Git from \"fast-forwarding\" when the destination branch contains no new commits. For example, if you merge a branch containing commits D, E, and F into a branch that has no new commits of its own, Git will create a merge commit that records that D, E, and F came from the merged branch.

        5. Adjust how Git shows merge conflicts:

          git config --global merge.conflictstyle diff3\n

          This will be useful when we look at how to use branches and how to resolve merge conflicts.
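        To see the effect of the merge.ff setting, this sketch sets it for a throwaway repository only (so your global settings are untouched), adds commits D, E, and F on a hypothetical feature branch, and merges that branch into a branch with no new commits. The graph printed at the end shows a merge commit rather than a fast-forward.

```shell
repo=$(mktemp -d)
cd $repo
git init -q
git config user.name 'Example Person'
git config user.email 'person@example.org'
git config merge.ff no   # repository-only version of the setting above
echo 'A' > log.txt
git add log.txt
git commit -q -m 'A'

# Add commits D, E, and F on a separate branch.
git checkout -q -b feature
echo 'D' >> log.txt
git commit -q -am 'D'
echo 'E' >> log.txt
git commit -q -am 'E'
echo 'F' >> log.txt
git commit -q -am 'F'

# Return to the original branch, which has no new commits, and merge.
git checkout -q -
git merge -q --no-edit feature

# D, E, and F appear on their own line of the graph, joined by a merge commit.
git log --graph --oneline
```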

        Info

        If you use Windows, there are tools that can improve your Git experience in PowerShell.

        There are also tools for integrating Git into many common text editors. See Git in other environments, Appendix A of the Pro Git book.

        "},{"location":"using-git/graphical-git-clients/","title":"Graphical Git clients","text":"

        While Git is a command-line program, there are other ways to work with Git repositories:

        • There are many graphical clients that you can download and use;

        • Many editors include Git support (e.g., Atom, RStudio, Visual Studio Code); and

        • Online platforms such as GitHub, GitLab, and Bitbucket also provide a graphical interface for common Git actions.

        In this book we will primarily show how to use Git from the command-line, but all of the concepts and terminology should also apply to all of the tools described above. If you don't already have Git installed on your computer, see these instructions for installing Git.

        "},{"location":"using-git/how-to-create-and-use-tags/","title":"How to create and use tags","text":"

        Tags allow you to bookmark important points in your commit history.

        You can use tags to identify milestones such as:

        • Adding specific features to your model or data analysis (e.g., feature-age-dependent-mixing);
        • Completing objectives in your research plan (e.g., objective-1, objective-2);
        • Completing manuscript drafts (e.g., draft-1, draft-2); and
        • Manuscript submission and revisions (e.g., submitted, revision-1).
        "},{"location":"using-git/how-to-create-and-use-tags/#tagging-the-current-commit","title":"Tagging the current commit","text":"

        You can add a tag (in this example, \"my-tag\") to the current commit by running:

        git tag -a my-tag\n

        This will open your chosen editor and ask you to write a description for this tag.

        "},{"location":"using-git/how-to-create-and-use-tags/#pushing-tags-to-a-remote-repository","title":"Pushing tags to a remote repository","text":"

        By default, git push doesn't push tags to remote repositories. Instead, you have to explicitly push tags. You can push a tag (in this example, called \"my-tag\") to a remote repository (in this example, called \"origin\") by running:

        git push origin my-tag\n

        You can push all of your tags to a remote repository (in this example, called \"origin\") by running:

        git push origin --tags\n
        "},{"location":"using-git/how-to-create-and-use-tags/#tagging-a-past-commit","title":"Tagging a past commit","text":"

        To add a tag to a previous commit, you can identify the commit by its hash. For example, you can inspect your commit history by running:

        git log --oneline --no-decorate\n

        If your commit history looks like:

        003cf6b Show how to ignore certain files\n339eb5a Show how to prepare and record commits\n6a7fb8b Show how to clone remote repositories\n...\n
        where the current commit is 003cf6b (\"Show how to ignore certain files\"), you can tag the previous commit (\"Show how to prepare and record commits\") by running:

        git tag -a my-tag 339eb5a\n
        "},{"location":"using-git/how-to-create-and-use-tags/#listing-tags","title":"Listing tags","text":"

        You can list all tags by running:

        git tag\n

        You can also list only tags that match a specific pattern (in this example, all tags beginning with \"my\") by running:

        git tag --list 'my*'\n
        "},{"location":"using-git/how-to-create-and-use-tags/#deleting-tags","title":"Deleting tags","text":"

        You can delete a tag from your local repository by running:

        git tag --delete my-tag\n
        "},{"location":"using-git/how-to-create-and-use-tags/#creating-a-branch-from-a-tag","title":"Creating a branch from a tag","text":"

        You can check out a tag and begin working on a new branch by running:

        git checkout -b my-branch my-tag\n
        "},{"location":"using-git/how-to-ignore-certain-files/","title":"How to ignore certain files","text":"

        Your repository may contain files that you don't want to include in your commit history. For example, you may not want to include files of the following types:

        • Sensitive data files for which access must be strictly controlled.
        • Temporary files that do not contain useful information, such as:
          • .aux files, which are generated when compiling LaTeX documents; and
          • .pyc files, which are generated when running Python code.
        • Files that can be automatically generated from your commit history, such as:
          • .pdf versions of LaTeX documents; and
          • documentation generated from your code files.

        You can instruct Git to ignore certain files by creating a .gitignore file. This is a plain text file, where each line defines a pattern that identifies files and directories which should be ignored. You can also add comments, which must start with a #, to explain the purpose of these patterns.

        Tip

        If your editor will not accept .gitignore as a file name, you can create a .gitignore file in your repository by running:

        touch .gitignore\n

        For example, the following .gitignore file would make Git ignore all .aux and .pyc files, and the file my-paper.pdf:

        # Ignore all .aux files generated by LaTeX.\n*.aux\n# Ignore all byte-code files generated by Python.\n*.pyc\n# Ignore the PDF version of my paper.\nmy-paper.pdf\n

        If you have sensitive data files, one option is to store them all in a dedicated directory and add this directory to your .gitignore file:

        # Ignore all data files in the \"sensitive-data\" directory.\nsensitive-data\n
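
        To make the matching rules more concrete, here is a minimal Python sketch that approximates how the patterns in the example .gitignore files above are applied. A pattern without a slash can match a file or directory name at any level, and ignoring a directory also ignores everything inside it. This is a simplification, not Git's actual algorithm: it ignores negation (!), directory-only patterns (a trailing /), patterns containing a slash, and ** wildcards. The is_ignored function is hypothetical. To ask Git itself why a path is ignored, run git check-ignore -v followed by the path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns from the example .gitignore files above (comment lines removed).
PATTERNS = ['*.aux', '*.pyc', 'my-paper.pdf', 'sensitive-data']


def is_ignored(path, patterns=PATTERNS):
    # Hypothetical helper and a simplified approximation of Git's rules:
    # each slash-free pattern is compared against every component of the
    # path, so matching a directory name also ignores everything inside it.
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern)
               for part in parts
               for pattern in patterns)


for path in ['paper.aux', 'sensitive-data/cases.csv', 'my-paper.tex']:
    print(path, is_ignored(path))
```
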

        Tip

        You can force Git to add an ignored file to a commit by running:

        git add --force my-paper.pdf\n

        But it would generally be better to update your .gitignore file so that it stops ignoring these files.

        "},{"location":"using-git/how-to-resolve-merge-conflicts/","title":"How to resolve merge conflicts?","text":"

        A merge conflict can occur when we try to merge one branch into another, if the two branches introduce any conflicting changes.

        For example, consider trying to merge two branches that make the following changes to the same line of the file test.txt:

        1. On the branch my-new-branch:

           First line\n-Second line\n+My new second line\n Third line\n
        2. On the main branch:

           First line\n-Second line\n+A different second line\n Third line\n

        When we attempt to merge my-new-branch into the main branch, git merge my-new-branch will tell us:

        Auto-merging test.txt\nCONFLICT (content): Merge conflict in test.txt\nAutomatic merge failed; fix conflicts and then commit the result.\n

        The test.txt file will now include the conflicting changes, which we can inspect with git diff:

        diff --cc test.txt\nindex 18712c4,bc576a6..0000000\n--- a/test.txt\n+++ b/test.txt\n@@@ -1,3 -1,3 +1,7 @@@\n  First line\n++<<<<<<< ours\n +A different second line\n++=======\n+ My new second line\n++>>>>>>> theirs\n  Third line\n

        Note that this two-way diff shows:

        1. \"our\" changes: from the commits on the branch that we are merging into; and
        2. \"their\" changes: from the commits on the branch that we are merging from.

        Each conflict is surrounded by <<<<<<< and >>>>>>> markers, and the conflicting changes are separated by a ======= marker.

        If we instruct Git to use a three-way diff (see first-time Git setup), the conflict will be reported slightly differently:

        diff --cc test.txt\nindex 18712c4,bc576a6..0000000\n--- a/test.txt\n+++ b/test.txt\n@@@ -1,3 -1,3 +1,7 @@@\n  First line\n++<<<<<<< ours\n +A different second line\n++||||||| base\n++Second line\n++=======\n+ My new second line\n++>>>>>>> theirs\n  Third line\n

        In addition to showing \"our\" changes and \"their\" changes, this three-way diff also shows the original lines, between the ||||||| and ======= markers. This extra information can help you decide how best to resolve the conflict.
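
        The rule behind these conflict markers can be sketched in a few lines of Python. This is a simplified model of a single line, assuming we already know the base, ours, and theirs versions of that line. Git's real merge works on whole files and groups changes into hunks. The merge_line function is hypothetical.

```python
def merge_line(base, ours, theirs):
    # Hypothetical three-way merge of a single line.
    # If both sides agree, or only one side changed the line, there is no conflict.
    if ours == theirs:
        return [ours]
    if ours == base:
        return [theirs]
    if theirs == base:
        return [ours]
    # Both sides changed the line in different ways: emit diff3-style markers.
    return [
        '<<<<<<< ours',
        ours,
        '||||||| base',
        base,
        '=======',
        theirs,
        '>>>>>>> theirs',
    ]


result = merge_line('Second line', 'A different second line', 'My new second line')
for line in result:
    print(line)
```

        On the example above, the sketch prints the same markers as the three-way diff, with the base version (Second line) between the ||||||| and ======= markers. It only reports a conflict when both branches changed the same line in different ways.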

        "},{"location":"using-git/how-to-resolve-merge-conflicts/#resolving-the-conflicts","title":"Resolving the conflicts","text":"

        We can edit test.txt to reconcile these changes (removing the conflict markers), and then commit our fix. For example, we might decide that test.txt should have the following contents:

        First line\nThe corrected second line\nThird line\n

        We can then commit these changes to resolve the merge conflict:

        git add test.txt\ngit commit -m \"Resolved the merge conflict\"\n
        "},{"location":"using-git/how-to-resolve-merge-conflicts/#cancelling-the-merge","title":"Cancelling the merge","text":"

        Alternatively, you may decide you don't want to merge these two branches, in which case you can cancel the merge by running:

        git merge --abort\n
        "},{"location":"using-git/how-to-structure-a-repository/","title":"How to structure a repository","text":"

        While there is no single \"best\" way to structure a repository, there are some guidelines that you can follow. The key aims are to ensure that your files are logically organised, and that others can easily navigate the repository.

        "},{"location":"using-git/how-to-structure-a-repository/#divide-your-repository-into-multiple-directories","title":"Divide your repository into multiple directories","text":"

        It is generally a good idea to have separate directories for different types of files. For example, your repository might contain any of these different file types, and you should at least consider storing each of them in a separate directory:

        • Input data files (which you may have received from a collaborator);
        • Cleaned and/or processed input files (e.g., if you aggregate the input data before using it);
        • Data analysis code;
        • Simulation/model code;
        • Output data files;
        • Plotting scripts that extract results from the output data files;
        • Output figures produced by the plotting scripts; and
        • Manuscript text and bibliography files.
        "},{"location":"using-git/how-to-structure-a-repository/#use-descriptive-names-for-directories-and-files","title":"Use descriptive names for directories and files","text":"

        Choosing file names that indicate what each file/directory contains can help other people, such as your collaborators, navigate your repository. Descriptive names can also help you when you return to a project after several weeks or months.

        Tip

        Have you ever asked yourself \"where is the file that contains X\"?

        Use descriptive file names, and the answer might be right in front of you!

        "},{"location":"using-git/how-to-structure-a-repository/#include-a-readme-file","title":"Include a README file","text":"

        You can write this in Markdown (README.md), in plain text (README or README.txt), or in any other suitable format. For example, Python projects often use reStructuredText and have a README.rst file.

        This file should begin with a brief description of why the repository was created and what it contains.

        Importantly, this file should also mention:

        • How the files and directories are arranged. Help your collaborators understand where they need to look in order to find something.

        • How to run important pieces of code (e.g., to generate output data files or figures).

        • The software packages and/or libraries that are required to run any of the code in this repository.

        • The license (if any) under which the repository contents are being made available.

        "},{"location":"using-git/how-to-use-branches/","title":"How to use branches?","text":"

        Recall that branches allow you to work on different ideas or tasks in parallel, within a single repository. In this chapter, we will show you how to create and use branches. In the Collaborating section, we will show you how branches can allow multiple people to work together on code and papers, and how you can use branches for peer code review.

        Info

        Branches, like tags, are identified by name. Common naming conventions include:

        • feature/some-new-thing for adding something new (a new data analysis, a new model feature, etc.); and
        • bugfix/some-problem for fixing something that isn't working as intended (e.g., perhaps there's a mistake in a data analysis script).

        You can choose your own conventions, but make sure that you choose meaningful names.

        Do not use names like branch1, branch2, etc.

        "},{"location":"using-git/how-to-use-branches/#creating-a-new-branch","title":"Creating a new branch","text":"

        You can create a new branch (in this example, called \"my-new-branch\") that starts from the current commit by running:

        git checkout -b my-new-branch\n

        You can also create a new branch that starts from a specific commit, tag, or branch in your repository:

        git checkout -b my-new-branch 95eaae5          # From an existing commit\ngit checkout -b my-new-branch my-tag-name      # From an existing tag\ngit checkout -b my-new-branch my-other-branch  # From an existing branch\n

        You can then create a corresponding upstream branch in your remote repository (in this example, called \"origin\") by running:

        git push -u origin my-new-branch\n
        "},{"location":"using-git/how-to-use-branches/#working-on-a-remote-branch","title":"Working on a remote branch","text":"

        If there is a branch in your remote repository that you want to work on, you can make a local copy by running:

        git checkout remote-branch-name\n

        This will create a local branch with the same name (in this example, \"remote-branch-name\").

        "},{"location":"using-git/how-to-use-branches/#listing-branches","title":"Listing branches","text":"

        You can list all of the local branches in your repository by running:

        git branch\n

        This will also highlight the current branch.

        "},{"location":"using-git/how-to-use-branches/#switching-between-branches","title":"Switching between branches","text":"

        You can switch from your current branch to another branch (in this example, called \"other-branch\") by running:

        git checkout other-branch\n

        Info

        Git will not let you switch branches if you have uncommitted changes that would be overwritten by the switch.

        One way to avoid this issue is to record the current changes as a new commit, and explain in the commit message that this is a snapshot of work in progress.

        A second option is to discard the uncommitted changes to each file by running:

        git restore file1 file2 file3 ... fileN\n
        "},{"location":"using-git/how-to-use-branches/#pushing-and-pulling-commits","title":"Pushing and pulling commits","text":"

        Once you have created a branch, you can use git push to \"push\" your commits to the remote repository, and git pull to \"pull\" commits from the remote repository. See Pushing and pulling commits for details.

        "},{"location":"using-git/how-to-use-branches/#inspecting-branch-histories","title":"Inspecting branch histories","text":"

        You can use git log to inspect the commit history of any branch:

        git log branch-name\n

        Remember that there are many ways to control what git log will show you.

        Similarly, you can use git diff to compare the changes in any two branches:

        git diff first-branch second-branch\n

        Again, there are ways to control what git diff will show you.

        "},{"location":"using-git/how-to-use-branches/#merging-branches","title":"Merging branches","text":"

        You may reach a point where you want to incorporate the changes from one branch into another branch. This is referred to as \"merging\" one branch into another, and is illustrated in the What is a branch? chapter.

        For example, you might have completed a new feature for your model or data analysis, and now want to merge this back into your main branch.

        First, ensure that the current branch is the branch you want to merge the changes into (this is often your main or master branch). You can then merge the changes from another branch (in this example, called \"other-branch\") by running:

        git merge other-branch\n

        This can have two different results:

        1. The commits from other-branch were merged successfully into the current branch; or

        2. There were conflicting changes (referred to as a \"merge conflict\").

        In the next chapter we will show you how to resolve merge conflicts.

        "},{"location":"using-git/inspecting-your-history/","title":"Inspecting your history","text":"

        You can inspect your commit history at any time with the git log command. By default, this command will list every commit from the current commit back to the very first commit, and for each commit it will show you:

        • The commit identifier (\"hash\"), which uniquely identifies this commit;
        • The person who created the commit (\"author\");
        • The date on which the commit was created; and
        • The commit message.

        There are many ways to adjust which commits git log will show, and which details it shows for each commit.

        Tip

        Each commit has a unique identifier (\"hash\"). These hashes are quite long, but in general you only need to provide the first 5-7 digits to uniquely identify a specific commit.
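
        To see why a short prefix is usually enough, here is a minimal Python sketch that finds the shortest prefix of a hash that no other hash in a collection shares. The four full hashes are taken from the git bisect example in this book. The function name is hypothetical. The minimum length of 4 reflects the fact that Git does not accept abbreviations shorter than 4 characters.

```python
def shortest_unique_prefix(target, hashes, min_length=4):
    # Hypothetical helper: grow the prefix one character at a time until
    # no other hash in the collection starts with it.
    others = [h for h in hashes if h != target]
    for length in range(min_length, len(target) + 1):
        prefix = target[:length]
        if not any(h.startswith(prefix) for h in others):
            return prefix
    return target


HASHES = [
    '92f1375db21dd8a35ca141365a477b963dbbf6dc',
    '9be780b8785d67ee191b2c0b113270059c9e0c3a',
    '055906f28da146a2d012b7c1c0e4707503ed1b11',
    '1251357ab5b41d511deb48cd5386cae37eec6751',
]

for h in HASHES:
    print(shortest_unique_prefix(h, HASHES))
```

        With only four hashes, 4 characters are enough to tell them apart. A real repository with thousands of commits needs longer prefixes, which is why Git usually displays 7 or more characters.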

        "},{"location":"using-git/inspecting-your-history/#listing-commits-over-a-specific-time-interval","title":"Listing commits over a specific time interval","text":"

        You can limit which commits git log will show by specifying a start time and/or an end time.

        Tip

        This can be extremely useful for generating progress reports and summarising your recent activity in team meetings.

        For example, you can view commits from the past week by running:

        git log --since='7 days'\ngit log --since='1 week'\n

        You can view commits made between 1 and 2 weeks ago by running:

        git log --since='2 weeks' --until='1 week'\n

        You can view commits made between specific dates by running:

        git log --since='2022/05/12' --until='2022/05/14'\n
        "},{"location":"using-git/inspecting-your-history/#listing-commits-that-modify-a-specific-file","title":"Listing commits that modify a specific file","text":"

        You can see which commits have made changes to a file by running:

        git log -- filename\n

        Info

        Note the -- argument that comes before the file name. This ensures that if the file name begins with a -, git log will not treat the file name as an option.

        "},{"location":"using-git/inspecting-your-history/#changing-how-commits-are-displayed","title":"Changing how commits are displayed","text":"

        You can make git log display an abbreviated commit hash (usually the first 7 characters) and only the first line of each commit message, by running:

        git log --oneline\n

        This can be a useful way to get a quick overview of the recent history.

        "},{"location":"using-git/inspecting-your-history/#viewing-the-contents-of-a-single-commit","title":"Viewing the contents of a single commit","text":"

        You can identify a commit by its unique identifier (\"hash\") or by its tag name (if it has been tagged), and view the commit with git show:

        git show commit-hash\ngit show tag-name\n

        This will show the commit details and all of the changes that were recorded in this commit.

        Tip

        If you don't specify a commit, git show will show you the current commit (HEAD).

        "},{"location":"using-git/inspecting-your-history/#viewing-all-changes-over-a-specific-interval","title":"Viewing all changes over a specific interval","text":"

        You can view all of the changes that were made between two commits with the git diff command.

        Tip

        The git diff command shows the difference between two points in your commit history.

        Note that git diff does not support start and/or end times like git log does; you must use commit identifiers.

        For example, here is a subset of the commit history for this book's repository (https://github.com/robmoss/git-is-my-lab-book):

        95eaae5 Note the need for a GitHub account and SSH key\n11085f0 Show how to create a branch from a tag\n9369482 Show how to create and use tags\n003cf6b Show how to ignore certain files\n339eb5a Show how to prepare and record commits\n6a7fb8b Show how to clone remote repositories\n6a49e10 Note that mdbook-admonish must be installed\na8e6114 Fixed the URL for the UoM GitLab instance\n5192704 Add a merge conflict exercise\n

        We can view all of the changes that were made after the bottom commit (5192704, \"Add a merge conflict exercise\") up to and including the top commit (95eaae5, \"Note the need for a GitHub account and SSH key\") by running:

        git diff 5192704..95eaae5\n

        In the above example, 8 files were changed, with a total of 310 new lines and 7 deleted lines. This is a lot of information! You can print a summary of these changes by running:

        git diff --stat 5192704..95eaae5\n

        This should show you the following details:

         README.md                                       |   2 +-\n src/SUMMARY.md                                  |   3 +\n src/prerequisites.md                            |   2 +\n src/using-git/cloning-an-existing-repository.md |  36 ++++++++++\n src/using-git/creating-a-commit.md              | 146 +++++++++++++++++++++++++++++++++++++--\n src/using-git/how-to-create-and-use-tags.md     |  89 ++++++++++++++++++++++++\n src/using-git/how-to-ignore-certain-files.md    |  37 ++++++++++\n src/version-control/what-is-a-repository.md     |   2 +-\n 8 files changed, 310 insertions(+), 7 deletions(-)\n

        This reveals that about half of the changes (146 new/deleted lines) were made to src/using-git/creating-a-commit.md.
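
        As a check on that arithmetic, here is a minimal Python sketch that parses the per-file lines of the --stat output above (with the +/- bars shortened) and reports each file's share of the changed lines. The per-file counts add up to 317, which matches the 310 insertions plus 7 deletions, so the 146 lines in creating-a-commit.md are roughly 46% of the total. The parse_stat function is hypothetical. For scripts, git diff --numstat produces tab-separated insertion and deletion counts that are easier to parse than the --stat layout.

```python
# Per-file lines from the git diff --stat output above (bars shortened).
STAT_LINES = [
    ' README.md                                       |   2 +-',
    ' src/SUMMARY.md                                  |   3 +',
    ' src/prerequisites.md                            |   2 +',
    ' src/using-git/cloning-an-existing-repository.md |  36 ++++',
    ' src/using-git/creating-a-commit.md              | 146 ++++--',
    ' src/using-git/how-to-create-and-use-tags.md     |  89 ++++',
    ' src/using-git/how-to-ignore-certain-files.md    |  37 ++++',
    ' src/version-control/what-is-a-repository.md     |   2 +-',
]


def parse_stat(lines):
    # Hypothetical helper: each line has the form <path> | <lines changed> <bar>.
    changes = {}
    for line in lines:
        path, _, rest = line.partition('|')
        changes[path.strip()] = int(rest.split()[0])
    return changes


changes = parse_stat(STAT_LINES)
total = sum(changes.values())
# List the files with the most changed lines first.
for path, count in sorted(changes.items(), key=lambda item: -item[1]):
    print(f'{count:4d} {count / total:6.1%}  {path}')
```
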

        "},{"location":"using-git/inspecting-your-history/#viewing-changes-to-a-file-over-a-specific-interval","title":"Viewing changes to a file over a specific interval","text":"

        Similar to the git log command, you can limit the files that the git diff command will examine. For example, you can display only the changes made to README.md in the above example by running:

        git diff 5192704..95eaae5 -- README.md\n

        This should show you the following change:

        diff --git a/README.md b/README.md\nindex 7956b65..a34f907 100644\n--- a/README.md\n+++ b/README.md\n@@ -15,7 +15,7 @@ This work is licensed under a [Creative Commons Attribution-ShareAlike 4.0 Inter\n\n ## Building the book\n\n-You can build this book by installing [mdBook](https://rust-lang.github.io/mdBook/) and running the following command in this directory:\n+You can build this book by installing [mdBook](https://rust-lang.github.io/mdBook/) and [mdbook-admonish](https://github.com/tommilligan/mdbook-admonish/), and running the following command in this directory:\n\n ```shell\n mdbook build\n
        "},{"location":"using-git/pushing-and-pulling-commits/","title":"Pushing and pulling commits","text":"

        In general, we \"push\" commits from our local repository to a remote repository by running:

        git push <remote-repository>\n

        and \"pull\" commits from a remote repository into our local repository by running:

        git pull <remote-repository>\n

        where <remote-repository> is either a URL or the name of a remote repository.

        However, we generally want to push to, and pull from, the same remote repository every time. See the next section for an example of linking the main branch in your local repository with a corresponding \"upstream\" branch in your remote repository.

        "},{"location":"using-git/pushing-and-pulling-commits/#pushing-your-first-commit-to-a-remote-repository","title":"Pushing your first commit to a remote repository","text":"

        In order to push commits from your local repository to a remote repository, you need to create a branch in the remote repository that corresponds to the main branch of your local repository. This requires that you have created at least one commit in your local repository.

        Tip

        This is a good time to create a README.md file and write a brief description of what this repository will contain.

        Once you have at least one commit in your local repository, you can create a corresponding upstream branch in the remote repository with the following command (if your local branch is not called main, use its name instead):

        git push -u origin main\n

        Note

        Recall that we identify remote repositories by name. In this example, the remote repository is called \"origin\". You can choose a different name when linking your local and remote repositories.

        Once you have defined the upstream branch, you can push commits by running:

        git push\n

        and pull commits by running:

        git pull\n

        without having to specify the remote repository.

        "},{"location":"using-git/pushing-and-pulling-commits/#forcing-updates-to-a-remote-repository","title":"Forcing updates to a remote repository","text":"

        By default, Git will refuse to push commits from a local branch to a remote branch if the remote branch contains any commits that are not in your local branch. This situation should not arise in general, and typically indicates that either someone else has pushed new commits to the remote branch (see the Collaborating section) or that you have altered the history of your local branch.

        If you are absolutely confident that your local history of commits should replace the contents of the remote branch, you can force this update by running:

        git push --force\n

        Tip

        Unless you are confident that you understand why this situation has occurred, it is probably a good idea to ask for advice before running the above command.

        "},{"location":"using-git/where-did-this-line-come-from/","title":"Where did this line come from?","text":"

        Consider the What should I commit? chapter. Imagine that we want to know when and why the following text was added:

        A helpful guideline is \"**commit early, commit often**\".\n

        If we can identify the relevant commit, we can then inspect the commit (using git show <commit>) to see all of the changes that it introduced. Ideally, the commit message will explain the reasons why this commit was made. This is one way in which your commit messages can act as a lab book.

        At the time of writing (commit 2a96324), the contents of the What should I commit? chapter came from two commits:

        git log --oneline src/version-control/what-should-I-commit.md\n
        3dfff1f Add notes about committing early and often\n9be780b Briefly describe key version control concepts\n

        We can use the git blame command to identify the commit that last modified each line in this file:

        git blame -s src/version-control/what-should-I-commit.md\n
        9be780b8  1) # What should I commit?\n9be780b8  2)\n9be780b8  3) A commit should represent a **unit of work**.\n9be780b8  4)\n9be780b8  5) If you've made changes that represent multiple units of work (e.g., changing how input data are processed, and adding a new model parameter) these should be saved as separate commits.\n9be780b8  6)\n9be780b8  7) Try describing out loud the changes you have made, and if you find yourself saying something like \"I did X and Y and Z\", then the changes should probably divided into multiple commits.\n3dfff1fe  8)\n3dfff1fe  9) A helpful guideline is \"**commit early, commit often**\".\n3dfff1fe 10)\n3dfff1fe 11) ## Commit early\n3dfff1fe 12)\n3dfff1fe 13) - Don't delay creating a commit because \"it's not ready yet\".\n3dfff1fe 14)\n3dfff1fe 15) - A commit doesn't have to be \"perfect\".\n3dfff1fe 16)\n3dfff1fe 17) ## Commit often\n3dfff1fe 18)\n3dfff1fe 19) - Small, focused commits are **extremely helpful** when trying to identify the cause of an unintended change in your code's behaviour or output.\n3dfff1fe 20)\n3dfff1fe 21) - There is no such thing as too many commits.\n

        You can see that the first seven lines were last modified by commit 9be780b (Briefly describe key version control concepts), while the rest of the file was last modified by commit 3dfff1f (Add notes about committing early and often). So the text that we're interested in (line 9) was introduced by commit 3dfff1f.
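
        If you have saved the output of git blame -s, a short Python sketch can look up which commit last modified the line containing some text. It assumes the layout shown above (a commit hash, a line number followed by a closing parenthesis, then the line contents), and the function name is hypothetical. For scripting, git blame also offers --porcelain output, and -L 9,9 restricts the output to a single line.

```python
# A few lines of the git blame -s output shown above.
BLAME_LINES = [
    '9be780b8  1) # What should I commit?',
    '9be780b8  2)',
    '9be780b8  3) A commit should represent a **unit of work**.',
    '3dfff1fe  8)',
    '3dfff1fe 11) ## Commit early',
    '3dfff1fe 21) - There is no such thing as too many commits.',
]


def find_blame(text, blame_lines):
    # Hypothetical helper: return (commit, line number) for the first line
    # whose contents include the given text, or None if there is no match.
    for line in blame_lines:
        # The first ')' ends the line number; everything after it is content.
        prefix, _, contents = line.partition(')')
        commit, line_number = prefix.split()
        if text in contents:
            return commit, int(line_number)
    return None


print(find_blame('too many commits', BLAME_LINES))
```
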

        You can inspect this commit by running the following command:

        git show 3dfff1f\n

        "},{"location":"using-git/where-did-this-problem-come-from/","title":"Where did this problem come from?","text":"

        Let's find the commit that created the file src/version-control/what-is-a-repository.md. We could find this out using git log, but the point here is to illustrate how to use a script to find the commit that causes any arbitrary change to our repository.

        Once the commit has been found, you can inspect it (using git show <commit>) to see all of the changes this commit introduced and the commit message that (hopefully) explains the reasons why this commit was made. This is one way in which your commit messages can act as a lab book.

        1. Create a Python script called my_test.py with the following contents:

          #!/usr/bin/env python3\nfrom pathlib import Path\nimport sys\n\nexpected_file = Path('src') / 'version-control' / 'what-is-a-repository.md'\n\nif expected_file.exists():\n    # This file is the \"new\" thing that we want to detect.\n    sys.exit(1)\nelse:\n    # The file does not exist, this commit is \"old\".\n    sys.exit(0)\n

          For reference, here is an equivalent R script:

          #!/usr/bin/Rscript --vanilla\n\nexpected_file <- file.path('src', 'version-control', 'what-is-a-repository.md')\n\nif (file.exists(expected_file)) {\n    # This file is the \"new\" thing that we want to detect.\n    quit(status = 1)\n} else {\n    # The file does not exist, this commit is \"old\".\n    quit(status = 0)\n}\n
        2. Select the commit range over which to search. We know that the file exists in commit 3dfff1f (Add notes about committing early and often), and that it did not exist in the very first commit (5a19b02).

        3. Instruct Git to start searching with the following command:

          git bisect start 3dfff1f 5a19b02\n

          Note that we specify the newer (\"bad\") commit first, and then the older (\"good\") commit.

          Git will inform you about the search progress, and which commit is currently being investigated.

          Bisecting: 7 revisions left to test after this (roughly 3 steps)\n[92f1375db21dd8a35ca141365a477b963dbbf6dc] Add CC-BY-SA license text and badge\n
        4. Instruct Git to use the script my_test.py to check each commit with the following command:

          git bisect run ./my_test.py\n

          It will continue to report the search progress and automatically identify the commit that we're looking for:

          running  './my_test.py'\nBisecting: 3 revisions left to test after this (roughly 2 steps)\n[9be780b8785d67ee191b2c0b113270059c9e0c3a] Briefly describe key version control concepts\nrunning  './my_test.py'\nBisecting: 1 revision left to test after this (roughly 1 step)\n[055906f28da146a2d012b7c1c0e4707503ed1b11] Display example commit message as plain text\nrunning  './my_test.py'\nBisecting: 0 revisions left to test after this (roughly 0 steps)\n[1251357ab5b41d511deb48cd5386cae37eec6751] Rename the \"What is a repository?\" source file\nrunning  './my_test.py'\n1251357ab5b41d511deb48cd5386cae37eec6751 is the first bad commit\ncommit 1251357ab5b41d511deb48cd5386cae37eec6751\nAuthor: Rob Moss <robm.dev@gmail.com>\nDate:   Sun Apr 17 21:41:43 2022 +1000\n\n    Rename the \"What is a repository?\" source file\n\n    The file name was missing the word \"a\" and did not match the title.\n\n src/SUMMARY.md                              |  2 +-\n src/version-control/what-is-a-repository.md | 18 ++++++++++++++++++\n src/version-control/what-is-repository.md   | 18 ------------------\n 3 files changed, 19 insertions(+), 19 deletions(-)\n create mode 100644 src/version-control/what-is-a-repository.md\n delete mode 100644 src/version-control/what-is-repository.md\n
        5. To end the search and return to the commit you had checked out before it began, run the following command:

          git bisect reset
        6. You can then inspect this commit by running the following command:

          git show 1251357
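        To practise these steps without touching a real project, the sketch below builds a throwaway repository in which a hypothetical bug (value.txt holds a negative number) first appears in the sixth of ten commits, and then lets git bisect run find it. Everything in it is invented for illustration: the file names, the commit messages, and the test script check.sh. It assumes Git 2.28 or later (for git init -b). As with my_test.py, git bisect run treats an exit code of 0 as "good" and exit codes 1 to 127 (other than 125, which means "skip") as "bad".

```shell
# Create a throwaway repository (git init -b requires Git 2.28 or later).
work="$(mktemp -d)"
git init -q -b main "$work/repo"
cd "$work/repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Make 10 commits; from commit 6 onwards, value.txt holds a negative number.
for i in 1 2 3 4 5 6 7 8 9 10; do
  if [ "$i" -lt 6 ]; then printf '%s\n' "$i" > value.txt
  else printf '%s\n' "-$i" > value.txt; fi
  git add value.txt
  git commit -q -m "Commit $i"
done

# A hypothetical test script, stored outside the repository so that checkouts
# leave it alone. It exits with 0 (good) if the value is non-negative, else 1 (bad).
printf '#!/bin/sh\n[ "$(cat value.txt)" -ge 0 ]\n' > "$work/check.sh"
chmod +x "$work/check.sh"

# Newest (bad) commit first, then the oldest (good) commit.
git bisect start HEAD "$(git rev-list --max-parents=0 HEAD)"
git bisect run "$work/check.sh"

# Record the first bad commit, then return to the branch we started on.
first_bad="$(git rev-parse refs/bisect/bad)"
git bisect reset
git log -1 --format=%s "$first_bad"
```

        The last command should print Commit 6, the first commit in which the value is negative.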
        "},{"location":"version-control/","title":"Version control concepts","text":"

        This section provides a high-level introduction to the concepts that you should understand in order to make effective use of version control.

        Info

        Version control can turn your files into a lab book that captures the broader context of your research activities and that you can easily search and reproduce.

        "},{"location":"version-control/exercise-using-version-control/","title":"Exercise: using version control","text":"

        In this section we have introduced version control, and outlined how it can be useful for academic research activities, including:

        • Capturing a detailed, annotated record of your research;
        • Inspecting changes made between any two moments in time;
        • Identifying when a specific change was made; and
        • Sharing your research with collaborators.

        Info

        We'd now like you to think about how version control might be useful to you and your research.

        Have you experienced any issues or challenges in your career where version control would have been helpful? For example:

        • Have you ever looked at some of your older code and had difficulty understanding what it is doing, how it works, or why it was written?

        • Have you ever had difficulties identifying what code and/or data were used to generate a particular analysis or output?

        • Have you ever discovered a bug in your code and tried to identify when it was introduced, or what outputs it might have affected?

        • When collaborating on a research project, have you ever had challenges in making sure that everyone was working with the most recent files?

        How can you use version control in your current research project(s)?

        • Do you have an existing project or piece of code that could benefit from being stored in a repository?

        • Have you recently written any code that could be recorded as one or more commits?

        • If so, what would you write for the commit messages?

        • Have you written some exploratory code or analysis that could be stored in a separate branch?

        Having looked at the use of version control in the past and present, how would using version control benefit you?

        "},{"location":"version-control/how-do-I-write-a-commit-message/","title":"How do I write a commit message?","text":"

        Commit messages are shown as part of the repository history (e.g., when running git log). Each message consists of a short one-line description, followed by as much or as little text as required.

        You should treat these messages as entries in a log book. Explain what changes were made and why they were made. This can help collaborators understand what we have done, but more importantly it acts as a record for our future selves.

        Info

        Have you ever looked at code you wrote a long time ago and wondered what you were thinking?

        A history of detailed commit messages should allow you to answer this question!

        Remember that code is harder to read than it is to write (Joel Spolsky).

        For example, rather than writing:

        Added model

        You could write something like:

        Implemented the initial model

        This model includes all of the core features that we need to fit the data, but there are several other features that we intend to add:

        - Parameter X is currently constant, but we may need to allow it to vary over time;

        - Parameter Y should probably be a hyperparameter; and

        - The population includes age-structured mixing, but we need to also include age-specific outcomes, even though there is very little data to suggest what the age effects might be.
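        If you prefer to write the message on the command line rather than in an editor, you can pass -m more than once: the first value becomes the one-line summary, and each further value is added as a separate paragraph. The sketch below records part of the example message above in this way. It assumes Git 2.28 or later and uses a placeholder file model.py in a throwaway repository. Running git commit without -m opens your editor instead, which is usually more convenient for longer messages.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "# placeholder model code" > model.py
git add model.py

# The first -m is the one-line summary; each later -m becomes a new paragraph.
git commit -q -m "Implement the initial model" \
  -m "This model includes all of the core features that we need to fit the data, but there are several other features that we intend to add:" \
  -m "- Parameter X is currently constant, but we may need to allow it to vary over time."

git log --oneline       # shows only the one-line summary
git log -1 --format=%b  # shows the rest of the message
```

        Note that git log --oneline shows only the summary line, which is one reason to make that line informative.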

        "},{"location":"version-control/what-is-a-branch/","title":"What is a branch?","text":"

        A branch allows you to create a series of commits that are separate from the main history of your repository. Branches can be used for units of work that are too large to be a single commit.

        Info

        It is easy to switch between branches! You can work on multiple ideas or tasks in parallel.

        Consider a repository with three commits: commit A, followed by commit B, followed by commit C:

        At this point, you might consider two ways to implement a new model feature. One way to do this is to create a separate branch for each implementation:

        You can work on each branch, and switch between them, in the same local repository.

        If you decide that the first implementation (the green branch) is the best way to proceed, you can then merge this branch back into your main branch. This means that your main branch now contains six commits (A to F), and you can continue adding new commits to your main branch:
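        The scenario above might be reproduced with commands like the following sketch. The branch names idea-1 and idea-2 are hypothetical. The commits are empty (--allow-empty), so only the shape of the history matters. It assumes Git 2.28 or later, for git init -b and git switch.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# Commits A, B and C on the main branch.
for c in A B C; do git commit -q --allow-empty -m "Commit $c"; done

# First implementation: a new branch (starting at C) with commits D, E and F.
git switch -q -c idea-1
for c in D E F; do git commit -q --allow-empty -m "Commit $c"; done

# Second implementation: another branch, also starting from commit C.
git switch -q -c idea-2 main
git commit -q --allow-empty -m "Commit X"

# Keep the first implementation by merging it into main.
git switch -q main
git merge -q idea-1
git log --oneline --graph --all
```

        Because main has not moved since idea-1 was created, Git simply fast-forwards main to commit F, so main contains six commits (A to F). If both branches had gained new commits, Git would instead create a merge commit that joins them.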

        "},{"location":"version-control/what-is-a-commit/","title":"What is a commit?","text":"

        A "commit" is a set of changes to one or more files in a repository. These changes can include:

        • Adding lines to a file;
        • Removing lines from a file;
        • Changing lines in a file;
        • Adding new files; and
        • Deleting existing files.

        Each commit also includes the date and time that it was created, the user that created it, and a commit message.
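        The sketch below makes one commit that adds two files, and a second commit that changes a line, adds a line, and deletes a file. It then asks Git to display the recorded author, date, message, and a summary of the changed files. The file names and contents are placeholders, and it assumes Git 2.28 or later (for git init -b).

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

# First commit: add two new files.
printf 'alpha = 1\nbeta = 2\n' > params.txt
echo "old notes" > notes.txt
git add params.txt notes.txt
git commit -q -m "Add initial parameters and notes"

# Second commit: change a line, add a line, and delete a file.
printf 'alpha = 1\nbeta = 3\ngamma = 4\n' > params.txt
git rm -q notes.txt
git add params.txt
git commit -q -m "Update beta, add gamma, and remove old notes"

# Show the author, date, message, and changed files of the latest commit.
git show --stat
```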

        "},{"location":"version-control/what-is-a-merge-conflict/","title":"What is a merge conflict?","text":"

        In What is a branch? we presented an example of successfully merging a branch into another. However, when we try to merge one branch into another, we may find that the two branches have conflicting changes. This is known as a merge conflict.

        Consider two branches that make conflicting changes to the same line of a file:

        1. Replace "Second line" with "My new second line":

           First line
          -Second line
          +My new second line
           Third line
        2. Replace "Second line" with "A different second line":

           First line
          -Second line
          +A different second line
           Third line

        There is no way to automatically reconcile these two branches, and we have to fix this conflict manually. This means that we need to decide what the true result should be, edit the file to resolve these conflicting changes, and commit our modifications.
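        The conflict described above can be reproduced in a throwaway repository with the sketch below. It assumes Git 2.28 or later, and the file lines.txt and the branches first and second are hypothetical. When the second merge fails, Git writes both versions of the line into the file between <<<<<<<, ======= and >>>>>>> markers. Resolving the conflict means replacing that whole region with the text you want, then running git add and git commit to complete the merge.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

printf 'First line\nSecond line\nThird line\n' > lines.txt
git add lines.txt
git commit -q -m "Add three lines"

# Two branches that change the same line in different ways.
git switch -q -c first
printf 'First line\nMy new second line\nThird line\n' > lines.txt
git commit -q -a -m "Replace the second line"

git switch -q -c second main
printf 'First line\nA different second line\nThird line\n' > lines.txt
git commit -q -a -m "Replace the second line differently"

# Merging the first branch succeeds; merging the second one conflicts.
git switch -q main
git merge -q first
git merge second || echo "Merge conflict: edit lines.txt to resolve it"
cat lines.txt  # shows both versions between conflict markers

# Resolve the conflict by writing the final text, then commit the merge.
printf 'First line\nMy new second line\nThird line\n' > lines.txt
git add lines.txt
git commit -q -m "Merge branch 'second', keeping the first version of line 2"
```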

        "},{"location":"version-control/what-is-a-repository/","title":"What is a repository?","text":"

        A repository records a set of files managed by a version control system, including the historical record of changes made to these files.

        You can create as many repositories as you want. Each repository should be a single "thing", such as a research project or a journal article, and should be located in a separate directory.

        You will generally have at least two copies of each repository:

        1. A local repository on your computer; and

        2. A remote repository on a service such as GitHub, or a University-provided platform (such as the University of Melbourne's GitLab instance).

        You make changes in your local repository and "push" them to the remote repository. You can share this remote repository with your collaborators and supervisors, and they will be able to see all of the changes that you have pushed.

        You can also allow collaborators to push their own changes to the remote repository, and then "pull" them into your local repository. This is one way in which you can use version control to work collaboratively on a project.
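        You can try this workflow without an account on any hosting service. In the sketch below, a bare repository in a local directory stands in for the remote repository that would normally live on GitHub or GitLab. The directory names remote.git, alice and bob, and the file analysis.txt, are hypothetical, and the sketch assumes Git 2.28 or later (for git init -b).

```shell
work="$(mktemp -d)"
cd "$work"

# A bare repository plays the role of the remote (git init -b needs Git 2.28+).
git init -q --bare -b main remote.git

# Alice creates her local repository and pushes it to the remote.
git init -q -b main alice
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Draft analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Start the analysis"
git remote add origin "$work/remote.git"
git push -q -u origin main

# Bob clones the remote, makes a change, and pushes it.
git clone -q "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Reviewed by Bob" >> analysis.txt
git commit -q -a -m "Add Bob's review"
git push -q

# Alice pulls Bob's change into her local repository.
cd "$work/alice"
git pull -q --ff-only
git log --oneline
```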

        "},{"location":"version-control/what-is-a-tag/","title":"What is a tag?","text":"

        A tag is a short, unique name that identifies a specific commit. You can use tags as bookmarks for interesting or important commits. Common uses of tags include:

        • Identifying manuscript revisions: draft-1, submitted-version, revision-1, etc.

        • Identifying software package versions: v1.0, v1.1, v2.0, etc.
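        A minimal sketch of tagging manuscript revisions, in a throwaway repository with a placeholder file manuscript.md (assuming Git 2.28 or later). git tag NAME creates a lightweight tag, which is simply a name for a commit. git tag -a NAME -m MESSAGE creates an annotated tag, which also records who created it, when, and why. You can then use tag names wherever Git expects a commit, for example with git diff.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "First draft of the manuscript." > manuscript.md
git add manuscript.md
git commit -q -m "Write the first draft"
git tag draft-1  # a lightweight tag: just a name for this commit

echo "Responded to co-author comments." >> manuscript.md
git commit -q -a -m "Address co-author comments"
git tag -a submitted-version -m "Version submitted to the journal"  # annotated tag

git tag --list                      # lists both tags
git diff draft-1 submitted-version  # what changed between the two tagged commits
```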

        "},{"location":"version-control/what-is-version-control/","title":"What is version control?","text":"

        Version control is a way of systematically recording changes to files (such as computer code and data files). This allows you to restore any previous version of a file. More importantly, this history of changes can be queried, and each set of changes can include additional information, such as who made the changes and an explanation of why the changes were made.

        A core component of making great decisions is understanding the rationale behind previous decisions. If we don't understand how we got "here", we run the risk of making things much worse.

        (Chesterton's Fence)

        For academic research activities that involve data analysis or simulation modelling, some key uses of version control are:

        • You can use it as a log book, and capture a detailed and permanent record of every step of your research. This is extremely helpful for people (including you!) who want to understand and make use of your work.

        • You can collaborate with others in a systematic way, ensuring that everyone has access to the most recent files and data, and review everyone's contributions.

        • You can inspect the changes made over a period of interest (e.g., "What have I done in the last week?").

        • You can identify when a specific change occurred, and what other changes were made at the same time (e.g., "What changes did I make that affected this output figure?").

        In this book we will focus on the Git version control system, which is used by popular online platforms such as GitHub, GitLab, and Bitbucket.
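        In Git, the questions in the list above correspond to queries like those in the sketch below, which runs in a throwaway repository (assuming Git 2.28 or later). The files clean_data.py and plot_figure.py are placeholders for your own scripts.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"

echo "x = 1" > clean_data.py
echo "plot(x)" > plot_figure.py
git add clean_data.py plot_figure.py
git commit -q -m "Add data cleaning and plotting scripts"
echo "plot(x, colour='red')" > plot_figure.py
git commit -q -a -m "Change the figure colour"

# "What have I done in the last week?"
git log --since="1 week ago" --oneline

# "What changes did I make that affected this output figure?"
git log --oneline -- plot_figure.py

# View an earlier version of a file (here, as it was one commit ago).
git show HEAD~1:plot_figure.py
```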

        "},{"location":"version-control/what-should-I-commit/","title":"What should I commit?","text":"

        A commit should represent a unit of work.

        If you've made changes that represent multiple units of work (e.g., changing how input data are processed, and adding a new model parameter) these should be saved as separate commits.

        Try describing out loud the changes you have made, and if you find yourself saying something like "I did X and Y and Z", then the changes should probably be divided into multiple commits.

        A helpful guideline is "commit early, commit often".
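        One way to turn a single working session into separate commits is to stage and commit each unit of work on its own. The sketch below uses two hypothetical files, one that processes input data and one that defines the model, in a throwaway repository (Git 2.28 or later). When unrelated changes are mixed within one file, git add -p lets you choose which parts to stage interactively. It is not shown here because it needs a terminal.

```shell
# A throwaway repository (git init -b requires Git 2.28 or later).
cd "$(mktemp -d)"
git init -q -b main
git config user.name "Example User"
git config user.email "user@example.com"
echo "data = load()" > process_data.py
echo "params = {}" > model.py
git add process_data.py model.py
git commit -q -m "Add data processing and model scripts"

# Two unrelated changes made in the same working session.
echo "data = data.dropna()" >> process_data.py
echo "params['new_param'] = 0.5" >> model.py

# Save each unit of work as a separate commit.
git add process_data.py
git commit -q -m "Drop missing values from the input data"
git add model.py
git commit -q -m "Add a new model parameter"
git log --oneline --stat
```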

        "},{"location":"version-control/what-should-I-commit/#commit-early","title":"Commit early","text":"
        • Don't delay creating a commit because "it's not ready yet".

        • A commit doesn't have to be "perfect".

        "},{"location":"version-control/what-should-I-commit/#commit-often","title":"Commit often","text":"
        • Small, focused commits are extremely helpful when trying to identify the cause of an unintended change in your code's behaviour or output.

        • There is no such thing as too many commits.

        "}]} \ No newline at end of file diff --git a/pr-preview/pr-65/sitemap.xml.gz b/pr-preview/pr-65/sitemap.xml.gz index 00f00dbe66e3c356454ed440e2fca021cfda8511..205876ab70566b5e54ee4bd1491cb1e6f226a71f 100644 GIT binary patch delta 15 Wcmcb_c8QHmzMF$X(0n7C4>JHGU<1DZ delta 15 Wcmcb_c8QHmzMF&NzVJpiA7%h2fCOm(