differences for PR #13
actions-user committed Nov 23, 2023
1 parent 319a00b commit a7c165e
Showing 73 changed files with 1,978 additions and 552 deletions.
162 changes: 124 additions & 38 deletions 01-basics.md
@@ -1,7 +1,7 @@
---
title: Automated Version Control
teaching: 5
exercises: 0
teaching: 15
exercises: 10
---

::::::::::::::::::::::::::::::::::::::: objectives
@@ -17,20 +17,60 @@ exercises: 0

::::::::::::::::::::::::::::::::::::::::::::::::::

We'll start by exploring how version control can be used
to keep track of what one person did and when.
## Tracking changes

We'll start by exploring how we are usually introduced to version control to keep track of what one person did and when.
Even if you aren't collaborating with other people,
automated version control is much better than this situation:
version control may have looked like this situation:

![Tracking changes for the `manuscript.docx` file.](fig/git-manual-version-system.png){alt='Several copies of the same file called manuscript, with modification dates as prefixes and what a person did as suffixes.'}

Most of us started by copying the version control habits of the first community we worked in. We need to tell a file apart from its previous versions and from copies edited by colleagues (supervisors or reviewers). For that reason, we tag each version to track what one person did (e.g., with a suffix) or when they did it (e.g., with a prefix).

Does it seem unnecessary to have multiple nearly identical versions of the same document? Possibly yes. But this version control system makes it possible to return to a specific version if you erased something that you now think is essential.

::::::::::::::::: discussion

### File names to track changes

Write down:

- Is there any file naming convention that is familiar to you?

!["notFinal.doc" by Jorge Cham, <https://www.phdcomics.com>](fig/phd101212s.png){alt='Comic: a PhD student sends "FINAL.doc" to their supervisor, but after several increasingly intense and frustrating rounds of comments and revisions they end up with a file named "FINAL_rev.22.comments49.corrections.10.#@$%WHYDIDCOMETOGRADSCHOOL????.doc"'}
- What was the version control system that you first used?

We've all been in this situation before: it seems unnecessary to have
multiple nearly-identical versions of the same document. Some word
- Share with us your favorite prefix or suffix!

:::::::::::::::::::::::::::::

::::::::::::::::::: instructor

We can use the comments or live participation as an ice-breaker.

::::::::::::::::::::::::::::::

Some word
processors let us deal with this a little better, such as Microsoft
Word's
[Track Changes](https://support.office.com/en-us/article/Track-changes-in-Word-197ba630-0f5f-4a8e-9a77-3712475e806a),
Google Docs' [version history](https://support.google.com/docs/answer/190843?hl=en), or
LibreOffice's [Recording and Displaying Changes](https://help.libreoffice.org/Common/Recording_and_Displaying_Changes).
LibreOffice's [Recording and Displaying Changes](https://help.libreoffice.org/Common/Recording_and_Displaying_Changes). Let's illustrate how Google Docs works.

![Tracking changes for the `git-test` file.](fig/gdocs-01.png)

To use Google Docs version history, click `File` > `Version history` > `See version history`. This view highlights only the new content added to the file in the selected version.

![The most recent version of the `git-test` file is called "third version".](fig/gdocs-03.png)

We can move to any previous version. Each version is tagged with two metadata values: the modification date and the name of the author.

![We can view the "first version" of the file `git-test`. We can also restore it with the `Restore this version` button.](fig/gdocs-04.png)

Google Docs’ version history tool is an automatic version control system that works online for single Word/Doc files.
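
To make "highlighting the new content of a version" concrete, here is a minimal sketch using Python's standard `difflib` module. It is not how Google Docs works internally, which the lesson does not describe. The two versions below are hypothetical text invented for illustration.

```python
import difflib

# Hypothetical contents of two saved versions of a document
# (invented for this sketch, not taken from the lesson's figures).
first_version = ["This is the first line."]
third_version = [
    "This is the first line.",
    "This line was added in the second version.",
    "This line was added in the third version.",
]

# ndiff prefixes unchanged lines with "  " and added lines with "+ ".
comparison = list(difflib.ndiff(first_version, third_version))

# Keep only the added lines, dropping the two-character prefix.
added_lines = [line[2:] for line in comparison if line.startswith("+ ")]

for line in added_lines:
    print("added:", line)
```

The comparison only reports what changed between the two versions; the unchanged first line is left out of `added_lines`, much like a history view that highlights new content only.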

![The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: <https://zenodo.org/doi/10.5281/zenodo.3332807>.](fig/ProjectHistory.jpg)

## Version control systems

Version control systems start with a base version of the document and
then record changes you make each step of the way. You can
@@ -42,26 +82,34 @@ more recent version.

Once you think of changes as separate from the document itself, you
can then think about "playing back" different sets of changes on the base document, ultimately
resulting in different versions of that document. For example, two users can make independent
sets of changes on the same document.

![](fig/versions.svg){alt='Different Versions Can be Saved'}

Unless multiple users make changes to the same section of the document - a conflict - you can
incorporate two sets of changes into the same base document.

![](fig/merge.svg){alt='Multiple Versions Can be Merged'}
resulting in different versions of that document.

A version control system is a tool that keeps track of these changes for us,
effectively creating different versions of our files. It allows us to decide
effectively creating different versions of our files.
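
The idea of a base version plus recorded changes that can be "played back" can be sketched in a few lines of Python. This is a toy model under simplifying assumptions, not how any real version control system stores history: the document is a list of lines, and each recorded change (a hypothetical `(index, text)` pair) either replaces one line or appends a new one.

```python
# Toy model: a base document plus an ordered list of recorded changes.
# The document text and the change format are hypothetical.
base = ["Title: My manuscript", "Introduction goes here."]

# Each change is (line index, new text); an index of None means "append".
changes = [
    (None, "Methods go here."),
    (1, "Introduction: why version control matters."),
    (None, "Results go here."),
]


def play_back(base_lines, recorded_changes, up_to):
    """Rebuild a version by applying the first `up_to` changes to the base."""
    lines = list(base_lines)  # copy, so the base version is never modified
    for index, text in recorded_changes[:up_to]:
        if index is None:
            lines.append(text)
        else:
            lines[index] = text
    return lines


version_0 = play_back(base, changes, 0)             # the base version itself
version_2 = play_back(base, changes, 2)             # after the first two changes
latest = play_back(base, changes, len(changes))     # after all changes

print(version_2)
```

Because the base is never overwritten, any earlier version can be rebuilt on demand by replaying fewer changes.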

<!--
It allows us to decide
which changes will be made to the next version (each record of these changes is
called a [commit](../learners/reference.md#commit)), and keeps useful metadata
about them. The complete history of commits for a particular project and their
metadata make up a [repository](../learners/reference.md#repository).
Repositories can be kept in sync across different computers, facilitating
collaboration among different people.
-->

::::::::::::::::: checklist

### Checklist

Key characteristics of Version control systems are:

1. Keep the entire history of a file and inspect a file throughout its lifetime.

::::::::::::::::::::::::::::::::::::::::: callout
2. Tag a particular version so you can return to it easily.

:::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::: instructor

## The Long History of Version Control Systems

@@ -95,33 +143,71 @@ the same files concurrently.

## Solution

- Recovering the excellent version is only possible if you created a copy
of the old version of the paper. The danger of losing good versions
often leads to the problematic workflow illustrated in the PhD Comics
cartoon at the top of this page.

- Collaborative writing with traditional word processors is cumbersome.
Either every collaborator has to work on a document sequentially
(slowing down the process of writing), or you have to send out a
version to all collaborators and manually merge their comments into
your document. The 'track changes' or 'record changes' option can
highlight changes for you and simplifies merging, but as soon as you
accept changes you will lose their history. You will then no longer
know who suggested that change, why it was suggested, or when it was
merged into the rest of the document. Even online word processors like
Google Docs or Microsoft Office Online do not fully resolve these
problems.


- Recovering the excellent version is only possible if you created a copy of the old version of the paper. The danger of losing good versions often leads to the problematic workflow illustrated in [this popular PhD Comics cartoon](https://phdcomics.com/comics/archive.php?comicid=1531).

- Collaborative writing with traditional word processors is cumbersome. Either every collaborator has to work on a document sequentially (slowing down the process of writing), or you have to send out a version to all collaborators and manually merge their comments into your document. The 'track changes' or 'record changes' option can highlight changes for you and simplifies merging, but as soon as you accept changes you will lose their history. You will then no longer know who suggested that change, why it was suggested, or when it was merged into the rest of the document. Even online word processors like Google Docs or Microsoft Office Online do not fully resolve these problems. Remember this for the collaboration episode!

:::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::::::::::::::::

## Version control and R files

For code-like files such as `.R` and `.Rmd` files, we cannot use Google Docs. The software and strategy to track changes in a project depend on the **file type**.

- Google Docs’ version history tool is version control software that works online and is optimized for single [non-plain text files](https://github.com/epiverse-trace/git-rstudio-basics/discussions/11) like Word/Doc files.

- `Git` is version control software that works offline and is optimized for [plain text files](https://en.wikipedia.org/wiki/Plain_text). (Read: “What Not to Put Under Version Control” at [G. Wilson et al. 2017](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510#sec014))

Plain text files can hold text, code, or data. Examples of each are Markdown files (`.md`), R files (`.R`), and `.csv` or `.tsv` files, respectively.

![Examples of non-plain and plain text files.](fig/plain-text-files.png)
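
One rough way to see the difference between plain and non-plain text files is to look at their bytes. The sketch below is a heuristic chosen for illustration, not a rule used by Git or stated in the lesson: it treats content as plain text when it contains no NUL bytes and decodes as UTF-8. The function name `looks_like_plain_text` and the sample byte strings are hypothetical.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Heuristic: plain text has no NUL bytes and is valid UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample contents.
csv_bytes = b"id,age\n1,34\n2,51\n"            # a small .csv file
r_bytes = b"x <- c(1, 2, 3)\nmean(x)\n"        # a small .R script
# .docx files are zip archives, so they start with a binary zip header.
docx_bytes = b"PK\x03\x04\x14\x00\x00\x00"

for name, data in [(".csv", csv_bytes), (".R", r_bytes), (".docx", docx_bytes)]:
    kind = "plain text" if looks_like_plain_text(data) else "non-plain"
    print(name, "->", kind)
```

A heuristic like this can misclassify unusual files, so treat it as a way to build intuition rather than a definitive test.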

::::::::::::::::: callout

We can use `Git` to track changes in these plain text files. However, for [plain text data files](https://www.r4epi.com/importing-plain-text-files.html) (like `.csv` and `.tsv`) we prefer different version control systems.

:::::::::::::::::::::::::

Plain text files like Markdown files (`.md`) and R files (`.R`) are integrated in R Markdown files (`.Rmd`) to generate manuscripts, websites, and R packages. These three products are outputs of **Open Science** projects, which lead to *Reproducible research* and *Sustainable software*.


![We can increase the reproducibility of our Open Science projects with version control systems like `Git`. Data and code connect the text to the final results and make them executable. From: "Ciencia reproducible: qué, por qué, cómo" <https://www.revistaecosistemas.net/index.php/ecosistemas/article/view/1178>](fig/gradient-reproducibility.png)

::::::::::::::::::: instructor

### Concept map

![Concept map for automatic version control.](fig/git-concept-03.drawio.png)

::::::::::::::::::::::::::::::

::::::::::::::::: discussion

### Exercise!

Tell us about your Open Science project and its file types!

- Briefly describe one Open Science project in which you are involved or would like to start soon (e.g. thesis, current project, or work);
- Identify the most relevant file types (`.pdf`, `.jpeg`, `.csv`, `.xlsx`, `.R`, `.docx`, `.Rmd`) involved in it and classify them as non-plain or plain text files;
- Discuss which ones can be tracked with version control software like Git.

::::::::::::::::::::::::::::

::::::::::::::::::: instructor

This exercise can be solved in the shared document of the training.

::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::::: keypoints

- Version control is like an unlimited 'undo'.
- `Git` is version control software optimized for plain text files, like `.R` and `.Rmd` files.

<!--
- Version control also allows many people to work in parallel.
-->

::::::::::::::::::::::::::::::::::::::::::::::::::
