---
title: "Reproducible science and collaboration with Git and GitHub"
author: Ahmed Hasan
---
## Lesson preamble
> ### Lesson objectives:
>
> - Understand basic bash commands
> - Understand the logic of version control
> - Practice using Git commands at the command line
> - Learn how and when to use branches
> - Learn how to effectively use Pull Requests
> - Learn how to work with forks
>
> ### Lesson outline:
>
> - Intro to Git (50 min)
>     - Bash Primer (15 min)
>     - Getting started with Git (5 min)
>     - Version control with Git (30 min)
> - Collaborating with GitHub (50 min)
>     - Creating a fork + GitHub setup (5 min)
>     - Branches and pull requests (40 min)
>     - Issues (5 min)
>
-----
## Intro to Git
Git is a version control system that tracks changes in files. Although it is
primarily used for software development, Git can be used to track changes in
any files, and is an indispensable tool for collaborative projects. Using Git,
we effectively create different versions of our files and track who has made
what changes. The complete history of the changes for a particular project and
their metadata make up a repository.
To sync the repository across different computers and to collaborate with
others, Git is often used via [GitHub](http://www.github.com), a web-based
service that hosts Git repositories. In the spirit of 'working open' and
ensuring scientific reproducibility, it has also become increasingly common for
scientists to upload scripts and related files to GitHub for others to use.
In this lesson, we'll be covering some basic Git commands, before moving on to
using GitHub for the upcoming group projects. All members of the group will be
given admin access to a pre-made repository.
### Command line Git
#### Setup and Installation
Git is primarily used on the command line. The implementation of the command
line that we'll be using is known as a 'bash interpreter', or 'shell'. While
bash interpreters are natively available on Mac and Linux operating systems,
Windows users may have to externally install a bash shell.
To install Git on a **Mac**, you will have to install
[Xcode](https://itunes.apple.com/ca/app/xcode/id497799835?mt=12), which can be
downloaded from the Mac App Store. Be warned: Xcode is a very large download
(approximately 6 GB)! Once Xcode is installed, Git should be ready for use. (_Note:
most students will have already installed Xcode for some R packages we
used earlier in the course. If you're not sure whether this is the case, run
the `git --version` command described below._)
To install Git (and Bash) on **Windows**, download [Git for
Windows](https://gitforwindows.org/) from its
[website](https://github.com/git-for-windows/git/releases/tag/v2.19.1.windows.1).
At the time of writing, 2.19.1 is the most recent version. Download the `exe`
file to get started. Git for Windows includes a program called Git Bash, which
provides a bash interpreter that can also run Git commands.
To install Git for **Linux**, use the preferred package manager for your
distribution and follow the instructions [listed
here](https://git-scm.com/download/linux).
To test whether Git has successfully been installed, open a bash interpreter,
type:
```
git --version
```
and hit Enter. If the interpreter returns a version number, Git has been
installed.
#### Bash Primer
Before we can use Git in the bash interpreter, it's worth learning a few basic
bash commands. All bash commands are run simply by typing them into the
interpreter and pressing Enter. All instances of `foldername` and `filename`
should be replaced with the name of whichever file/folder you would like to
provide as input to the operation.
```
pwd
```
`pwd`, or 'print working directory', will return the folder bash is currently
in. This is similar to the concept of the working directory in R that we
discussed earlier in the course.
```
cd foldername
```
`cd` changes the current working directory to an input folder (in place of
`foldername`). This is useful in tandem with `~`, which is shorthand for 'home
directory' in bash (usually `/Users/[yourname]` on a Mac, or `/home/[yourname]` on Linux). To navigate to your Desktop,
the command would be `cd ~/Desktop`.
`cd ..` will let you move back up one folder. If your current working directory
is `/Users/[yourname]/Desktop`, for instance, `cd ..` will change it to
`/Users/[yourname]`.
```
ls
```
`ls` will return what files/folders are within the current working directory.
This is useful when you have just used `cd` to change your current directory
and need a quick reminder of what's in said directory.
```
mkdir foldername
```
`mkdir`, or 'make directory', will create a new directory with
the specified name. For instance, `mkdir thesis` will create a folder
called `thesis`, which can then be entered with `cd thesis`.
```
nano filename
```
`nano` is a text editor for Unix-like systems that opens right within the bash
window. If the filename provided doesn't refer to an existing file, `nano` will
create a new file with that name. Once you're done with `nano`, press
Control + X to exit. `nano` will ask whether you want to save your work, which
can be answered by pressing either `y` or `n`. Alternatively, Control + O
followed by Enter will save your current work, after which Control + X will
exit without prompting.
You can also use any other text editor available on your system, such as
Notepad on Windows.
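
To see how these commands fit together, here is a minimal sketch of a short bash session. It assumes a throwaway folder made with `mktemp -d` and uses `echo` with `>` to create a small file without opening an editor; neither was covered above, and the names `scratch`, `thesis`, and `notes.txt` are purely illustrative.

```bash
# Work inside a throwaway folder so none of your real files are touched.
scratch=$(mktemp -d)             # create a temporary directory and remember its path
cd "$scratch"                    # move into it
pwd                              # print the folder we are now in

mkdir thesis                     # make a new folder called 'thesis'
cd thesis                        # enter it
echo "first draft" > notes.txt   # create a small file (assumption: echo + redirect)
ls                               # lists: notes.txt

cd ..                            # move back up one folder
ls                               # lists: thesis
```

Running `pwd` before and after each `cd` is a quick way to confirm where you are.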
#### Getting started with Git
First, we have to tell Git who we are. This is especially important
when collaborating with Git and keeping track of who did what!
Keep in mind that this only has to be done the first time we're using Git.
This is done with the `git config` command. When using Git, _all_ commands
begin with `git`, followed by more specific subcommands.
```
git config --global user.name "My Name"
git config --global user.email "you@example.com"
```
Next, we have to set a default text editor for Git.
Here, we'll use `nano`, as mentioned above:
```
git config --global core.editor "nano"
```
If you would like to set up a different text editor as your default, see
[here](https://swcarpentry.github.io/git-novice/02-setup/index.html).
Finally, the following command can be used to review Git configurations:
```
git config --list
```
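
`git config` can also read back a single setting rather than listing everything. The sketch below is illustrative only: it assumes a throwaway repository created with `mktemp -d` and `git init`, and it leaves out `--global` so that the placeholder name and email apply to that one repository and do not touch your real settings.

```bash
# Create a throwaway repository so the settings below stay local to it.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Without --global, a setting applies to this repository only.
git config user.name "My Name"
git config user.email "you@example.com"

# Read back individual settings instead of listing everything.
git config --get user.name      # prints: My Name
git config --get user.email     # prints: you@example.com
```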
#### Version control with Git
Note: *Material for the following section of the class was taught from
[Software Carpentry](https://www.software-carpentry.org)'s openly available Intro to
Git lesson, which is available in its entirety
[here](http://swcarpentry.github.io/git-novice/).*
Material covered in class, in order:
- [Creating a repository](https://swcarpentry.github.io/git-novice/03-create/index.html)
- [Tracking changes](https://swcarpentry.github.io/git-novice/04-changes/index.html)
- [Exploring history](https://swcarpentry.github.io/git-novice/05-history/index.html)
##### Summary: Basic Git commands
- `git init` (or `git clone`)
- `git status`
- `git log`
- `git add`
- `git commit`
- `git checkout`
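
As a rough illustration of how the commands in this summary work together, the following sketch runs them in order inside a throwaway repository. The file name `mars.txt`, its contents, and the commit messages are hypothetical, and the name and email are set for this scratch repository only.

```bash
# Throwaway repository; nothing here touches your real projects.
repo=$(mktemp -d)
cd "$repo"
git init -q                               # start tracking this folder
git config user.name "My Name"            # identity for this repo only
git config user.email "you@example.com"

echo "Cold and dry" > mars.txt
git status --short                        # '?? mars.txt': the file is untracked
git add mars.txt                          # stage the file
git commit -q -m "Start notes on Mars"    # record the first version

echo "Two moons" >> mars.txt
git add mars.txt
git commit -q -m "Add note about moons"   # record the second version

git log --oneline                         # two commits, newest first

# Restore the file as it was one commit ago (HEAD~1), then bring back the latest.
git checkout HEAD~1 -- mars.txt
cat mars.txt                              # prints only: Cold and dry
git checkout HEAD -- mars.txt
```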
#### Branching
The Git commit history is a [directed acyclic
graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (or DAG). Every
commit except the very first one has at least one 'parent' commit (the
previous commit(s) in the history; a merge commit has two or more), and any
individual commit can have multiple 'children'. Because links only ever point
from a commit back to its parents, the history of any commit can be traced
back through its 'lineage' or 'ancestry'.
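
The parent links can be inspected directly. This is a small sketch in a throwaway repository; it assumes `--allow-empty` (which lets us commit without changing any files) purely for brevity, and the commit messages are placeholders.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "you@example.com"

git commit -q --allow-empty -m "first commit"     # the very first commit: no parent
git commit -q --allow-empty -m "second commit"    # its parent is the first commit

# Each output line is a commit hash followed by the hashes of its parents.
# The last line (the first commit) is a single hash with nothing after it.
git rev-list --parents HEAD

# Draw the same history as a graph.
git log --graph --oneline --all
```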
##### What are branches?
A branch is simply a *named pointer* to a commit in the DAG, which
automatically *moves forward* as commits are made on that branch. Divergent
commits (two commits with the same parent) could be considered "virtual"
branches. Since they are simply pointers, branches in Git are very lightweight.
![Examples of branching](image/branches.svg)
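
One way to convince yourself that a branch is only a pointer is to ask Git which commit it refers to before and after a new commit. The sketch below assumes a throwaway repository and a hypothetical branch name, `trial`; `--allow-empty` is used only to keep the example short.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "you@example.com"
git commit -q --allow-empty -m "first commit"
first=$(git rev-parse HEAD)       # remember the hash of the first commit

git branch trial                  # create a branch named 'trial'
git rev-parse HEAD                # hash of the current commit
git rev-parse trial               # same hash: 'trial' points at the same commit

git checkout -q trial             # switch to the new branch
git commit -q --allow-empty -m "work on trial"
git rev-parse trial               # a different hash: the pointer moved forward
git branch -v                     # each branch with the commit it points to
```

The original branch still points at the first commit; only `trial` has moved.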
##### Why use them?
- To keep experimental work separate
- To separate trials
- To ease collaboration
- To separate bug fixes from development
We will focus more on branches in the second section of the lesson when we
learn about GitHub.
## Collaborating with GitHub
Although command line Git is very useful, GitHub allows for easy use of Git for
collaborative purposes using a primarily point-and-click interface, in addition
to providing a web-based hosting service for Git repositories (or 'repos'). If
you have not already made a GitHub account, do so now [here](https://www.github.com).
All groups have been provided with existing repositories, but a new repository
can be made by clicking on the `+` in the top right of the page and selecting
'New repository'. For now, however, navigate to your provided group repo.
### Creating a fork
The repos that have already been created can be thought of as 'main repos',
which will contain the 'primary' version of the repo at any given point in
time. However, instead of directly uploading and editing files right within
this main repo itself, we will begin by _forking_ the repo. When a given user
forks a repo, GitHub creates a user-specific copy of the repo and all its
files.
To fork your group's repo, navigate to the repo's page and click on the 'Fork'
button at the top right of the page. Following a brief load screen, GitHub will
redirect you to your new, forked repo.
You'll notice that on the top left of this repo page, the repo's name will be
'[your username] / [repo name]', as opposed to 'EEB313 / [repo name]'.
Furthermore, GitHub will indicate that this is a fork right underneath said
repo name ('forked from eeb313-[year]/ [repo name]').
### Branches and Pull Requests
Currently, your fork is identical to what the original repo looked like at the
moment of forking, and so GitHub will indicate that the fork is currently
'even' with its original. Any edits that are made in the fork will be unique to
the fork, leaving the original unchanged.
But how do you add these changes to the main repo? Doing so is a two step
process:
1. First, create a _branch_ on your _fork_ (not the main repo). You can think
of this as yet another copy of the fork (a copy of a copy!) that will store
changes separately from 'master', which is the default branch.
2. Once you have made your changes in this branch, submit what's
known as a _pull request_ from this edited branch to the main repo. This
allows your changes to be reviewed by the other members of your group before
they are merged into the main repo.
Although this process may seem a bit laborious, using this method (also known
as the 'GitHub flow') minimizes chances of error and ensures that all code is
reviewed by at least one other person. Understanding how and why this process
works is key to collaborative work in software development and the like, and is
used by all sorts of open source projects on GitHub ([including `dplyr`
itself](https://github.com/tidyverse/dplyr)!)
#### Creating a new branch
In a given repo, there is a 'Branch: master' button just above the list of
files. Clicking on this will bring up a dropdown menu containing a list of
branches (which currently just contains `master`) and a text field. A new
branch can be created by typing in the text field. Start by making a new branch
called `initial-edit` using said text field. GitHub will then refresh, and the
button will now say 'Branch: initial-edit' instead.
![_Pictured: the branch dropdown menu_](image/branch_dropdown.png)
The dropdown can now be used to switch back to `master`, but let's stay in
`initial-edit` for now. This is the place to make new changes -- either by
editing existing files or creating new ones. Let's create a simple Rmd file
with the 'Create new file' button on the right side of the repo page.
````
---
title: "Test R Notebook"
author: "Your Name"
output: pdf_document
---
Hello! This is my R notebook.
```{r}`r ''`
print('Hello, GitHub!')
```
````
![_Pictured: editing a file on GitHub_](image/sample_rmd.png)
Once done, scroll to the bottom of the page and add a commit message
using the text field -- something like 'initial Rmd file creation' --
and then click on 'Commit new file'.
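
The same steps can also be done at the command line. What follows is only a sketch: a throwaway repository stands in for a local clone of your fork, and the file name `test_notebook.Rmd` is hypothetical.

```bash
repo=$(mktemp -d)                       # stand-in for a local clone of your fork
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "you@example.com"
git commit -q --allow-empty -m "starting point"

git checkout -q -b initial-edit         # create the branch and switch to it
echo "Hello! This is my R notebook." > test_notebook.Rmd
git add test_notebook.Rmd
git commit -q -m "initial Rmd file creation"

git rev-parse --abbrev-ref HEAD         # prints: initial-edit
git log --oneline                       # the new commit sits on top of the starting point

# In a real clone of your fork, 'git push origin initial-edit' would then
# upload the branch to GitHub (not run here, since this sketch has no remote).
```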
#### Pull requests
After a commit has been made, GitHub takes you back to the main repo
page, but now with an attention-grabbing yellow prompt about these
new edits. GitHub has noticed that your fork has a new branch with
edits, and over on the right side of the prompt, GitHub presents the
option of creating a pull request.
![_Pictured: GitHub's pull request prompt_](image/yellow_prompt.png)
If you don't see the yellow prompt, head to your fork and switch to the
branch containing your edits. Underneath the green 'Clone or download'
button, there is a 'Pull request' option -- this can be used to open
a pull request instead. You'll also notice that the left side of this
bar lists how many commits ahead of `master` your branch currently is.
Following either method of opening a pull request, GitHub takes you to
the 'Pull requests' tab of the _main repo_ and prompts you to write about
your pull request. Here, you can (and should) explain the changes you've
made for your group members, so that they know what to look for and review.
Be specific and detailed to save your group members' time -- it's a good
idea to start off your pull request message with an overall summary
('adding `dplyr` code to clean dataset') followed by a point-form list of
what changes have been made, if necessary.
Once the pull request has been made, GitHub will list both your message
and your commit messages below. You also have the option of merging the
pull request yourself -- but _don't do this_! When collaborating, always
have someone else review and merge your pull request.
If your team members spot problems, they can add comments below the pull
request and tag others using the @ symbol, as on most social networks. If
more changes are needed before the pull request is ready to merge,
any new commits you make to the branch on your fork (`initial-edit` in
our case here) will automatically be added on to the pull request.
This way, you can incorporate any changes or fixes suggested by your
team members simply by continuing to work in your fork's new branch
until your changes are ready to merge.
Once a pull request has been merged into the main repo, the branch on your fork
isn't needed anymore. Because of this, GitHub will immediately prompt you to delete
the `initial-edit` branch as soon as the merge has been completed.
![_Pictured: GitHub's branch deletion prompt_](image/delete_branch.png)
Great work -- your edits are now on the main repo!
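
If you have a local clone with the same branch, it can be cleaned up from the command line as well. This is a sketch under stated assumptions: a throwaway repository stands in for your clone, and a local fast-forward merge stands in for the pull request having been merged.

```bash
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "My Name"
git config user.email "you@example.com"
git commit -q --allow-empty -m "starting point"
main_branch=$(git rev-parse --abbrev-ref HEAD)   # 'master' or 'main', depending on your Git settings

git checkout -q -b initial-edit
git commit -q --allow-empty -m "initial Rmd file creation"

git checkout -q "$main_branch"     # you cannot delete the branch you are currently on
git merge -q initial-edit          # stand-in for the merged pull request
git branch -d initial-edit         # -d only deletes branches that are already merged
git branch                         # only the default branch remains

# In a real clone, 'git push origin --delete initial-edit' would remove the
# branch from your fork on GitHub (not run here, since there is no remote).
```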
#### Syncing your fork
There's one issue with the above, though -- now, the `master` branch
on your fork is out of date with the current state of the main repo.
The fork has to be synced to remedy this issue.
First, head to your fork on GitHub, and click on the green 'Clone or
download' button on the right side of the page. This yields a link
to your fork. Copy this link to your clipboard.
Next, open a bash shell, and navigate over to whichever folder you would
like a copy of your repo to be saved in. Then, run:
```
git clone [repo link]
```
with the link in place of `[repo link]`. This process, known as _cloning_, will
create a new folder in your current working directory that contains the
contents of your fork. Enter this new folder with `cd` and type `git status` to
make sure the repo has been cloned properly. `git status` should report that
your branch is up to date with `origin/master`, indicating that it is currently
the same as the current state of your fork.
To get your fork up to date with the main repo, you next have to add a _remote_
linking to the main repo. Head to your group's repo and once again click
on 'Clone or download' to grab its link. Then, using the main repo link, run:
```
git remote add upstream [repo link]
```
Once this is done, run:
```
git remote -v
```
to get a list of existing remotes. This should return four links, two of which
are labelled `origin` and two of which are labelled `upstream`.
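
To see where the four links come from without touching GitHub, here is a small offline sketch. It assumes two empty local repositories, `main-repo.git` and `fork.git` (hypothetical names), standing in for the main repo and your fork.

```bash
work=$(mktemp -d)
cd "$work"
git init -q --bare main-repo.git     # stand-in for the group's main repo
git init -q --bare fork.git          # stand-in for your fork on GitHub

# Cloning sets up the 'origin' remote automatically.
# (2>/dev/null hides the warning about cloning an empty repository.)
git clone -q fork.git my-clone 2>/dev/null
cd my-clone

git remote add upstream "$work/main-repo.git"
git remote -v     # four lines: origin and upstream, each with (fetch) and (push)
```

Each remote appears twice because Git records one address for fetching and one for pushing.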
Next, you have to _fetch_ the new changes that are in the main repo.
```
git fetch upstream
```
Once the edits have been downloaded, merge them into your local repo:
```
git merge upstream/master
```
Your local copy is now even with the main repo! Finally,
push these changes to the GitHub version of your fork:
```
git push origin master
```
and now the GitHub version of the fork is all synced up, ready for
your next batch of edits, and eventually another pull request!
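
The whole sync can be rehearsed offline before trying it on a real fork. The sketch below is an illustration, not the course setup: `seed`, `main-repo.git`, `fork.git`, and `my-clone` are hypothetical local folders standing in for GitHub, `-c init.defaultBranch=master` makes sure the first branch is called `master` as in this lesson, and empty commits stand in for real edits.

```bash
work=$(mktemp -d)
cd "$work"

# --- Setup: local stand-ins for GitHub ---
git -c init.defaultBranch=master init -q seed   # scratch repo used to author commits
cd seed
git config user.name "My Name"
git config user.email "you@example.com"
git commit -q --allow-empty -m "initial commit"
cd ..
git clone -q --bare seed main-repo.git          # plays the role of the main repo
git clone -q --bare main-repo.git fork.git      # plays the role of your fork
git clone -q fork.git my-clone                  # your local copy of the fork

# A pull request gets merged: the main repo gains a commit the fork lacks.
cd seed
git commit -q --allow-empty -m "merged pull request"
git push -q ../main-repo.git master
cd ..

# --- The sync itself, as in the steps above ---
cd my-clone
git remote add upstream "$work/main-repo.git"
git fetch -q upstream
git merge -q upstream/master      # fast-forwards the local master
git push -q origin master         # updates the fork

# All three should now report the same commit hash.
git rev-parse master origin/master upstream/master
```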
#### Keeping your fork up to date
When other group members add to the main repo through their own
pull requests, you have to make sure those edits have been incorporated
into your repo before making new changes of your own. This ensures
that there aren't any conflicts within files, wherein your edits
clash with someone else's if one of you is working with an earlier
version of the file.
To remain up to date, navigate to the local copy of your repo using
bash, and once again repeat the above steps:
```
git fetch upstream
git merge upstream/master
git push origin master
```
These commands download any changes your group members may have made
and update both the local and GitHub versions of your fork accordingly.
Note: the `git pull` command combines two other commands, `git fetch` and
`git merge`, so `git pull upstream master` would carry out the first two
steps at once.
Before starting any edits of your own, it's usually a good idea to start off
by checking to see whether anything's been added to the main repo and, if needed,
then syncing your fork to account for that.
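
One possible way to check whether the main repo has moved ahead is to fetch and then count the commits you are missing. This is a sketch under simplifying assumptions: a single local folder, `main-repo` (hypothetical), stands in for the main repo, and `my-clone` is cloned directly from it for brevity rather than from a separate fork.

```bash
work=$(mktemp -d)
cd "$work"
git -c init.defaultBranch=master init -q main-repo   # stand-in for the main repo
cd main-repo
git config user.name "My Name"
git config user.email "you@example.com"
git commit -q --allow-empty -m "initial commit"
cd ..
git clone -q main-repo my-clone                      # your local copy

# A group member's pull request is merged after you cloned.
cd main-repo
git commit -q --allow-empty -m "group member's merged pull request"

cd ../my-clone
git remote add upstream "$work/main-repo"
git fetch -q upstream

# Count the commits that are in upstream/master but not yet in your master.
git rev-list --count master..upstream/master    # prints: 1
git log --oneline master..upstream/master       # shows their commit messages
```

A count of 0 means there is nothing new to merge; anything higher means it is time to sync.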
#### Issues
Finally, each repo on GitHub also has an Issues tab at the top of the page.
Here, you and your group can create posts about the content of the
repo, either to highlight issues with code or to serve as to-do lists
for managing outstanding tasks.
Although issues aren't needed for any of the steps we discussed above,
it can be useful to create a roadmap of your project with them and
assign group members to specific tasks if need be.
### To sum up
The 'GitHub flow':
1. Make sure your fork is up to date
2. Create a new branch on your fork
3. Make your edits (with informative commit messages along the way)
4. Submit a pull request
5. Have your group members review your pull request
6. If your group requests further edits prior to merging, make more commits in the new branch
7. Once your edits have been merged, delete your new edits-containing branch
8. Sync your fork to be up to date with the new changes
Additional resources:
- [A visual demonstration](https://guides.github.com/introduction/flow/) of the GitHub flow.
- A useful [Git command cheat sheet](https://services.github.com/on-demand/downloads/github-git-cheat-sheet.pdf).
- Guide to a [good Git Workflow](https://www.atlassian.com/git/tutorials/comparing-workflows)